
MEASURE THEORY

Volume 2

D.H.Fremlin
By the same author:
Topological Riesz Spaces and Measure Theory, Cambridge University Press, 1974.
Consequences of Martin's Axiom, Cambridge University Press, 1982.

Companions to the present volume:


Measure Theory, vol. 1, Torres Fremlin, 2000.
Measure Theory, vol. 3, Torres Fremlin, 2002.

First printing May 2001

Second printing April 2003


MEASURE THEORY
Volume 2
Broad Foundations

D.H.Fremlin
Reader in Mathematics, University of Essex
Dedicated by the Author
to the Publisher

This book may be ordered from the publisher at the address below. For price and means of payment
see the author's Web page http://www.essex.ac.uk/maths/staff/fremlin/mtsales.htm,
or enquire from fremdh@essex.ac.uk.

First published in 2001


by Torres Fremlin, 25 Ireton Road, Colchester CO3 3AT, England
© D.H.Fremlin 2001
The right of D.H.Fremlin to be identified as author of this work has been asserted in accordance with
the Copyright, Designs and Patents Act 1988. This work is issued under the terms of the Design Science
License as published in http://dsl.org/copyleft/dsl.txt. For the source files see
http://www.essex.ac.uk/maths/staff/fremlin/mt2.2003/index.htm.
Library of Congress classification QA312.F72
AMS 2000 classification 28A99
ISBN 0-9538129-2-8
Typeset by AMS-TeX
Printed in England by Biddles Short Run Books, King's Lynn

Contents

General Introduction
Introduction to Volume 2

*Chapter 21: Taxonomy of measure spaces


Introduction
211 Definitions
Complete, totally finite, $\sigma$-finite, strictly localizable, semi-finite, localizable, locally determined measure
spaces; atoms; elementary relationships; countable-cocountable measures.
212 Complete spaces
Measurable and integrable functions on complete spaces; completion of a measure.
213 Semi-finite, locally determined and localizable spaces
Integration on semi-finite spaces; c.l.d. versions; measurable envelopes; characterizing localizability, strict
localizability, $\sigma$-finiteness.
214 Subspaces
Subspace measures on arbitrary subsets; integration; direct sums of measure spaces.
215 $\sigma$-finite spaces and the principle of exhaustion
The principle of exhaustion; characterizations of $\sigma$-finiteness; the intermediate value theorem for atomless
measures.
*216 Examples
A complete localizable non-locally-determined space; a complete locally determined non-localizable
space; a complete locally determined localizable space which is not strictly localizable.

Chapter 22: The fundamental theorem of calculus


Introduction
221 Vitali's theorem in $\mathbb{R}$
Vitali's theorem for intervals in $\mathbb{R}$.
222 Differentiating an indefinite integral
Monotonic functions are differentiable a.e., and their derivatives are integrable; $\frac{d}{dx}\int_a^x f = f$ a.e.
223 Lebesgue's density theorems
$f(x) = \lim_{h\downarrow 0}\frac{1}{h}\int_x^{x+h} f$ a.e. $(x)$; density points; $\lim_{h\downarrow 0}\frac{1}{2h}\int_{x-h}^{x+h}|f - f(x)| = 0$ a.e. $(x)$; the Lebesgue set
of a function.
224 Functions of bounded variation
Variation of a function; differences of monotonic functions; sums and products, limits, continuity and
differentiability for b.v. functions; an inequality for $\int f\times g$.
225 Absolutely continuous functions
Absolute continuity of indefinite integrals; absolutely continuous functions on $\mathbb{R}$; integration by parts;
lower semi-continuous functions; *direct images of negligible sets; the Cantor function.
*226 The Lebesgue decomposition of a function of bounded variation
Sums over arbitrary index sets; saltus functions; the Lebesgue decomposition.

Chapter 23: The Radon-Nikodým theorem


Introduction
231 Countably additive functionals
Additive and countably additive functionals; Jordan and Hahn decompositions.
232 The Radon-Nikodým theorem
Absolutely and truly continuous additive functionals; truly continuous functionals are indefinite integrals;
*the Lebesgue decomposition of a countably additive functional.
233 Conditional expectations
$\sigma$-subalgebras; conditional expectations of integrable functions; convex functions; Jensen's inequality.
234 Indefinite-integral measures
Measures $f\mu$ and their basic properties.
235 Measurable transformations
The formula $\int g(y)\,\nu(dy) = \int J(x)g(\phi(x))\,\mu(dx)$; detailed conditions of applicability;
inverse-measure-preserving functions; the image measure catastrophe; using the Radon-Nikodým theorem.

Chapter 24: Function spaces


Introduction
241 $\mathcal{L}^0$ and $L^0$
The linear, order and multiplicative structure of $L^0$; action of Borel functions; Dedekind completeness
and localizability.
242 $L^1$
The normed lattice $L^1$; integration as a linear functional; completeness and Dedekind completeness; the
Radon-Nikodým theorem and conditional expectations; convex functions; dense subspaces.
243 $L^\infty$
The normed lattice $L^\infty$; completeness; the duality between $L^1$ and $L^\infty$; localizability, Dedekind
completeness and the identification $L^\infty \cong (L^1)^*$.
244 $L^p$
The normed lattices $L^p$, for $1 < p < \infty$; Hölder's inequality; completeness and Dedekind completeness;
$(L^p)^* \cong L^q$; conditional expectations.
245 Convergence in measure
The topology of (local) convergence in measure on $L^0$; pointwise convergence; localizability and Dedekind
completeness; embedding $L^p$ in $L^0$; $\|\cdot\|_1$-convergence and convergence in measure; $\sigma$-finite spaces,
metrizability and sequential convergence.
246 Uniform integrability
Uniformly integrable sets in $\mathcal{L}^1$ and $L^1$; elementary properties; disjoint-sequence characterizations; $\|\cdot\|_1$
and convergence in measure on uniformly integrable sets.
247 Weak compactness in $L^1$
A subset of $L^1$ is uniformly integrable iff it is relatively weakly compact.

Chapter 25: Product measures


Introduction
251 Finite products
Primitive and c.l.d. products; basic properties; Lebesgue measure on $\mathbb{R}^{r+s}$ as a product measure;
products of direct sums and subspaces; c.l.d. versions.
252 Fubini's theorem
When $\iint f(x,y)\,dx\,dy$ and $\int f(x,y)\,d(x,y)$ are equal; measures of ordinate sets; *the volume of a ball in
$\mathbb{R}^r$.
253 Tensor products
$L^1(\mu\times\nu)$ as a completion of $L^1(\mu)\otimes L^1(\nu)$; bounded bilinear maps; the ordering of $L^1(\mu\times\nu)$; conditional
expectations.
254 Infinite products
Products of arbitrary families of probability spaces; basic properties; inverse-measure-preserving functions;
usual measure on $\{0,1\}^I$; $\{0,1\}^{\mathbb{N}}$ isomorphic, as measure space, to $[0,1]$; subspaces of full outer
measure; sets determined by coordinates in a subset of the index set; generalized associative law for products
of measures; subproducts as image measures; factoring functions through subproducts; conditional
expectations on subalgebras corresponding to subproducts.
255 Convolutions of functions
Shifts in $\mathbb{R}^2$ as measure space automorphisms; convolutions of functions on $\mathbb{R}$;
$\int h\times(f*g) = \int h(x+y)f(x)g(y)\,d(x,y)$; $f*(g*h) = (f*g)*h$; $\|f*g\|_1 \le \|f\|_1\|g\|_1$; the groups $\mathbb{R}^r$ and $]-\pi,\pi]$.
256 Radon measures on $\mathbb{R}^r$
Definition of Radon measures on $\mathbb{R}^r$; completions of Borel measures; Lusin measurability; image measures;
products of two Radon measures; semi-continuous functions.
257 Convolutions of measures
Convolution of totally finite Radon measures on $\mathbb{R}^r$; $\int h\,d(\nu_1*\nu_2) = \iint h(x+y)\,\nu_1(dx)\,\nu_2(dy)$;
$\nu_1*(\nu_2*\nu_3) = (\nu_1*\nu_2)*\nu_3$.

Chapter 26: Change of variable in the integral


Introduction
261 Vitali's theorem in $\mathbb{R}^r$
Vitali's theorem for balls in $\mathbb{R}^r$; Lebesgue's Density Theorem.
262 Lipschitz and differentiable functions
Lipschitz functions; elementary properties; differentiable functions from $\mathbb{R}^r$ to $\mathbb{R}^s$; differentiability and
partial derivatives; approximating a differentiable function by piecewise affine functions; *Rademacher's
theorem.
263 Differentiable transformations in $\mathbb{R}^r$
In the formula $\int g(y)\,dy = \int J(x)g(\phi(x))\,dx$, find $J$ when $\phi$ is (i) linear (ii) differentiable; detailed
conditions of applicability; polar coordinates; the one-dimensional case.
264 Hausdorff measures
$r$-dimensional Hausdorff measure on $\mathbb{R}^s$; Borel sets are measurable; Lipschitz functions; if $s = r$, we have
a multiple of Lebesgue measure; *Cantor measure as a Hausdorff measure.
265 Surface measures
Normalized Hausdorff measure; action of linear operators and differentiable functions; surface measure
on a sphere.

Chapter 27: Probability theory


Introduction
271 Distributions
Terminology; distributions as Radon measures; distribution functions; densities; transformations of random
variables.
272 Independence
Independent families of random variables; characterizations of independence; joint distributions of (finite)
independent families, and product measures; the zero-one law; $E(X\times Y)$, $\operatorname{Var}(X+Y)$; distribution of a
sum as convolution of distributions; *Etemadi's inequality.
273 The strong law of large numbers
$\frac{1}{n+1}\sum_{i=0}^{n} X_i \to 0$ a.e. if the $X_n$ are independent with zero expectation and either
(i) $\sum_{n=0}^{\infty}\frac{1}{(n+1)^2}\operatorname{Var}(X_n) < \infty$ or (ii) $\sum_{n=0}^{\infty} E(|X_n|^{1+\delta}) < \infty$ for some $\delta > 0$ or (iii) the $X_n$ are identically distributed.
274 The Central Limit Theorem
Normally distributed r.v.s; Lindeberg's conditions for the Central Limit Theorem; corollaries; estimating
$\int_\alpha^\infty e^{-x^2/2}\,dx$.
275 Martingales
Sequences of $\sigma$-algebras, and martingales adapted to them; up-crossings; Doob's Martingale Convergence
Theorem; uniform integrability, $\|\cdot\|_1$-convergence and martingales as sequences of conditional expectations;
reverse martingales; stopping times.
276 Martingale difference sequences
Martingale difference sequences; strong law of large numbers for m.d.ss.; Komlós's theorem.

Chapter 28: Fourier analysis


Introduction
281 The Stone-Weierstrass theorem
Approximating a function on a compact set by members of a given lattice or algebra of functions; real
and complex cases; approximation by polynomials and trigonometric functions; Weyl's Equidistribution
Theorem in $[0,1]^r$.
282 Fourier series
Fourier and Fejér sums; Dirichlet and Fejér kernels; Riemann-Lebesgue lemma; uniform convergence
of Fejér sums of a continuous function; a.e. convergence of Fejér sums of an integrable function;
$\|\cdot\|_2$-convergence of Fourier sums of a square-integrable function; convergence of Fourier sums of a
differentiable or b.v. function; convolutions and Fourier coefficients.
283 Fourier Transforms I
Fourier and inverse Fourier transforms; elementary properties; $\int_0^\infty \frac{1}{x}\sin x\,dx = \frac{1}{2}\pi$; the formula $\hat{f}^{\vee} = f$
for differentiable and b.v. $f$; convolutions; $e^{-x^2/2}$; $\int f\times\hat{g} = \int \hat{f}\times g$.
284 Fourier Transforms II
Test functions; $\hat{h}^{\vee} = h$; tempered functions; tempered functions which represent each other's transforms;
convolutions; square-integrable functions; Dirac's delta function.
285 Characteristic functions
The characteristic function of a distribution; independent r.v.s; the normal distribution; the vague
topology on the space of distributions, and sequential convergence of characteristic functions; Poisson's
theorem.
286 Carleson's theorem
The Hardy-Littlewood Maximal Theorem; the Lacey-Thiele proof of Carleson's theorem.

Appendix to Volume 2
Introduction
2A1 Set theory
Ordered sets; transfinite recursion; ordinals; initial ordinals; Schröder-Bernstein theorem; filters; Axiom
of Choice; Zermelo's Well-Ordering Theorem; Zorn's Lemma; ultrafilters.
2A2 The topology of Euclidean space
Closures; compact sets; open sets in $\mathbb{R}$.
2A3 General topology
Topologies; continuous functions; subspace topologies; Hausdorff topologies; pseudometrics; convergence
of sequences; compact spaces; cluster points of sequences; convergence of filters; product topologies;
dense subsets.
2A4 Normed spaces
Normed spaces; linear subspaces; Banach spaces; bounded linear operators; dual spaces; extending a
linear operator from a dense subspace; normed algebras.
2A5 Linear topological spaces
Linear topological spaces; topologies defined by functionals; completeness; weak topologies.
2A6 Factorization of matrices
Determinants; orthonormal families; $T = PDQ$ where $D$ is diagonal and $P$, $Q$ are orthogonal.

Concordance
References for Volume 2
Index to Volumes 1 and 2
Principal topics and results
General index

General introduction In this treatise I aim to give a comprehensive description of modern abstract measure
theory, with some indication of its principal applications. The first two volumes are set at an introductory
level; they are intended for students with a solid grounding in the concepts of real analysis, but possibly with
rather limited detailed knowledge. As the book proceeds, the level of sophistication and expertise demanded
will increase; thus for the volume on topological measure spaces, familiarity with general topology will be
assumed. The emphasis throughout is on the mathematical ideas involved, which in this subject are mostly
to be found in the details of the proofs.
My intention is that the book should be usable both as a first introduction to the subject and as a reference
work. For the sake of the first aim, I try to limit the ideas of the early volumes to those which are really
essential to the development of the basic theorems. For the sake of the second aim, I try to express these ideas
in their full natural generality, and in particular I take care to avoid suggesting any unnecessary restrictions
in their applicability. Of course these principles are to some extent contradictory. Nevertheless, I find that
most of the time they are very nearly reconcilable, provided that I indulge in a certain degree of repetition.
For instance, right at the beginning, the puzzle arises: should one develop Lebesgue measure first on the
real line, and then in spaces of higher dimension, or should one go straight to the multidimensional case? I
believe that there is no single correct answer to this question. Most students will find the one-dimensional
case easier, and it therefore seems more appropriate for a first introduction, since even in that case the
technical problems can be daunting. But certainly every student of measure theory must at a fairly early
stage come to terms with Lebesgue area and volume as well as length; and with the correct formulations, the
multidimensional case differs from the one-dimensional case only in a definition and a (substantial) lemma.
So what I have done is to write them both out (§§114-115). In the same spirit, I have been uninhibited,
when setting out exercises, by the fact that many of the results I invite students to look for will appear in
later chapters; I believe that throughout mathematics one has a better chance of understanding a theorem
if one has previously attempted something similar alone.
As I write this Introduction (March 2003), the plan of the work is as follows:
Volume 1: The Irreducible Minimum
Volume 2: Broad Foundations
Volume 3: Measure Algebras
Volume 4: Topological Measure Spaces
Volume 5: Set-theoretic Measure Theory.
Volume 1 is intended for those with no prior knowledge of measure theory, but competent in the elementary
techniques of real analysis. I hope that it will be found useful by undergraduates meeting Lebesgue measure
for the first time. Volume 2 aims to lay out some of the fundamental results of pure measure theory
(the Radon-Nikodým theorem, Fubini's theorem), but also gives short introductions to some of the most
important applications of measure theory (probability theory, Fourier analysis). While I should like to
believe that most of it is written at a level accessible to anyone who has mastered the contents of Volume 1,
I should not myself have the courage to try to cover it in an undergraduate course, though I would certainly
attempt to include some parts of it. Volumes 3 and 4 are set at a rather higher level, suitable to postgraduate
courses; while Volume 5 will assume a wide-ranging competence over large parts of analysis and set theory.
There is a disclaimer which I ought to make in a place where you might see it in time to avoid paying for
this book. I make no attempt to describe the history of the subject. This is not because I think the history
uninteresting or unimportant; rather, it is because I have no confidence of saying anything which would not
be seriously misleading. Indeed I have very little confidence in anything I have ever read concerning the
history of ideas. So while I am happy to honour the names of Lebesgue and Kolmogorov and Maharam in
more or less appropriate places, and I try to include in the bibliographies the works which I have myself
consulted, I leave any consideration of the details to those bolder and better qualified than myself.
The work as a whole is not yet complete; and when it is finished, it will undoubtedly be too long
to be printed as a single volume in any reasonable format. I am therefore publishing it one part at a
time. However, drafts of most of the rest are available on the Internet; see
http://www.essex.ac.uk/maths/staff/fremlin/mt.htm for detailed instructions. For the time being, at least, printing will be in
short runs. I hope that readers will be energetic in commenting on errors and omissions, since it should be
possible to correct these relatively promptly. An inevitable consequence of this is that paragraph references
may go out of date rather quickly. I shall be most flattered if anyone chooses to rely on this book as a source
for basic material; and I am willing to attempt to maintain a concordance to such references, indicating
where migratory results have come to rest for the moment, if authors will supply me with copies of papers
which use them. Two such items can already be found in the concordance to the present volume.
I mention some minor points concerning the layout of the material. Most sections conclude with lists of
'basic exercises' and 'further exercises', which I hope will be generally instructive and occasionally
entertaining. How many of these you should attempt must be for you and your teacher, if any, to decide, as no
two students will have quite the same needs. I mark with a > those which seem to me to be particularly
important. But while you may not need to write out solutions to all the 'basic exercises', if you are in any
doubt as to your capacity to do so you should take this as a warning to slow down a bit. The 'further
exercises' are unbounded in difficulty, and are unified only by a presumption that each has at least one
solution based on ideas already introduced. Occasionally I add a final 'problem', a question to which I do
not know the answer and which seems to arise naturally in the course of the work.
The impulse to write this book is in large part a desire to present a unified account of the subject.
Cross-references are correspondingly abundant and wide-ranging. In order to be able to refer freely across
the whole text, I have chosen a reference system which gives the same code name to a paragraph wherever
it is being called from. Thus 132E is the fifth paragraph in the second section of the third chapter of
Volume 1, and is referred to by that name throughout. Let me emphasize that cross-references are supposed
to help the reader, not distract her. Do not take the interpolation '(121A)' as an instruction, or even a
recommendation, to lift Volume 1 off the shelf and hunt for §121. If you are happy with an argument as it
stands, independently of the reference, then carry on. If, however, I seem to have made rather a large jump,
or the notation has suddenly become opaque, local cross-references may help you to fill in the gaps.
Each volume will have an appendix of 'useful facts', in which I set out material which is called on
somewhere in that volume, and which I do not feel I can take for granted. Typically the arrangement of
material in these appendices is directed very narrowly at the particular applications I have in mind, and is
unlikely to be a satisfactory substitute for conventional treatments of the topics touched on. Moreover, the
ideas may well be needed only on rare and isolated occasions. So as a rule I recommend you to ignore the
appendices until you have some direct reason to suppose that a fragment may be useful to you.
During the extended gestation of this project I have been helped by many people, and I hope that my
friends and colleagues will be pleased when they recognise their ideas scattered through the pages below.
But I am especially grateful to those who have taken the trouble to read through earlier drafts and comment
on obscurities and errors.

Introduction to Volume 2
For this second volume I have chosen seven topics through which to explore the insights and challenges
offered by measure theory. Some, like the Radon-Nikodým theorem (Chapter 23), are necessary for any
understanding of the structure of the subject; others, like Fourier analysis (Chapter 28) and the discussion
of function spaces (Chapter 24), demonstrate the power of measure theory to attack problems in general
real and functional analysis. But all have applications outside measure theory, and all have influenced its
development. These are the parts of measure theory which any analyst may find himself using.
Every topic is one which ideally one would wish undergraduates to have seen, but the length of this
volume makes it plain that no ordinary undergraduate course could include very much of it. It is directed
rather at graduate level, where I hope it will be found adequate to support all but the most ambitious
courses in measure theory, though it is perhaps a bit too solid to be suitable for direct use as a course text,
except with careful selection of the parts to be covered. If you are using it to teach yourself measure theory,
I strongly recommend an eclectic approach, looking for particular subjects and theorems that seem startling
or useful, and working backwards from them. My other objective, of course, is to provide an account of the
central ideas at this level in measure theory, rather fuller than can easily be found in one volume elsewhere.
I cannot claim that it is definitive, but I do think I cover a good deal of ground in ways that provide
a firm foundation for further study. As in Volume 1, I usually do not shrink from giving 'best' results,
like Lindeberg's conditions for the Central Limit Theorem (§274), or the theory of products of arbitrary
measure spaces (§251). If I were teaching this material to students in a PhD programme I would rather
accept a limitation in the breadth of the course than leave them unaware of what could be done in the areas
discussed.
The topics interact in complex ways; one of the purposes of this book is to exhibit their relationships.
There is no canonical linear ordering in which they should be taken. Nor do I think organization charts are
very helpful, not least because it may be only two or three paragraphs in a section which are needed for a
given chapter later on. I do at least try to lay the material of each section out in an order which makes
initial segments useful by themselves. But the order in which to take the chapters is to a considerable extent
for you to choose, perhaps after a glance at their individual introductions. I have done my best to pitch the
exposition at much the same level throughout the volume, sometimes allowing gradients to steepen in the
course of a chapter or a section, but always trying to return to something which anyone who has mastered
Volume 1 ought to be able to cope with. (Though perhaps the main theorems of Chapter 26 are harder
work than the principal results elsewhere, and §286 is only for the most determined.)
I said there were seven topics, and you will see eight chapters ahead of you. This is because Chapter 21
is rather different from the rest. It is the purest of pure measure theory, and is here only because there are
places later in the volume where (in my view) the theorems make sense only in the light of some abstract
concepts which are not particularly difficult, but are also not obvious. However it is fair to say that the
most important ideas of this volume do not really depend on the work of Chapter 21.
As always, it is a puzzle to know how much prior knowledge to assume in this volume. I do of course call
on the results of Volume 1 of this treatise whenever they seem to be relevant. I do not doubt, however, that
there will be readers who have learnt the elementary theory from other sources. Provided you can, from first
principles, construct Lebesgue measure and prove the basic convergence theorems for integrals on arbitrary
measure spaces, you ought to be able to embark on the present volume. Perhaps it would be helpful to have
in hand the results-only version of Volume 1, since that includes the most important definitions as well as
statements of the theorems.
There is also the question of how much material from outside measure theory is needed. Chapter 21
calls for some non-trivial set theory (given in §2A1), but the more advanced ideas are needed only for the
counter-examples in §216, and can be passed over to begin with. The problems become acute in Chapter
24. Here we need a variety of results from functional analysis, some of them depending on non-trivial ideas
in general topology. For a full understanding of this material there is no substitute for a course in normed
spaces up to and including a study of weak compactness. But I do not like to insist on such a preparation,
because it is likely to be simultaneously too much and too little. Too much, because I hardly mention linear
operators at this stage; too little, because I do ask for some of the theory of non-locally-convex spaces,
which is often omitted in first courses on functional analysis. At the risk, therefore, of wasting paper, I have
written out condensed accounts of the essential facts (§§2A3-2A5).

Note on second printing


For the second printing of this volume, I have made two substantial corrections to inadequate proofs and
a large number of minor amendments; I am most grateful to T.D.Austin for his careful reading of the first
printing. In addition, I have added a dozen exercises and a handful of straightforward results which turn
out to be relevant to the work of later volumes and fit naturally here.
The regular process of revision of this work has led me to make a couple of notational innovations not
described explicitly in the early editions of Volume 1. I trust that most readers will find these immediately
comprehensible. If, however, you find that there is a puzzling cross-reference which you are unable to match
with anything in the version of Volume 1 which you are using, it may be worth while checking the errata
pages in http://www.essex.ac.uk/maths/staff/fremlin/mterr.htm.

*Chapter 21
Taxonomy of measure spaces
I begin this volume with a starred chapter. The point is that I do not really recommend this chapter
for beginners. It deals with a variety of technical questions which are of great importance for the later
development of the subject, but are likely to be both abstract and obscure for anyone who has not encountered
the problems these techniques are designed to solve. On the other hand, if (as is customary) this
work is omitted, and the ideas are introduced only when urgently needed, the student is likely to finish with
very vague ideas on which theorems can be expected to apply to which types of measure space, and with
no vocabulary in which to express those ideas. I therefore take a few pages to introduce the terminology
and concepts which can be used to distinguish 'good' measure spaces from others, with a few of the basic
relationships. The only paragraphs which are immediately relevant to the theory set out in Volume 1 are
those on complete, $\sigma$-finite and semi-finite measure spaces (211A, 211D, 211F, 211Lc, §212, 213A-213B,
215B), and on Lebesgue measure (211M). For the rest, I think that a newcomer to the subject can very
well pass over this chapter for the time being, and return to it for particular items when the text of later
chapters refers to it. On the other hand, it can also be used as an introduction to the flavour of the purest
kind of measure theory, the study of measure spaces for their own sake, with a systematic discussion of a
few of the elementary constructions.

211 Definitions
I start with a list of definitions, corresponding to the concepts which I have found to be of value in
distinguishing different types of measure space. Necessarily, the significance of many of these ideas is likely
to be obscure until you have encountered some of the obstacles which arise later on. Nevertheless, you
will I hope be able to deal with these definitions on a formal, abstract basis, and to follow the elementary
arguments involved in establishing the relationships between them (211L).
In §216 I give three substantial examples to demonstrate the rich variety of objects which the definition of
'measure space' encompasses. In the present section, therefore, I content myself with very brief descriptions
of sufficient cases to show at least that each of the definitions here discriminates between different spaces
(211M-211R).

211A Definition Let $(X, \Sigma, \mu)$ be a measure space. Then $\mu$, or $(X, \Sigma, \mu)$, is (Carathéodory) complete
if whenever $A \subseteq E \in \Sigma$ and $\mu E = 0$ then $A \in \Sigma$; that is, if every negligible subset of $X$ is measurable.
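
A minimal sketch of 211A on a finite set, assuming the $\sigma$-algebra is stored as a Python set of frozensets and the measure as a dictionary; the names `powerset`, `is_complete`, `sigma_hat` and `mu_hat` are illustrative and do not come from the text. It shows a space that fails to be complete because a null set has a non-measurable subset, and an enlarged space on the same set that is complete.

```python
from itertools import combinations

def powerset(s):
    """Return every subset of the finite set s as a frozenset."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_complete(sigma, mu):
    """211A on a finite space: each subset of a measure-zero set is in sigma."""
    return all(a in sigma
               for e in sigma if mu[e] == 0
               for a in powerset(e))

# Hypothetical example: X = {1, 2, 3}, sigma generated by the partition
# {1}, {2, 3}, with all the mass on the point 1.
X = frozenset({1, 2, 3})
sigma = {frozenset(), frozenset({1}), frozenset({2, 3}), X}
mu = {frozenset(): 0, frozenset({1}): 1, frozenset({2, 3}): 0, X: 1}

# Enlarge sigma to all subsets of X, with mu_hat(A) = 1 if 1 is in A, else 0.
sigma_hat = set(powerset(X))
mu_hat = {a: (1 if 1 in a else 0) for a in sigma_hat}

print(is_complete(sigma, mu))          # False: {2} lies in the null set {2, 3}
print(is_complete(sigma_hat, mu_hat))  # True
```

The check is exhaustive, so it is only practical for very small sets.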

211B Definition Let $(X, \Sigma, \mu)$ be a measure space. Then $(X, \Sigma, \mu)$ is a probability space if $\mu X = 1$.
In this case $\mu$ is called a probability or probability measure.

211C Definition Let $(X, \Sigma, \mu)$ be a measure space. Then $\mu$, or $(X, \Sigma, \mu)$, is totally finite if $\mu X < \infty$.

211D Definition Let $(X, \Sigma, \mu)$ be a measure space. Then $\mu$, or $(X, \Sigma, \mu)$, is $\sigma$-finite if there is a
sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of measurable sets of finite measure such that $X = \bigcup_{n\in\mathbb{N}} E_n$.
Remark Note that in this case we can set
$$F_n = E_n \setminus \bigcup_{i<n} E_i, \qquad G_n = \bigcup_{i\le n} E_i$$
for each $n$, to obtain a disjoint cover $\langle F_n\rangle_{n\in\mathbb{N}}$ of $X$ by measurable sets of finite measure, and a non-decreasing
sequence $\langle G_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure covering $X$.
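
The two constructions in this remark can be sketched for a finite list of finite sets; this is an illustration only, with hypothetical helper names `disjointify` and `accumulate`, and it says nothing about measurability, which in the text comes from the $\sigma$-algebra being closed under countable unions and differences.

```python
def disjointify(sets):
    """F_n = E_n minus (E_0 union ... union E_{n-1})."""
    seen = set()
    out = []
    for e in sets:
        out.append(set(e) - seen)
        seen |= set(e)
    return out

def accumulate(sets):
    """G_n = E_0 union ... union E_n."""
    acc = set()
    out = []
    for e in sets:
        acc = acc | set(e)   # a fresh set each step, so earlier G_n are not mutated
        out.append(acc)
    return out

E = [{1, 2}, {2, 3}, {1, 4}]
F = disjointify(E)
G = accumulate(E)
print(F)  # [{1, 2}, {3}, {4}]
print(G)  # [{1, 2}, {1, 2, 3}, {1, 2, 3, 4}]
```

The sets in `F` are pairwise disjoint, the sets in `G` are non-decreasing, and both lists have the same union as `E`.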

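211E Definition Let $(X, \Sigma, \mu)$ be a measure space. Then $\mu$, or $(X, \Sigma, \mu)$, is strictly localizable or
decomposable if there is a disjoint family $\langle X_i\rangle_{i\in I}$ of measurable sets of finite measure such that
$X = \bigcup_{i\in I} X_i$ and
$$\Sigma = \{E : E \subseteq X,\ E \cap X_i \in \Sigma\ \ \forall\ i \in I\},$$
$$\mu E = \sum_{i\in I} \mu(E \cap X_i) \text{ for every } E \in \Sigma.$$
I will call such a family $\langle X_i\rangle_{i\in I}$ a decomposition of $X$.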
P
Remark In this context, we can interpret the sum iI (E Xi ) simply as
P
sup{ iJ (E Xi ) : J is a finite subset of I},
P
taking i (E Xi ) = 0, because we are concerned only with sums of non-negative terms (cf. 112Bd).
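
A small sketch of this interpretation, assuming a finite index set so that every subset $J$ can be enumerated; the function name `unordered_sum` is hypothetical. For non-negative terms the supremum is attained at $J = I$, no ordering of the index set is needed, and the empty index set gives 0.

```python
from fractions import Fraction
from itertools import combinations

def unordered_sum(terms):
    """Sup of the sums over finite subsets J of the index set of `terms`
    (a dict mapping index -> non-negative value); the empty sum counts as 0."""
    indices = list(terms)
    best = Fraction(0)
    for r in range(len(indices) + 1):
        for J in combinations(indices, r):
            best = max(best, sum((terms[i] for i in J), Fraction(0)))
    return best

terms = {"a": Fraction(1, 2), "b": Fraction(0), "c": Fraction(1, 3)}
print(unordered_sum(terms))  # 5/6
print(unordered_sum({}))     # 0
```

For an infinite index set the supremum cannot be computed by enumeration; the code only illustrates the definition in the finite case.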

211F Definition Let $(X, \Sigma, \mu)$ be a measure space. Then $\mu$, or $(X, \Sigma, \mu)$, is semi-finite if whenever
$E \in \Sigma$ and $\mu E = \infty$ there is an $F \subseteq E$ such that $F \in \Sigma$ and $0 < \mu F < \infty$.
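
A minimal sketch of the condition in 211F, assuming a finite $\sigma$-algebra stored as a set of frozensets and a measure stored as a dictionary with `math.inf` for infinite values; the names `is_semi_finite`, `mu_bad` and `mu_ok` are illustrative. On a finite $\sigma$-algebra a measure taking the value $\infty$ always has an indivisible set of infinite measure, so such examples can only show how the condition fails.

```python
import math

def is_semi_finite(sigma, mu):
    """211F: every set of infinite measure includes a measurable set
    of finite, strictly positive measure."""
    return all(any(f <= e and 0 < mu[f] < math.inf for f in sigma)
               for e in sigma if mu[e] == math.inf)

X = frozenset({0, 1})
sigma = {frozenset(), frozenset({0}), frozenset({1}), X}

# Hypothetical measure giving the point 1 infinite mass: {1} has no
# measurable subset of finite positive measure, so the condition fails.
mu_bad = {frozenset(): 0, frozenset({0}): 1,
          frozenset({1}): math.inf, X: math.inf}

# A totally finite measure satisfies the condition vacuously.
mu_ok = {frozenset(): 0, frozenset({0}): 1, frozenset({1}): 2, X: 3}

print(is_semi_finite(sigma, mu_bad))  # False
print(is_semi_finite(sigma, mu_ok))   # True
```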

211G Definition Let $(X, \Sigma, \mu)$ be a measure space. Then $\mu$, or $(X, \Sigma, \mu)$, is localizable or Maharam
if it is semi-finite and, for every $\mathcal{E} \subseteq \Sigma$, there is an $H \in \Sigma$ such that (i) $E \setminus H$ is negligible for every $E \in \mathcal{E}$
(ii) if $G \in \Sigma$ and $E \setminus G$ is negligible for every $E \in \mathcal{E}$, then $H \setminus G$ is negligible. It will be convenient to call
such a set $H$ an essential supremum of $\mathcal{E}$ in $\Sigma$.
Remark The definition here is clumsy, because really the concept applies to measure algebras rather than to
measure spaces (see 211Yb-211Yc). However, the present definition can be made to work (see 213N, 241G,
243G below) and enables us to proceed without a formal introduction to the concept of measure algebra
before the time comes to do the job properly in Volume 3.

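211H Definition Let $(X, \Sigma, \mu)$ be a measure space. Then $\mu$, or $(X, \Sigma, \mu)$, is locally determined if it
is semi-finite and
$$\Sigma = \{E : E \subseteq X,\ E \cap F \in \Sigma \text{ whenever } F \in \Sigma \text{ and } \mu F < \infty\};$$
that is to say, for any $E \in \mathcal{P}X \setminus \Sigma$ there is an $F \in \Sigma$ such that $\mu F < \infty$ and $E \cap F \notin \Sigma$.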

211I Definition Let $(X, \Sigma, \mu)$ be a measure space. A set $E \in \Sigma$ is an atom for $\mu$ if $\mu E > 0$ and
whenever $F \in \Sigma$, $F \subseteq E$ one of $F$, $E \setminus F$ is negligible.
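
To make the definition of atom concrete, here is a sketch that lists the atoms of a small measure defined by point weights on all subsets of a three-point set. The weights and the helper names `powerset`, `is_negligible` and `is_atom` are assumptions made for the illustration. Note that an atom need not be a singleton: adjoining a point of weight 0 to an atom gives another atom.

```python
from itertools import combinations

def powerset(s):
    """Return every subset of the finite set s as a frozenset."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_negligible(a, sigma, mu):
    """a is negligible if it is included in a measurable set of measure 0."""
    return any(a <= e and mu[e] == 0 for e in sigma)

def is_atom(e, sigma, mu):
    """211I: mu(e) > 0, and every measurable f inside e has f or e - f negligible."""
    if mu[e] <= 0:
        return False
    return all(is_negligible(f, sigma, mu) or is_negligible(e - f, sigma, mu)
               for f in sigma if f <= e)

# Hypothetical point weights on X = {1, 2, 3}; every subset is measurable.
weights = {1: 1, 2: 2, 3: 0}
X = frozenset(weights)
sigma = set(powerset(X))
mu = {a: sum(weights[x] for x in a) for a in sigma}

atoms = {e for e in sigma if is_atom(e, sigma, mu)}
print(sorted(sorted(a) for a in atoms))  # [[1], [1, 3], [2], [2, 3]]
```

The set {1, 2} is not an atom, because it splits into two pieces of positive measure.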

211J Definition Let $(X, \Sigma, \mu)$ be a measure space. Then $\mu$, or $(X, \Sigma, \mu)$, is atomless or diffused if
there are no atoms for $\mu$. (Note that this is not the same thing as saying that all finite sets are negligible;
see 211R below.)

211K Definition Let $(X, \Sigma, \mu)$ be a measure space. Then $\mu$, or $(X, \Sigma, \mu)$, is purely atomic if whenever
$E \in \Sigma$ and $E$ is not negligible there is an atom for $\mu$ included in $E$.
Remark
P Recall that a measure on a set X is point-supported if measures every subset of X and
E = xE {x} for every E X (112Bd). Every point-supported measure is purely atomic, because {x}
must be an atom whenever {x} > 0, but not every purely atomic measure is point-supported (211R).
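
The definitions of atom (211I) and purely atomic measure (211K) can be checked by brute force on a small finite example. The following is only an illustrative sketch, assuming a point-supported measure on a finite set, so that every subset is measurable; the weights `h` and the helper names `subsets`, `measure`, `is_atom` and `is_purely_atomic` are hypothetical and chosen for this illustration.

```python
from fractions import Fraction
from itertools import combinations

def subsets(s):
    """Yield every subset of the finite set s as a frozenset."""
    items = sorted(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def measure(h, E):
    """Point-supported measure: mu(E) is the sum of h(x) over x in E."""
    return sum((h[x] for x in E), Fraction(0))

def is_atom(h, E):
    """211I: mu(E) > 0 and, for every F inside E, F or E minus F is negligible."""
    E = frozenset(E)
    if measure(h, E) <= 0:
        return False
    return all(measure(h, F) == 0 or measure(h, E - F) == 0 for F in subsets(E))

def is_purely_atomic(h):
    """211K: every non-negligible set includes an atom."""
    X = frozenset(h)
    return all(
        any(is_atom(h, A) for A in subsets(E))
        for E in subsets(X)
        if measure(h, E) > 0
    )

# Hypothetical weights on a three-point set; "b" is a negligible point.
h = {"a": Fraction(1, 2), "b": Fraction(0), "c": Fraction(1, 2)}
```

With these weights, `{"a"}` and `{"a", "b"}` are atoms (an atom need not be a singleton), while `{"a", "c"}` is not, because it splits into two pieces of positive measure.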

211L The relationships between the concepts above are in a sense very straightforward; all the direct
implications in which one property implies another are given in the next theorem.
Theorem (a) A probability space is totally finite.
(b) A totally finite measure space is $\sigma$-finite.
(c) A $\sigma$-finite measure space is strictly localizable.
(d) A strictly localizable measure space is localizable and locally determined.
(e) A localizable measure space is semi-finite.
(f) A locally determined measure space is semi-finite.
proof (a), (b), (e) and (f) are trivial.
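(c) Let $(X,\Sigma,\mu)$ be a $\sigma$-finite measure space; let $\langle F_n\rangle_{n\in\mathbb N}$ be a disjoint sequence of measurable sets of
finite measure covering $X$ (see the remark in 211D). If $E\in\Sigma$, then of course $E\cap F_n\in\Sigma$ for every $n\in\mathbb N$,
and
$$\mu E=\sum_{n=0}^{\infty}\mu(E\cap F_n)=\sum_{n\in\mathbb N}\mu(E\cap F_n).$$
If $E\subseteq X$ and $E\cap F_n\in\Sigma$ for every $n\in\mathbb N$, then
$$E=\bigcup_{n\in\mathbb N}E\cap F_n\in\Sigma.$$
So $\langle F_n\rangle_{n\in\mathbb N}$ is a decomposition of $X$ and $(X,\Sigma,\mu)$ is strictly localizable.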
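
Part (c) starts from a disjoint covering sequence. The usual way to obtain one from an arbitrary covering sequence $\langle E_n\rangle$ is to set $F_n=E_n\setminus\bigcup_{k<n}E_k$; this is presumably what the remark in 211D supplies, but that is an assumption here. A minimal sketch for finitely many finite sets, with the hypothetical helper name `disjointify`:

```python
def disjointify(sets):
    """Return F_n = E_n minus (E_0 | ... | E_{n-1}) for a finite list of sets.

    The F_n are pairwise disjoint and have the same union as the E_n.
    """
    seen = set()      # union of the sets processed so far
    result = []
    for E in sets:
        E = set(E)
        result.append(E - seen)
        seen |= E
    return result
```

Each $F_n$ is included in $E_n$, so finiteness of measure is preserved, and measurability is preserved because only countably many set operations are involved.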
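(d) Let $(X,\Sigma,\mu)$ be a strictly localizable measure space; let $\langle X_i\rangle_{i\in I}$ be a decomposition of $X$.
(i) Let $\mathcal E$ be a family of measurable subsets of $X$. Let $\mathcal F$ be the family of measurable sets $F\subseteq X$
such that $\mu(F\cap E)=0$ for every $E\in\mathcal E$. Note that $\emptyset\in\mathcal F$ and, if $\langle F_n\rangle_{n\in\mathbb N}$ is any sequence in $\mathcal F$, then
$\bigcup_{n\in\mathbb N}F_n\in\mathcal F$. For each $i\in I$, set $\gamma_i=\sup\{\mu(F\cap X_i) : F\in\mathcal F\}$ and choose a sequence $\langle F_{in}\rangle_{n\in\mathbb N}$ in $\mathcal F$ such
that $\lim_{n\to\infty}\mu(F_{in}\cap X_i)=\gamma_i$; set
$$F_i=\bigcup_{n\in\mathbb N}F_{in}\in\mathcal F.$$
Set
$$F=\bigcup_{i\in I}F_i\cap X_i\subseteq X$$
and $H=X\setminus F$.
We see that $F\cap X_i=F_i\cap X_i$ for each $i\in I$ (because $\langle X_i\rangle_{i\in I}$ is disjoint), so $F\in\Sigma$ and $H\in\Sigma$. For any
$E\in\mathcal E$,
$$\mu(E\setminus H)=\mu(E\cap F)=\sum_{i\in I}\mu(E\cap F\cap X_i)=\sum_{i\in I}\mu(E\cap F_i\cap X_i)=0$$
because every $F_i$ belongs to $\mathcal F$. Thus $F\in\mathcal F$. If $G\in\Sigma$ and $\mu(E\setminus G)=0$ for every $E\in\mathcal E$, then $X\setminus G$ and
$F'=F\cup(X\setminus G)$ belong to $\mathcal F$. So $\mu(F'\cap X_i)\le\gamma_i$ for each $i\in I$. But also $\mu(F\cap X_i)\ge\sup_{n\in\mathbb N}\mu(F_{in}\cap X_i)=\gamma_i$,
so $\mu(F\cap X_i)=\mu(F'\cap X_i)$ for each $i$. Because $\mu X_i$ is finite, it follows that $\mu((F'\setminus F)\cap X_i)=0$, for each $i$.
Summing over $i$, $\mu(F'\setminus F)=0$, that is, $\mu(H\setminus G)=0$.
Thus $H$ is an essential supremum for $\mathcal E$ in $\Sigma$. As $\mathcal E$ is arbitrary, $(X,\Sigma,\mu)$ is localizable.
(ii) If $E\in\Sigma$ and $\mu E=\infty$, then there is some $i\in I$ such that
$$0<\mu(E\cap X_i)\le\mu X_i<\infty;$$
so $(X,\Sigma,\mu)$ is semi-finite. If $E\subseteq X$ and $E\cap F\in\Sigma$ whenever $\mu F<\infty$, then, in particular, $E\cap X_i\in\Sigma$ for
every $i\in I$, so $E\in\Sigma$; thus $(X,\Sigma,\mu)$ is locally determined.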

211M Example: Lebesgue measure Let us consider Lebesgue measure in the light of the concepts
above. Write $\mu$ for Lebesgue measure on $\mathbb R^r$ and $\Sigma$ for its domain.

(a) $\mu$ is complete, because it is constructed by Carathéodory's method; if $A\subseteq E$ and $\mu E=0$, then
$\mu^*A=\mu^*E=0$ (writing $\mu^*$ for Lebesgue outer measure), so, for any $B\subseteq\mathbb R^r$,
$$\mu^*(B\cap A)+\mu^*(B\setminus A)\le 0+\mu^*B=\mu^*B,$$
and $A$ must be measurable.

(b) $\mu$ is $\sigma$-finite, because $\mathbb R^r=\bigcup_{n\in\mathbb N}[-\mathbf n,\mathbf n]$, writing $\mathbf n$ for the vector $(n,\ldots,n)$, and $\mu[-\mathbf n,\mathbf n]=(2n)^r<\infty$
for every $n$. Of course $\mu$ is neither totally finite nor a probability measure.

(c) Because $\mu$ is $\sigma$-finite, it is strictly localizable (211Lc), localizable (211Ld), locally determined (211Ld)
and semi-finite (211Le-f).

(d) $\mu$ is atomless. P Suppose that $E\in\Sigma$ and $\mu E>0$. Consider the function
$$a\mapsto f(a)=\mu(E\cap[-\mathbf a,\mathbf a]) : [0,\infty[\,\to\mathbb R.$$
We have
$$f(a)\le f(b)\le f(a)+\mu[-\mathbf b,\mathbf b]-\mu[-\mathbf a,\mathbf a]=f(a)+(2b)^r-(2a)^r$$
whenever $a\le b$ in $[0,\infty[$, so $f$ is continuous. Now $f(0)=0$ and $\lim_{n\to\infty}f(n)=\mu E>0$. By the Intermediate
Value Theorem there is an $a\in[0,\infty[$ such that $0<f(a)<\mu E$. So we have
$$0<\mu(E\cap[-\mathbf a,\mathbf a])<\mu E.$$
As $E$ is arbitrary, $\mu$ is atomless. Q

(e) It is now a trivial observation that $\mu$ cannot be purely atomic, because $\mathbb R^r$ itself is a set of positive
measure not including any atom.

211N Counting measure Take $X$ to be any uncountable set (e.g., $X=\mathbb R$), and $\mu$ to be counting
measure on $X$ (112Bd).

(a) $\mu$ is complete, because if $A\subseteq E$ and $\mu E=0$ then $A=E=\emptyset\in\Sigma$.

(b) $\mu$ is not $\sigma$-finite, because if $\langle E_n\rangle_{n\in\mathbb N}$ is any sequence of sets of finite measure then every $E_n$ is finite,
therefore countable, and $\bigcup_{n\in\mathbb N}E_n$ is countable (1A1F), so cannot be $X$. A fortiori, $\mu$ is not a probability
measure nor totally finite.

(c) $\mu$ is strictly localizable. P Set $X_x=\{x\}$ for every $x\in X$. Then $\langle X_x\rangle_{x\in X}$ is a partition of $X$, and for
any $E\subseteq X$
$$\mu(E\cap X_x)=1 \text{ if } x\in E,\quad 0 \text{ otherwise}.$$
By the definition of $\mu$,
$$\mu E=\sum_{x\in X}\mu(E\cap X_x)$$
for every $E\subseteq X$, and $\langle X_x\rangle_{x\in X}$ is a decomposition of $X$. Q
Consequently $\mu$ is localizable, locally determined and semi-finite.

(d) $\mu$ is purely atomic. P $\{x\}$ is an atom for every $x\in X$, and if $\mu E>0$ then surely $E$ includes $\{x\}$ for
some $x$. Q Obviously, $\mu$ is not atomless.

211O A non-semi-finite space Set $X=\{0\}$, $\Sigma=\{\emptyset,X\}$, $\mu\emptyset=0$ and $\mu X=\infty$. Then $\mu$ is not
semi-finite, as $\mu X=\infty$ but $X$ has no subset of non-zero finite measure. It follows that $\mu$ cannot be localizable,
locally determined, $\sigma$-finite, totally finite nor a probability measure. Because $\Sigma=\mathcal PX$, $\mu$ is complete. $X$ is
an atom for $\mu$, so $\mu$ is purely atomic (indeed, it is point-supported).

211P A non-complete space Write $\mathcal B$ for the $\sigma$-algebra of Borel subsets of $\mathbb R$ (111G), and $\mu$ for the
restriction of Lebesgue measure to $\mathcal B$ (recall that by 114G every Borel subset of $\mathbb R$ is Lebesgue measurable).
Then $(\mathbb R,\mathcal B,\mu)$ is atomless, $\sigma$-finite and not complete.
proof (a) To see that $\mu$ is not complete, recall that there is a continuous, strictly increasing bijection
$g:[0,1]\to[0,1]$ such that $\mu g[C]>0$, where $C$ is the Cantor set, so that there is a set $A\subseteq g[C]$ which
is not Lebesgue measurable (134Ib). Now $g^{-1}[A]\subseteq C$ cannot be a Borel set, since $\chi A=\chi(g^{-1}[A])\circ g^{-1}$
is not Lebesgue measurable, therefore not Borel measurable, and the composition of two Borel measurable
functions is Borel measurable (121Eg); so $g^{-1}[A]$ is a non-measurable subset of the negligible set $C$.
(b) The rest of the arguments of 211M apply to $\mu$ just as well as to true Lebesgue measure, so $\mu$ is $\sigma$-finite
and atomless.
*Remark The argument offered in (a) could give rise to a seriously false impression. The set A referred
to there can be constructed only with the use of a strong form of the axiom of choice. No such device is
necessary for the result here. There are many methods of constructing non-Borel subsets of the Cantor set,
all illuminating in different ways, and some do not need the axiom of choice at all; I hope to return to this
question in Volumes 4 and 5.

211Q Some probability spaces Two obvious constructions of probability spaces, restricting myself
to the methods described in Volume 1, are
(a) the subspace measure induced by Lebesgue measure on [0, 1] (131B);
(b) the point-supported measure induced on a set $X$ by a function $h:X\to[0,1]$ such that $\sum_{x\in X}h(x)=1$,
writing $\mu E=\sum_{x\in E}h(x)$ for every $E\subseteq X$; for instance, if $X$ is a singleton $\{x\}$ and $h(x)=1$, or if $X=\mathbb N$
and $h(n)=2^{-n-1}$.
Of these two, (a) gives an atomless probability measure and (b) gives a purely atomic probability measure.
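
For the example $X=\mathbb N$, $h(n)=2^{-n-1}$, the total mass can be checked through partial sums: the set $\{0,\ldots,N-1\}$ has measure $1-2^{-N}$, which increases to 1. The sketch below uses exact rational arithmetic; the function names `h`, `mu` and `partial_mass` are hypothetical, and `mu` handles only finite sets of natural numbers.

```python
from fractions import Fraction

def h(n):
    """Weight of the point n in the example h(n) = 2^(-n-1)."""
    return Fraction(1, 2 ** (n + 1))

def mu(E):
    """Point-supported measure of a finite set E of natural numbers."""
    return sum((h(n) for n in E), Fraction(0))

def partial_mass(N):
    """Measure of {0, ..., N-1}, which equals 1 - 2^(-N)."""
    return mu(range(N))
```

Every singleton $\{n\}$ has positive measure here, so each is an atom, in line with the remark in 211K.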

211R The countable-cocountable measure The following is one of the basic constructions to keep
in mind when considering abstract measure spaces.

(a) Let $X$ be any set. Let $\Sigma$ be the family of those sets $E\subseteq X$ such that either $E$ or $X\setminus E$ is countable.
Then $\Sigma$ is a $\sigma$-algebra of subsets of $X$. P (i) $\emptyset$ is countable, so belongs to $\Sigma$. (ii) The condition for $E$
to belong to $\Sigma$ is symmetric between $E$ and $X\setminus E$, so $X\setminus E\in\Sigma$ for every $E\in\Sigma$. (iii) Let $\langle E_n\rangle_{n\in\mathbb N}$ be
any sequence in $\Sigma$, and set $E=\bigcup_{n\in\mathbb N}E_n$. If every $E_n$ is countable, then $E$ is countable, so belongs to $\Sigma$.
Otherwise, there is some $n$ such that $X\setminus E_n$ is countable, in which case $X\setminus E\subseteq X\setminus E_n$ is countable, so
again $E\in\Sigma$. Q $\Sigma$ is called the countable-cocountable $\sigma$-algebra of $X$.

(b) Now consider the function $\mu:\Sigma\to\{0,1\}$ defined by writing $\mu E=0$ if $E$ is countable, $\mu E=1$ if
$E\in\Sigma$ and $E$ is not countable. Then $\mu$ is a measure. P (i) $\emptyset$ is countable so $\mu\emptyset=0$. (ii) Let $\langle E_n\rangle_{n\in\mathbb N}$ be a
disjoint sequence in $\Sigma$, and $E$ its union. ($\alpha$) If every $E_m$ is countable, then so is $E$, so
$$\mu E=0=\sum_{n=0}^{\infty}\mu E_n.$$
($\beta$) If some $E_m$ is uncountable, then $E\supseteq E_m$ is also uncountable, and $\mu E=\mu E_m=1$. But in this case,
because $E_m\in\Sigma$, $X\setminus E_m$ is countable, so $E_n$, being a subset of $X\setminus E_m$, is countable for every $n\ne m$; thus
$\mu E_n=0$ for every $n\ne m$, and
$$\mu E=1=\sum_{n=0}^{\infty}\mu E_n.$$
As $\langle E_n\rangle_{n\in\mathbb N}$ is arbitrary, $\mu$ is a measure. Q ($\mu$ is called the countable-cocountable measure on $X$.)

(c) If $X$ is any uncountable set and $\mu$ is the countable-cocountable measure on $X$, then $\mu$ is a complete,
purely atomic probability measure, but is not point-supported. P (i) If $A\subseteq E$ and $\mu E=0$, then $E$ is
countable, so $A$ is also countable and belongs to $\Sigma$. Thus $\mu$ is complete. (ii) Because $X$ is uncountable,
$\mu X=1$ and $\mu$ is a probability measure. (iii) If $\mu E>0$, then $\mu F=\mu E=1$ whenever $F$ is a non-negligible
measurable subset of $E$, so $E$ is itself an atom; thus $\mu$ is purely atomic. (iv) $\mu X=1>0=\sum_{x\in X}\mu\{x\}$, so
$\mu$ is not point-supported. Q

211X Basic exercises > (a) Let $\mu$ be counting measure on a set $X$. Show that $\mu$ is always strictly
localizable and purely atomic, and that it is $\sigma$-finite iff $X$ is countable, totally finite iff $X$ is finite, a
probability measure iff $X$ is a singleton, and atomless iff $X$ is empty.

> (b) Let $g:\mathbb R\to\mathbb R$ be a non-decreasing function and $\mu_g$ the associated Lebesgue-Stieltjes measure
(114Xa). Show that $\mu_g$ is complete and $\sigma$-finite. Show that
(i) $\mu_g$ is totally finite iff $g$ is bounded;
(ii) $\mu_g$ is a probability measure iff $\lim_{x\to\infty}g(x)-\lim_{x\to-\infty}g(x)=1$;
(iii) $\mu_g$ is atomless iff $g$ is continuous;
(iv) if $E$ is any atom for $\mu_g$, there is a point $x\in E$ such that $\mu_gE=\mu_g\{x\}$;
(v) $\mu_g$ is purely atomic iff it is point-supported.

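(c) Let $X$ be a set. Show that for any $\sigma$-ideal $\mathcal I$ of subsets of $X$ (definition: 112Db), the set
$$\Sigma=\mathcal I\cup\{X\setminus A : A\in\mathcal I\}$$
is a $\sigma$-algebra of subsets of $X$, and that there is a measure $\mu:\Sigma\to\{0,1\}$ given by setting
$$\mu E=0 \text{ if } E\in\mathcal I,\quad \mu E=1 \text{ if } E\in\Sigma\setminus\mathcal I.$$
Show that $\mathcal I$ is precisely the ideal of $\mu$-negligible sets, that $\mu$ is complete, totally finite and purely atomic,
and is a probability measure iff $X\notin\mathcal I$.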

> (d) Let $(X,\Sigma,\mu)$ be a measure space, $Y$ any set and $\phi:X\to Y$ a function. Let $\mu\phi^{-1}$ be the image
measure as defined in 112E. Show that
(i) $\mu\phi^{-1}$ is complete whenever $\mu$ is;
(ii) $\mu\phi^{-1}$ is a probability measure iff $\mu$ is;
(iii) $\mu\phi^{-1}$ is totally finite iff $\mu$ is;
(iv) $\mu$ is $\sigma$-finite if $\mu\phi^{-1}$ is;
(v) if $\mu$ is purely atomic and $\sigma$-finite, then $\mu\phi^{-1}$ is purely atomic;
(vi) if $\mu$ is purely atomic and $\mu\phi^{-1}$ is semi-finite, then $\mu\phi^{-1}$ is purely atomic.

> (e) Let $(X,\Sigma,\mu)$ be a measure space. Show that $\mu$ is $\sigma$-finite iff there is a totally finite measure $\nu$ on $X$
with the same measurable sets and the same negligible sets as $\mu$.

(f ) Show that a point-supported measure is strictly localizable iff it is semi-finite.

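211Y Further exercises (a) Let $\Sigma$ be the countable-cocountable $\sigma$-algebra of $\mathbb R$. Show that $[0,\infty[\,\notin\Sigma$.
Let $\mu$ be the restriction of counting measure to $\Sigma$. Show that $(\mathbb R,\Sigma,\mu)$ is complete, semi-finite and purely
atomic, but not localizable nor locally determined.

(b) Let $(X,\Sigma,\mu)$ be a measure space, and for $E,F\in\Sigma$ write $E\sim F$ if $\mu(E\triangle F)=0$. Show that $\sim$ is an
equivalence relation on $\Sigma$. Let $\mathfrak A$ be the set of equivalence classes in $\Sigma$ for $\sim$; for $E\in\Sigma$, write $E^{\bullet}\in\mathfrak A$ for
its equivalence class. Show that there is a partial ordering $\subseteq$ on $\mathfrak A$ defined by saying that, for $E,F\in\Sigma$,
$$E^{\bullet}\subseteq F^{\bullet}\iff\mu(E\setminus F)=0.$$
Show that $\mu$ is localizable iff for every $A\subseteq\mathfrak A$ there is an $h\in\mathfrak A$ such that (i) $a\subseteq h$ for every $a\in A$ (ii)
whenever $g\in\mathfrak A$ is such that $a\subseteq g$ for every $a\in A$, then $h\subseteq g$.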

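(c) Let $(X,\Sigma,\mu)$ be a measure space, and construct $\mathfrak A$ as in (b) above. Show that there are operations
$\cup$, $\cap$, $\setminus$ on $\mathfrak A$ defined by saying that
$$E^{\bullet}\cup F^{\bullet}=(E\cup F)^{\bullet},$$
$$E^{\bullet}\cap F^{\bullet}=(E\cap F)^{\bullet},$$
$$E^{\bullet}\setminus F^{\bullet}=(E\setminus F)^{\bullet}$$
for all $E,F\in\Sigma$. Show that if $A\subseteq\mathfrak A$ is any countable set, then there is certainly an $h\in\mathfrak A$ such that (i)
$a\subseteq h$ for every $a\in A$ (ii) whenever $g\in\mathfrak A$ is such that $a\subseteq g$ for every $a\in A$, then $h\subseteq g$. Show that there
is a functional $\bar\mu:\mathfrak A\to[0,\infty]$ defined by saying
$$\bar\mu(E^{\bullet})=\mu E$$
for every $E\in\Sigma$.
($(\mathfrak A,\bar\mu)$ is called the measure algebra of $(X,\Sigma,\mu)$.)

(d) Let $(X,\Sigma,\mu)$ be a semi-finite measure space. Show that it is atomless iff whenever $\epsilon>0$, $E\in\Sigma$ and
$\mu E<\infty$, then there is a finite partition of $E$ into measurable sets of measure at most $\epsilon$.

(e) Let $(X,\Sigma,\mu)$ be a strictly localizable measure space. Show that it is atomless iff for every $\epsilon>0$ there
is a decomposition of $X$ consisting of sets of measure at most $\epsilon$.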

211 Notes and comments The list of definitions in 211A-211K probably strikes you as quite long enough,
even though I have omitted many occasionally useful ideas. The concepts here vary widely in importance,
and the importance of each varies widely with context. My own view is that it is absolutely necessary, when
studying any measure space, to know its classification under the eleven discriminating features listed here,
and to be able to describe any atoms which are present. Fortunately, for most ordinary measure spaces,
the classification is fairly quick, because if (for instance) the space is $\sigma$-finite, and you know the measure of
the whole space, the only remaining questions concern completeness and atoms. The distinctions between
spaces which are, or are not, strictly localizable, semi-finite, localizable and locally determined are relevant
only for spaces which are not $\sigma$-finite, and do not arise in elementary applications.
I think it is also fair to say that the notions of complete and locally determined measure space are
technical; I mean, that they do not correspond to significant features of the essential structure of a space,
though there are some interesting problems concerning incomplete measures. One manifestation of this is the
existence of canonical constructions for rendering spaces complete or complete and locally determined (212C,
213D-213E). In addition, measure spaces which are not semi-finite do not really belong to measure theory,
but rather to the more general study of $\sigma$-algebras and $\sigma$-ideals. The most important classifications, in
terms of the behaviour of a measure space, seem to me to be $\sigma$-finite, localizable and strictly localizable;
these are the critical features which cannot be forced by elementary constructions.
If you know anything about Borel subsets of the real line, the argument of part (a) of the proof of 211P
must look very clumsy. But better proofs rely on ideas which we shall not need until Volume 4, and the
proof here is based on a construction which we have to understand for other reasons.

212 Complete spaces


In the next two sections of this chapter I give brief accounts of the theory of measure spaces possessing
certain of the properties described in §211. I begin with completeness. I give the elementary facts about
complete measure spaces in 212A-212B; then I turn to the notion of completion of a measure (212C) and
its relationships with the other concepts of measure theory introduced so far (212D-212H).

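212A Proposition Any measure space constructed by Carathéodory's method is complete.

proof Recall that Carathéodory's method starts from an arbitrary outer measure $\theta:\mathcal PX\to[0,\infty]$ and
sets
$$\Sigma=\{E : E\subseteq X,\ \theta A=\theta(A\cap E)+\theta(A\setminus E) \text{ for every } A\subseteq X\},\quad\mu=\theta\restriction\Sigma$$
(113C). In this case, if $B\subseteq E\in\Sigma$ and $\mu E=0$, then $\theta B=\theta E=0$ (113A(ii)), so for any $A\subseteq X$ we have
$$\theta(A\cap B)+\theta(A\setminus B)=\theta(A\setminus B)\le\theta A\le\theta(A\cap B)+\theta(A\setminus B),$$
and $B\in\Sigma$.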

212B Proposition (a) If $(X,\Sigma,\mu)$ is a complete measure space, then any conegligible subset of $X$ is
measurable.
(b) Let $(X,\Sigma,\mu)$ be a complete measure space, and $f$ a real-valued function defined on a subset of $X$. If
$f$ is virtually measurable (that is, there is a conegligible set $E\subseteq X$ such that $f\restriction E$ is measurable), then $f$
is measurable.
(c) Let $(X,\Sigma,\mu)$ be a complete measure space, and $f$ a real-valued function defined on a conegligible
subset of $X$. Then the following are equiveridical, that is, if one is true so are the others:
(i) $f$ is integrable;
(ii) $f$ is measurable and $|f|$ is integrable;
(iii) $f$ is measurable and there is an integrable function $g$ such that $|f|\le_{\text{a.e.}}g$ (that is, $|f|\le g$ almost
everywhere).
(d) Let $(X,\Sigma,\mu)$ be a complete measure space, $Y$ a set and $f:X\to Y$ a function. Then the image
measure $\mu f^{-1}$ (112E) is complete.
proof (a) If E is conegligible, then X \ E is negligible, therefore measurable, and E is measurable.
(b) Let $a\in\mathbb R$. Then there is an $H\in\Sigma$ such that
$$\{x : (f\restriction E)(x)\le a\}=H\cap\operatorname{dom}(f\restriction E)=H\cap E\cap\operatorname{dom}f.$$
Now $F=\{x : x\in\operatorname{dom}f\setminus E,\ f(x)\le a\}$ is a subset of the negligible set $X\setminus E$, so is measurable, and $E$ itself is measurable, by (a); so
$$\{x : f(x)\le a\}=(F\cup(H\cap E))\cap\operatorname{dom}f\in\Sigma_{\operatorname{dom}f},$$
writing $\Sigma_D=\{D\cap E : E\in\Sigma\}$, as in 121A. As $a$ is arbitrary, $f$ is measurable.

(c) A real-valued function $f$ on a general measure space $(X,\Sigma,\mu)$ is integrable iff ($\alpha$) there is a conegligible
set $E\subseteq\operatorname{dom}f$ such that $f\restriction E$ is measurable ($\beta$) there is a non-negative integrable function $g$ such that
$|f|\le_{\text{a.e.}}g$ (122P(iii)). But in view of (b), we can in the present context restate ($\alpha$) as '$f$ is defined a.e.
and measurable'; which is the version here. (The shift from 'non-negative integrable $g$' to 'integrable $g$' is
trivial, because if $g$ is integrable and $|f|\le_{\text{a.e.}}g$, then $|g|$ is non-negative and integrable and $|f|\le_{\text{a.e.}}|g|$.)
(d) If $B\subseteq F\subseteq Y$ and $(\mu f^{-1})(F)=0$, then $f^{-1}[B]\subseteq f^{-1}[F]$ and $\mu(f^{-1}[F])=0$, so $(\mu f^{-1})(B)=\mu(f^{-1}[B])=0$.

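212C The completion of a measure Let $(X,\Sigma,\mu)$ be any measure space.

(a) Let $\hat\Sigma$ be the family of those sets $E\subseteq X$ such that there are $E',E''\in\Sigma$ with $E'\subseteq E\subseteq E''$ and
$\mu(E''\setminus E')=0$. Then $\hat\Sigma$ is a $\sigma$-algebra of subsets of $X$. P (i) Of course $\emptyset$ belongs to $\hat\Sigma$, because we can take
$E'=E''=\emptyset$. (ii) If $E\in\hat\Sigma$, take $E',E''\in\Sigma$ such that $E'\subseteq E\subseteq E''$ and $\mu(E''\setminus E')=0$. Then
$$X\setminus E''\subseteq X\setminus E\subseteq X\setminus E',\quad\mu((X\setminus E')\setminus(X\setminus E''))=\mu(E''\setminus E')=0,$$
so $X\setminus E\in\hat\Sigma$. (iii) If $\langle E_n\rangle_{n\in\mathbb N}$ is a sequence in $\hat\Sigma$, then for each $n$ choose $E_n',E_n''\in\Sigma$ such that $E_n'\subseteq E_n\subseteq E_n''$
and $\mu(E_n''\setminus E_n')=0$. Set
$$E=\bigcup_{n\in\mathbb N}E_n,\quad E'=\bigcup_{n\in\mathbb N}E_n',\quad E''=\bigcup_{n\in\mathbb N}E_n'';$$
then $E'\subseteq E\subseteq E''$ and $E''\setminus E'\subseteq\bigcup_{n\in\mathbb N}(E_n''\setminus E_n')$ is negligible, so $E\in\hat\Sigma$. Q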

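(b) For $E\in\hat\Sigma$, set
$$\hat\mu E=\mu^*E=\inf\{\mu F : E\subseteq F\in\Sigma\}.$$
It is worth remarking at once that if $E\in\hat\Sigma$, $E',E''\in\Sigma$, $E'\subseteq E\subseteq E''$ and $\mu(E''\setminus E')=0$, then
$\mu E'=\hat\mu E=\mu E''$; this is because
$$\mu E'=\mu^*E'\le\mu^*E\le\mu^*E''=\mu E''=\mu E'+\mu(E''\setminus E')=\mu E'$$
(recalling from 132A, or noting now, that $\mu^*A\le\mu^*B$ whenever $A\subseteq B\subseteq X$, and that $\mu^*$ agrees with $\mu$ on
$\Sigma$).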

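(c) We now find that $(X,\hat\Sigma,\hat\mu)$ is a measure space. P (i) Of course $\hat\mu$, like $\mu$, takes values in $[0,\infty]$. (ii)
$\hat\mu\emptyset=\mu\emptyset=0$. (iii) Let $\langle E_n\rangle_{n\in\mathbb N}$ be a disjoint sequence in $\hat\Sigma$, with union $E$. For each $n\in\mathbb N$ choose $E_n'$,
$E_n''\in\Sigma$ such that $E_n'\subseteq E_n\subseteq E_n''$ and $\mu(E_n''\setminus E_n')=0$. Set $E'=\bigcup_{n\in\mathbb N}E_n'$, $E''=\bigcup_{n\in\mathbb N}E_n''$. Then (as in
(a-iii) above) $E'\subseteq E\subseteq E''$ and $\mu(E''\setminus E')=0$, so
$$\hat\mu E=\mu E'=\sum_{n=0}^{\infty}\mu E_n'=\sum_{n=0}^{\infty}\hat\mu E_n$$
because $\langle E_n'\rangle_{n\in\mathbb N}$, like $\langle E_n\rangle_{n\in\mathbb N}$, is disjoint. Q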

(d) The measure space $(X,\hat\Sigma,\hat\mu)$ is called the completion of the measure space $(X,\Sigma,\mu)$; equally, I
will call $\hat\mu$ the completion of $\mu$, and occasionally (if it is plain which ideal of negligible sets is under
consideration) I will call $\hat\Sigma$ the completion of $\Sigma$. Members of $\hat\Sigma$ are sometimes called $\mu$-measurable.
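
The construction of 212C can be carried out by exhaustive search on a finite set. The following is a sketch only, under the assumption that $\Sigma$ is the algebra generated by a finite partition of $X$ into non-empty blocks, each with a given finite measure; the function names `subsets` and `completion` and the example data are hypothetical. For each $E\subseteq X$ it uses the largest member of $\Sigma$ inside $E$ and the smallest member of $\Sigma$ containing $E$, which suffices because any other admissible pair $E'$, $E''$ has a difference at least as large.

```python
from fractions import Fraction
from itertools import combinations

def subsets(s):
    """Yield every subset of the finite set s as a frozenset."""
    items = sorted(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def completion(blocks):
    """blocks maps each block of a partition of X to its measure.

    Returns a dict sending each member E of the completed sigma-algebra
    to its completed measure.
    """
    X = frozenset().union(*blocks)
    result = {}
    for E in subsets(X):
        inner = [B for B in blocks if B <= E]   # blocks making up E'
        outer = [B for B in blocks if B & E]    # blocks making up E''
        # measure of E'' minus E': blocks that meet E without lying inside it
        gap = sum((blocks[B] for B in outer if B not in inner), Fraction(0))
        if gap == 0:
            result[E] = sum((blocks[B] for B in inner), Fraction(0))
    return result

# Hypothetical example: the block {1, 2} is negligible, so all of its
# subsets become measurable in the completion.
blocks = {
    frozenset({0}): Fraction(1),
    frozenset({1, 2}): Fraction(0),
    frozenset({3}): Fraction(2),
}
hat = completion(blocks)
```

In this example $\Sigma$ has 8 members while the completed $\sigma$-algebra contains all 16 subsets of $X$. If the block $\{1,2\}$ is given positive measure instead, the completion adds nothing.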

212D There is something I had better check at once.


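Proposition Let $(X,\Sigma,\mu)$ be any measure space. Then $(X,\hat\Sigma,\hat\mu)$, as defined in 212C, is a complete measure
space and $\hat\mu$ is an extension of $\mu$; and $(X,\hat\Sigma,\hat\mu)=(X,\Sigma,\mu)$ iff $(X,\Sigma,\mu)$ is complete.
proof (a) Suppose that $A\subseteq E\in\hat\Sigma$ and $\hat\mu E=0$. Then (by 212Cb) there is an $E''\in\Sigma$ such that $E\subseteq E''$
and $\mu E''=0$. Accordingly we have
$$\emptyset\subseteq A\subseteq E'',\quad\mu(E''\setminus\emptyset)=0,$$
so $A\in\hat\Sigma$. As $A$ is arbitrary, $\hat\mu$ is complete.
(b) If $E\in\Sigma$, then of course $E\in\hat\Sigma$, because $E\subseteq E\subseteq E$ and $\mu(E\setminus E)=0$; and $\hat\mu E=\mu^*E=\mu E$. Thus
$\Sigma\subseteq\hat\Sigma$ and $\hat\mu$ extends $\mu$.
(c) If $\mu=\hat\mu$ then of course $\mu$ must be complete. If $\mu$ is complete, and $E\in\hat\Sigma$, then there are $E',E''\in\Sigma$
such that $E'\subseteq E\subseteq E''$ and $\mu(E''\setminus E')=0$. But now $E\setminus E'\subseteq E''\setminus E'$, so (because $(X,\Sigma,\mu)$ is complete)
$E\setminus E'\in\Sigma$ and $E=E'\cup(E\setminus E')\in\Sigma$. As $E$ is arbitrary, $\hat\Sigma\subseteq\Sigma$ and $\hat\Sigma=\Sigma$ and $\mu=\hat\mu$.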

212E The importance of this construction is such that it is worth spelling out some further elementary
properties.
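Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $(X,\hat\Sigma,\hat\mu)$ its completion.
(a) The outer measures $\hat\mu^*$, $\mu^*$ defined from $\hat\mu$ and $\mu$ coincide.
(b) $\mu$, $\hat\mu$ give rise to the same negligible and conegligible sets.
(c) $\hat\mu$ is the only measure with domain $\hat\Sigma$ which agrees with $\mu$ on $\Sigma$.
(d) A subset of $X$ belongs to $\hat\Sigma$ iff it is expressible as $F\triangle A$ where $F\in\Sigma$ and $A$ is $\mu$-negligible.
proof (a) Take any $A\subseteq X$. (i) If $A\subseteq F\in\Sigma$, then $F\in\hat\Sigma$ and $\mu F=\hat\mu F$, so
$$\hat\mu^*A\le\hat\mu F=\mu F;$$
as $F$ is arbitrary, $\hat\mu^*A\le\mu^*A$. (ii) If $A\subseteq E\in\hat\Sigma$, there is an $E''\in\Sigma$ such that $E\subseteq E''$ and $\mu E''=\hat\mu E$, so
$$\mu^*A\le\mu E''=\hat\mu E;$$
as $E$ is arbitrary, $\mu^*A\le\hat\mu^*A$.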
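(b) Now, for $A\subseteq X$,
$$A \text{ is } \mu\text{-negligible}\iff\mu^*A=0\iff\hat\mu^*A=0\iff A \text{ is } \hat\mu\text{-negligible},$$
$$A \text{ is } \mu\text{-conegligible}\iff\mu^*(X\setminus A)=0\iff\hat\mu^*(X\setminus A)=0\iff A \text{ is } \hat\mu\text{-conegligible}.$$

(c) If $\tilde\mu$ is any measure with domain $\hat\Sigma$ extending $\mu$, we must have
$$\mu E'\le\tilde\mu E\le\mu E'',\quad\mu E'=\hat\mu E=\mu E'',$$
so $\tilde\mu E=\hat\mu E$, whenever $E',E''\in\Sigma$, $E'\subseteq E\subseteq E''$ and $\mu(E''\setminus E')=0$.
(d)(i) If $E\in\hat\Sigma$, take $E',E''\in\Sigma$ such that $E'\subseteq E\subseteq E''$ and $\mu(E''\setminus E')=0$. Then $E\setminus E'\subseteq E''\setminus E'$, so
$E\setminus E'$ is $\mu$-negligible, and $E=E'\triangle(E\setminus E')$ is the symmetric difference of a member of $\Sigma$ and a negligible
set.
(ii) If $E=F\triangle A$, where $F\in\Sigma$ and $A$ is $\mu$-negligible, take $G\in\Sigma$ such that $\mu G=0$ and $A\subseteq G$; then
$F\setminus G\subseteq E\subseteq F\cup G$ and $\mu((F\cup G)\setminus(F\setminus G))=\mu G=0$, so $E\in\hat\Sigma$.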

212F Now let us consider integration with respect to the completion of a measure.
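Proposition Let $(X,\Sigma,\mu)$ be a measure space and $(X,\hat\Sigma,\hat\mu)$ its completion.
(a) A $[-\infty,\infty]$-valued function $f$ defined on a subset of $X$ is $\hat\Sigma$-measurable iff it is $\mu$-virtually measurable.
(b) Let $f$ be a $[-\infty,\infty]$-valued function defined on a subset of $X$. Then $\int f\,d\mu=\int f\,d\hat\mu$ if either is defined
in $[-\infty,\infty]$; in particular, $f$ is $\mu$-integrable iff it is $\hat\mu$-integrable.
proof (a) (i) Suppose that $f$ is a $[-\infty,\infty]$-valued $\hat\Sigma$-measurable function. For $q\in\mathbb Q$ let $E_q\in\hat\Sigma$ be such
that $\{x : f(x)\le q\}=\operatorname{dom}f\cap E_q$, and choose $E_q',E_q''\in\Sigma$ such that $E_q'\subseteq E_q\subseteq E_q''$ and $\mu(E_q''\setminus E_q')=0$.
Set $H=X\setminus\bigcup_{q\in\mathbb Q}(E_q''\setminus E_q')$; then $H$ is conegligible. For $a\in\mathbb R$ set
$$G_a=\bigcup_{q\in\mathbb Q,\,q<a}E_q'\in\Sigma;$$
then
$$\{x : x\in\operatorname{dom}(f\restriction H),\ (f\restriction H)(x)<a\}=G_a\cap\operatorname{dom}(f\restriction H).$$
This shows that $f\restriction H$ is $\Sigma$-measurable, so that $f$ is $\mu$-virtually measurable.

(ii) If $f$ is $\mu$-virtually measurable, then there is a $\mu$-conegligible set $H\subseteq X$ such that $f\restriction H$ is $\Sigma$-measurable.
Since $\Sigma\subseteq\hat\Sigma$, $f\restriction H$ is also $\hat\Sigma$-measurable. And $H$ is $\hat\mu$-conegligible, by 212Eb. But this means
that $f$ is $\hat\mu$-virtually measurable, therefore $\hat\Sigma$-measurable, by 212Bb.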
(b)(i) Let $f:D\to[-\infty,\infty]$ be a function, where $D\subseteq X$. If either of $\int f\,d\mu$, $\int f\,d\hat\mu$ is defined in $[-\infty,\infty]$,
then $f$ is virtually measurable, and defined almost everywhere, for one of the appropriate measures, and
therefore for both (putting (a) above together with 212Bb).
(ii) Now suppose that $f$ is non-negative and integrable either with respect to $\mu$ or with respect to $\hat\mu$.
Let $E\in\Sigma$ be a conegligible set included in $\operatorname{dom}f$ such that $f\restriction E$ is $\Sigma$-measurable. For $n\in\mathbb N$, set
$$E_{nk}=\{x : x\in E,\ 2^{-n}k\le f(x)<2^{-n}(k+1)\} \text{ if } 1\le k<4^n,$$
$$E_{n,4^n}=\{x : x\in E,\ f(x)\ge 2^n\};$$
then each $E_{nk}$ belongs to $\Sigma$ and is of finite measure for both $\mu$ and $\hat\mu$. (If $f$ is $\mu$-integrable,
$$\hat\mu E_{nk}=\mu E_{nk}\le 2^n\int f\,d\mu;$$
if $f$ is $\hat\mu$-integrable,
$$\mu E_{nk}=\hat\mu E_{nk}\le 2^n\int f\,d\hat\mu.)$$
So
$$f_n=\sum_{k=1}^{4^n}2^{-n}k\,\chi E_{nk}$$
is both $\mu$-simple and $\hat\mu$-simple, and $\int f_n\,d\mu=\int f_n\,d\hat\mu$. But $\langle f_n\rangle_{n\in\mathbb N}$ is a non-decreasing sequence of functions
converging to $f$ at every point of $E$, that is, both $\mu$-almost everywhere and $\hat\mu$-almost everywhere. So we
have, for any $c\in\mathbb R$,
$$\int f\,d\mu=c\iff\lim_{n\to\infty}\int f_n\,d\mu=c\iff\lim_{n\to\infty}\int f_n\,d\hat\mu=c\iff\int f\,d\hat\mu=c.$$

(iii) As for infinite integrals, recall that for a non-negative function I write '$\int f=\infty$' just when $f$ is
defined almost everywhere, is virtually measurable, and is not integrable. So (i) and (ii) together show that
$\int f\,d\mu=\int f\,d\hat\mu$ whenever $f$ is non-negative and either integral is defined in $[0,\infty]$.
(iv) Since both $\mu$, $\hat\mu$ agree that $\int f$ is to be interpreted as $\int f^+-\int f^-$ just when this can be defined
in $[-\infty,\infty]$, writing $f^+(x)=\max(f(x),0)$, $f^-(x)=\max(-f(x),0)$ for $x\in\operatorname{dom}f$, the result for general
real-valued $f$ follows at once.

212G I turn now to the question of the effect of the construction on the other properties listed in 211A.
Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $(X,\hat\Sigma,\hat\mu)$ its completion.
(a) $(X,\hat\Sigma,\hat\mu)$ is a probability space, or totally finite, or $\sigma$-finite, or semi-finite, or localizable, iff $(X,\Sigma,\mu)$
is.
(b) $(X,\hat\Sigma,\hat\mu)$ is strictly localizable if $(X,\Sigma,\mu)$ is, and any decomposition of $X$ for $\mu$ is a decomposition
for $\hat\mu$.
(c) A set $H\in\hat\Sigma$ is an atom for $\hat\mu$ iff there is an $E\in\Sigma$ such that $E$ is an atom for $\mu$ and $\hat\mu(H\triangle E)=0$.
(d) $(X,\hat\Sigma,\hat\mu)$ is atomless or purely atomic iff $(X,\Sigma,\mu)$ is.

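proof (a)(i) Because $\hat\mu X=\mu X$, $(X,\hat\Sigma,\hat\mu)$ is a probability space, or totally finite, iff $(X,\Sigma,\mu)$ is.
(ii)($\alpha$) If $(X,\Sigma,\mu)$ is $\sigma$-finite, there is a sequence $\langle E_n\rangle_{n\in\mathbb N}$ in $\Sigma$, covering $X$, with $\mu E_n<\infty$ for each $n$.
Now $\hat\mu E_n<\infty$ for each $n$, so $(X,\hat\Sigma,\hat\mu)$ is $\sigma$-finite.
($\beta$) If $(X,\hat\Sigma,\hat\mu)$ is $\sigma$-finite, there is a sequence $\langle E_n\rangle_{n\in\mathbb N}$ in $\hat\Sigma$, covering $X$, with $\hat\mu E_n<\infty$ for each $n$. Now
we can find, for each $n$, an $E_n''\in\Sigma$ such that $\mu E_n''<\infty$ and $E_n\subseteq E_n''$; so that $\langle E_n''\rangle_{n\in\mathbb N}$ witnesses that
$(X,\Sigma,\mu)$ is $\sigma$-finite.
(iii)($\alpha$) If $(X,\Sigma,\mu)$ is semi-finite and $\hat\mu E=\infty$, then there is an $E'\in\Sigma$ such that $E'\subseteq E$ and $\mu E'=\infty$.
Next, there is an $F\in\Sigma$ such that $F\subseteq E'$ and $0<\mu F<\infty$. Of course we now have $F\in\hat\Sigma$, $F\subseteq E$ and
$0<\hat\mu F<\infty$. As $E$ is arbitrary, $(X,\hat\Sigma,\hat\mu)$ is semi-finite.
($\beta$) If $(X,\hat\Sigma,\hat\mu)$ is semi-finite and $\mu E=\infty$, then $\hat\mu E=\infty$, so there is an $F\subseteq E$ such that $0<\hat\mu F<\infty$.
Next, there is an $F'\in\Sigma$ such that $F'\subseteq F$ and $\mu F'=\hat\mu F$. Of course we now have $F'\subseteq E$ and
$0<\mu F'<\infty$. As $E$ is arbitrary, $(X,\Sigma,\mu)$ is semi-finite.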
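(iv)($\alpha$) If $(X,\Sigma,\mu)$ is localizable and $\mathcal E\subseteq\hat\Sigma$, then set
$$\mathcal F=\{F : F\in\Sigma,\ \exists\, E\in\mathcal E,\ F\subseteq E\}.$$
Let $H$ be an essential supremum of $\mathcal F$ in $\Sigma$, as in 211G.
If $E\in\mathcal E$, there is an $E'\in\Sigma$ such that $E'\subseteq E$ and $E\setminus E'$ is negligible; now $E'\in\mathcal F$, so
$$\hat\mu(E\setminus H)\le\hat\mu(E\setminus E')+\mu(E'\setminus H)=0.$$
If $G\in\hat\Sigma$ and $\hat\mu(E\setminus G)=0$ for every $E\in\mathcal E$, let $G''\in\Sigma$ be such that $G\subseteq G''$ and $\hat\mu(G''\setminus G)=0$; then, for
any $F\in\mathcal F$, there is an $E\in\mathcal E$ including $F$, so that
$$\mu(F\setminus G'')\le\hat\mu(E\setminus G)=0.$$
As $F$ is arbitrary, $\mu(H\setminus G'')=0$ and $\hat\mu(H\setminus G)=0$. This shows that $H$ is an essential supremum of $\mathcal E$ in
$\hat\Sigma$. As $\mathcal E$ is arbitrary, $(X,\hat\Sigma,\hat\mu)$ is localizable.
($\beta$) Suppose that $(X,\hat\Sigma,\hat\mu)$ is localizable and that $\mathcal E\subseteq\Sigma$. Working in $(X,\hat\Sigma,\hat\mu)$, let $H$ be an essential
supremum for $\mathcal E$ in $\hat\Sigma$. Let $H'\in\Sigma$ be such that $H'\subseteq H$ and $\hat\mu(H\setminus H')=0$. Then
$$\mu(E\setminus H')\le\hat\mu(E\setminus H)+\hat\mu(H\setminus H')=0$$
for every $E\in\mathcal E$; while if $G\in\Sigma$ and $\mu(E\setminus G)=0$ for every $E\in\mathcal E$, we must have
$$\mu(H'\setminus G)\le\hat\mu(H\setminus G)=0.$$
Thus $H'$ is an essential supremum of $\mathcal E$ in $\Sigma$. As $\mathcal E$ is arbitrary, $(X,\Sigma,\mu)$ is localizable.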
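(b) Let $\langle X_i\rangle_{i\in I}$ be a decomposition of $X$ for $\mu$, as in 211E. Of course it is a disjoint cover of $X$ by sets
of finite $\hat\mu$-measure. If $H\subseteq X$ and $H\cap X_i\in\hat\Sigma$ for every $i$, choose for each $i\in I$ sets $E_i',E_i''\in\Sigma$ such that
$$E_i'\subseteq H\cap X_i\subseteq E_i'',\quad\mu(E_i''\setminus E_i')=0.$$
Set $E'=\bigcup_{i\in I}E_i'$, $E''=\bigcup_{i\in I}(E_i''\cap X_i)$. Then $E'\cap X_i=E_i'$, $E''\cap X_i=E_i''\cap X_i$ for each $i$, so $E'$ and $E''$
belong to $\Sigma$ and
$$\mu E'=\sum_{i\in I}\mu E_i'=\sum_{i\in I}\hat\mu(H\cap X_i).$$
Also
$$\mu(E''\setminus E')=\sum_{i\in I}\mu(E_i''\cap X_i\setminus E_i')=0.$$
Consequently $H\in\hat\Sigma$ and
$$\hat\mu H=\mu E'=\sum_{i\in I}\hat\mu(H\cap X_i).$$
As $H$ is arbitrary, $\langle X_i\rangle_{i\in I}$ is a decomposition of $X$ for $\hat\mu$.
Accordingly, $(X,\hat\Sigma,\hat\mu)$ is strictly localizable if such a decomposition exists, which is so if $(X,\Sigma,\mu)$ is
strictly localizable.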
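(c)-(d)(i) Suppose that $E\in\hat\Sigma$ is an atom for $\hat\mu$. Let $E'\in\Sigma$ be such that $E'\subseteq E$ and $\hat\mu(E\setminus E')=0$.
Then $\mu E'=\hat\mu E>0$. If $F\in\Sigma$ and $F\subseteq E'$, then $F\subseteq E$, so either $\mu F=\hat\mu F=0$ or $\mu(E'\setminus F)=\hat\mu(E\setminus F)=0$.
As $F$ is arbitrary, $E'$ is an atom for $\mu$, and $\hat\mu(E\triangle E')=\hat\mu(E\setminus E')=0$.
(ii) Suppose that $E\in\Sigma$ is an atom for $\mu$, and that $H\in\hat\Sigma$ is such that $\hat\mu(H\triangle E)=0$. Then
$\hat\mu H=\mu E>0$. If $F\in\hat\Sigma$ and $F\subseteq H$, let $F'\subseteq F$ be such that $F'\in\Sigma$ and $\hat\mu(F\setminus F')=0$. Then $E\cap F'\subseteq E$
and $\hat\mu(F\triangle(E\cap F'))=0$, so either $\hat\mu F=\mu(E\cap F')=0$ or $\hat\mu(H\setminus F)=\hat\mu(E\setminus F)=0$. As $F$ is arbitrary, $H$
is an atom for $\hat\mu$.
(iii) It follows at once that $(X,\hat\Sigma,\hat\mu)$ is atomless iff $(X,\Sigma,\mu)$ is.
(iv)($\alpha$) On the other hand, if $(X,\Sigma,\mu)$ is purely atomic and $\hat\mu H>0$, there is an $E\in\Sigma$ such that
$E\subseteq H$ and $\mu E>0$, and an atom $F$ for $\mu$ such that $F\subseteq E$; but $F$ is also an atom for $\hat\mu$. As $H$ is arbitrary,
$(X,\hat\Sigma,\hat\mu)$ is purely atomic.
($\beta$) And if $(X,\hat\Sigma,\hat\mu)$ is purely atomic and $\mu E>0$, then there is an $H\subseteq E$ which is an atom for $\hat\mu$;
now let $F\in\Sigma$ be such that $F\subseteq H$ and $\hat\mu(H\setminus F)=0$, so that $F$ is an atom for $\mu$ and $F\subseteq E$. As $E$ is
arbitrary, $(X,\Sigma,\mu)$ is purely atomic.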

212X Basic exercises > (a) Let $(X,\Sigma,\mu)$ be a complete measure space. Suppose that $A\subseteq E\in\Sigma$ and
that $\mu^*A+\mu^*(E\setminus A)=\mu E<\infty$. Show that $A\in\Sigma$.

> (b) Let $\mu$ and $\nu$ be two measures on a set $X$, with completions $\hat\mu$ and $\hat\nu$. Show that the following are
equiveridical: (i) the outer measures $\mu^*$, $\nu^*$ defined from $\mu$ and $\nu$ coincide; (ii) $\hat\mu E=\hat\nu E$ whenever either
is defined and finite; (iii) $\int f\,d\mu=\int f\,d\nu$ whenever $f$ is a real-valued function such that either integral is
defined and finite. (Hint: for (i)$\Rightarrow$(ii), if $\hat\mu E<\infty$, take a measurable envelope $F$ of $E$ for $\nu$ and calculate
$\nu^*E+\nu^*(F\setminus E)$.)

(c) Let $\mu$ be the restriction of Lebesgue measure to the Borel $\sigma$-algebra of $\mathbb R$, as in 211P. Show that its
completion is Lebesgue measure itself. (Hint: 134F.)

(d) Repeat 212Xc for (i) Lebesgue measure on $\mathbb R^r$ (ii) Lebesgue-Stieltjes measures on $\mathbb R$ (114Xa).

(e) Let $X$ be a set and $\mu_1$, $\mu_2$ two measures on $X$, with domains $\Sigma_1$, $\Sigma_2$ respectively. Let $\mu=\mu_1+\mu_2$ be
their sum, with domain $\Sigma=\Sigma_1\cap\Sigma_2$ (112Xe). Show that if $(X,\Sigma_1,\mu_1)$ and $(X,\Sigma_2,\mu_2)$ are complete, so is
$(X,\Sigma,\mu)$.

(f) Let $X$ be a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$. Let $\mathcal I$ be a $\sigma$-ideal of subsets of $X$ (112Db). (i)
Show that $\Sigma_1=\{E\triangle A : E\in\Sigma,\ A\in\mathcal I\}$ is a $\sigma$-algebra of subsets of $X$. (ii) Let $\Sigma_2$ be the family of sets
$E\subseteq X$ such that there are $E',E''\in\Sigma$ with $E'\subseteq E\subseteq E''$ and $E''\setminus E'\in\mathcal I$. Show that $\Sigma_2$ is a $\sigma$-algebra of
subsets of $X$ and that $\Sigma_2\subseteq\Sigma_1$. (iii) Show that $\Sigma_2=\Sigma_1$ iff every member of $\mathcal I$ is included in a member of
$\Sigma\cap\mathcal I$.

(g) Let $(X,\Sigma,\mu)$ be a measure space, $Y$ any set and $\phi:X\to Y$ a function. Set $\theta B=\mu^*\phi^{-1}[B]$ for
every $B\subseteq Y$. (i) Show that $\theta$ is an outer measure on $Y$. (ii) Let $\nu$ be the measure defined from $\theta$ by
Carathéodory's method, and $\mathrm T$ its domain. Show that if $C\subseteq Y$ and $\phi^{-1}[C]\in\Sigma$ then $C\in\mathrm T$. (iii) Suppose
that $(X,\Sigma,\mu)$ is complete and totally finite. Show that $\nu$ is the image measure $\mu\phi^{-1}$.

(h) Let $X$ be a set and $\mu_1$, $\mu_2$ two complete measures on $X$, with domains $\Sigma_1$, $\Sigma_2$. Let $\mu$ be their
sum, with domain $\Sigma_1\cap\Sigma_2$, as in 212Xe. Show that a real-valued function $f$ defined on a subset of $X$ is
$\mu$-integrable iff it is $\mu_i$-integrable for both $i$, and in this case $\int f\,d\mu=\int f\,d\mu_1+\int f\,d\mu_2$. (Compare 212Yd.)

(i) Let $g$, $h$ be two non-decreasing functions from $\mathbb R$ to itself, and $\mu_g$, $\mu_h$ the associated Lebesgue-Stieltjes
measures. Show that a real-valued function $f$ defined on a subset of $\mathbb R$ is $\mu_{g+h}$-integrable iff it is both
$\mu_g$-integrable and $\mu_h$-integrable, and that then $\int f\,d\mu_{g+h}=\int f\,d\mu_g+\int f\,d\mu_h$. (Hint: 114Yb.)

(j) Let $X$ be a set and $\langle\mu_i\rangle_{i\in I}$ a family of measures on $X$; write $\Sigma_i$ for the domain of $\mu_i$. Set $\Sigma=\bigcap_{i\in I}\Sigma_i$
and $\mu E=\sum_{i\in I}\mu_iE$ for $E\in\Sigma$, as in 112Ya. (i) Show that if every $\mu_i$ is complete, so is $\mu$. (ii) Suppose
that every $\mu_i$ is complete. Show that a real-valued function $f$ defined on a subset of $X$ is $\mu$-integrable iff
$\sum_{i\in I}\int|f|\,d\mu_i$ is defined and finite. (Compare 212Ye.)

(k) Let $(X,\Sigma,\mu)$ be a measure space, and $\mathcal I$ a $\sigma$-ideal of subsets of $X$. (i) Show that $\Sigma'=\{E\triangle A : E\in\Sigma,\ A\in\mathcal I\}$
is a $\sigma$-algebra of subsets of $X$. (ii) Show that if every member of $\Sigma\cap\mathcal I$ is $\mu$-negligible, then there
is a unique extension of $\mu$ to a measure $\mu'$ with domain $\Sigma'$ such that $\mu'A=0$ for every $A\in\mathcal I$.

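212Y Further exercises (a) Let $\mu$ be the restriction of counting measure to the countable-cocountable
$\sigma$-algebra of $\mathbb R$, as in 211Ya. Let $\hat\mu$ be the completion of $\mu$, $\mu^*$ the outer measure defined from $\mu$, and $\check\mu$ the
measure defined by Carathéodory's method from $\mu^*$. Show that $\mu=\hat\mu$ and that $\check\mu=\mu^*$ is counting measure
on $\mathbb R$, so that $\hat\mu\ne\check\mu$.

(b) Repeat 212Xe for sums of arbitrary families of measures, saying that $(\sum_{i\in I}\mu_i)E=\sum_{i\in I}\mu_iE$ for
$E\in\bigcap_{i\in I}\operatorname{dom}\mu_i$, as in 112Ya, 212Xj.

(c) Let $X$ be a set and $\phi$ an inner measure on $X$, that is, a functional from $\mathcal PX$ to $[0,\infty]$ such that
$\phi\emptyset=0$,
$\phi(A\cup B)\ge\phi A+\phi B$ if $A\cap B=\emptyset$,
$\phi(\bigcap_{n\in\mathbb N}A_n)=\lim_{n\to\infty}\phi A_n$ whenever $\langle A_n\rangle_{n\in\mathbb N}$ is a non-increasing sequence of subsets of $X$
and $\phi A_0<\infty$;
if $\phi A=\infty$, $a\in\mathbb R$ there is a $B\subseteq A$ such that $a\le\phi B<\infty$.
Let $\mu$ be the measure defined from $\phi$, that is, $\mu=\phi\restriction\Sigma$, where
$$\Sigma=\{E : \phi(A)=\phi(A\cap E)+\phi(A\setminus E)\ \forall\, A\subseteq X\}$$
(113Yg). Show that $\mu$ must be complete.

(d) Write $\mu_L$ for Lebesgue measure on $[0,1]$, and $\Sigma_L$ for its domain. Set $\Sigma_1=\{E\times[0,1] : E\in\Sigma_L\}$,
$\Sigma_2=\{[0,1]\times E : E\in\Sigma_L\}$ and $\mu_1(E\times[0,1])=\mu_2([0,1]\times E)=\mu_LE$ for every $E\in\Sigma_L$. Let $\mu=\mu_1+\mu_2:\Sigma_1\cap\Sigma_2\to[0,2]$
be their sum as defined in 212Xe and 212Xh above. Show that there is a function
$f:[0,1]^2\to\{0,1\}$ such that $\int f\,d\mu_1=\int f\,d\mu_2=0$ but $f$ is not $\mu$-integrable.

(e) Set $X=\omega_1+1=\omega_1\cup\{\omega_1\}$ (2A1Dd), and set
$$\mathcal I=\{E : E\subseteq\omega_1 \text{ is countable}\},\quad\Sigma=\mathcal I\cup\{X\setminus E : E\in\mathcal I\}.$$
For each $\xi<\omega_1$ define $\mu_\xi:\Sigma\to\{0,1\}$ by setting $\mu_\xi E=1$ if $\xi\in E$, 0 otherwise. Show that $\mu_\xi$ is a measure
on $X$. Set $\mu=\sum_{\xi<\omega_1}\mu_\xi$, as in 112Ya and 212Xj. Show that $\chi\{\omega_1\}$ is $\mu_\xi$-integrable, with integral 0, for
every $\xi$, but is not $\mu$-integrable.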

212 Notes and comments The process of completion is so natural, and so universally applicable, and so
convenient, that over large parts of measure theory it is reasonable to use only complete measure spaces.
Indeed many authors so phrase their definitions that, explicitly or implicitly, only complete measure spaces
are considered. In this treatise I avoid taking quite such a large step, even though it would simplify the
statements of many of the theorems in this volume (for instance). There are non-complete measure spaces
which are worthy of study (for example, the restriction of Lebesgue measure to the Borel $\sigma$-algebra of $\mathbb{R}$; see 211P), and some interesting questions to be dealt with in Volumes 3 and 5 apply to them. At the cost
of rather a lot of verbal manoeuvres, therefore, I prefer to write theorems out in a form in which they can
be applied to arbitrary measure spaces, without assuming completeness. But it would be reasonable, and
indeed would sharpen your technique, if you regularly sought the alternative formulations which become
natural if you are interested only in complete spaces.

213 Semi-finite, locally determined and localizable spaces


In this section I collect a variety of useful facts concerning these types of measure space. I start with
the characteristic properties of semi-finite spaces (213A-213B), and continue with the complete locally
determined spaces (213C) and the concept of c.l.d. version (213D-213H), the most powerful of the univer-
sally available methods of modifying a measure space into a better-behaved one. I briefly discuss locally
determined negligible sets (213I-213L), and measurable envelopes (213L-213M), and end with results on
localizable spaces (213N) and strictly localizable spaces (213O).

213A Lemma Let $(X,\Sigma,\mu)$ be a semi-finite measure space. Then
$$\mu E=\sup\{\mu F:F\in\Sigma,\ F\subseteq E,\ \mu F<\infty\}$$
for every $E\in\Sigma$.
proof Set $c=\sup\{\mu F:F\in\Sigma,\ F\subseteq E,\ \mu F<\infty\}$. Then surely $c\le\mu E$, so if $c=\infty$ we can stop. If $c<\infty$, let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a sequence of measurable subsets of $E$, of finite measure, such that $\lim_{n\to\infty}\mu F_n=c$; set $F=\bigcup_{n\in\mathbb{N}}F_n$. For each $n\in\mathbb{N}$, $\bigcup_{k\le n}F_k$ is a measurable set of finite measure included in $E$, so $\mu(\bigcup_{k\le n}F_k)\le c$, and
$$\mu F=\lim_{n\to\infty}\mu\Bigl(\bigcup_{k\le n}F_k\Bigr)\le c.$$
Also
$$\mu F\ge\sup_{n\in\mathbb{N}}\mu F_n\ge c,$$
so $\mu F=c$.
If $F'$ is a measurable subset of $E\setminus F$ and $\mu F'<\infty$, then $F\cup F'$ has finite measure and is included in $E$, so has measure at most $c=\mu F$; it follows that $\mu F'=0$. But this means that $\mu(E\setminus F)$ cannot be infinite, since then, because $(X,\Sigma,\mu)$ is semi-finite, $E\setminus F$ would have to include a measurable set of non-zero finite measure. So $E\setminus F$ has finite measure, and is therefore in fact negligible; and $\mu E=c$, as claimed.
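
To see why the hypothesis of semi-finiteness cannot simply be dropped, it may help to look at the smallest possible counterexample. The following is a sketch for illustration only; the one-point space is an assumption chosen here and does not appear in the text.

```latex
% Illustrative assumption: a one-point space carrying a single purely infinite atom.
Let $X=\{0\}$, $\Sigma=\{\emptyset,X\}$, $\mu\emptyset=0$ and $\mu X=\infty$.
The only $F\in\Sigma$ with $F\subseteq X$ and $\mu F<\infty$ is $F=\emptyset$, so
\[
  \sup\{\mu F : F\in\Sigma,\ F\subseteq X,\ \mu F<\infty\}
  \;=\; \mu\emptyset \;=\; 0 \;<\; \infty \;=\; \mu X .
\]
% X has infinite measure but includes no measurable set of non-zero finite
% measure, so this space is not semi-finite, and the formula of 213A fails
% for E = X.
```

In this example the supremum sees only the empty set, which is exactly the situation the semi-finiteness hypothesis rules out.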

213B Proposition Let $(X,\Sigma,\mu)$ be a semi-finite measure space. Let $f$ be a $\mu$-virtually measurable $[0,\infty]$-valued function defined almost everywhere on $X$. Then
$$\int f=\sup\Bigl\{\int g:g\text{ is a simple function},\ g\le_{\text{a.e.}}f\Bigr\}=\sup_{F\in\Sigma,\,\mu F<\infty}\int_Ff$$
in $[0,\infty]$.
proof (a) For any measure space $(X,\Sigma,\mu)$, a $[0,\infty]$-valued function $f$ defined on a subset of $X$ is integrable iff there is a conegligible set $E$ such that
($\alpha$) $E\subseteq\operatorname{dom}f$ and $f\upharpoonright E$ is measurable,
($\beta$) $\sup\{\int g:g\text{ is a simple function},\ g\le_{\text{a.e.}}f\}$ is finite,
($\gamma$) for every $\epsilon>0$, $\{x:x\in E,\ f(x)\ge\epsilon\}$ has finite measure,
($\delta$) $f$ is finite almost everywhere
(see 122Ja, 133B). But if $\mu$ is semi-finite, ($\gamma$) and ($\delta$) are consequences of the rest. $\mathbf{P}$ Let $\epsilon>0$. Set
$$E_\epsilon=\{x:x\in E,\ f(x)\ge\epsilon\},$$
$$c=\sup\Bigl\{\int g:g\text{ is a simple function},\ g\le_{\text{a.e.}}f\Bigr\};$$
we are supposing that $c$ is finite. If $F\subseteq E_\epsilon$ is measurable and $\mu F<\infty$, then $\epsilon\chi F$ is a simple function and $\epsilon\chi F\le_{\text{a.e.}}f$, so
$$\epsilon\mu F=\int\epsilon\chi F\le c,\qquad \mu F\le c/\epsilon.$$
As $F$ is arbitrary, 213A tells us that $\mu E_\epsilon\le c/\epsilon$ is finite. As $\epsilon$ is arbitrary, ($\gamma$) is satisfied.
As for ($\delta$), if $F=\{x:x\in E,\ f(x)=\infty\}$ then $\mu F$ is finite (by ($\gamma$)) and $n\chi F\le_{\text{a.e.}}f$, so $n\mu F\le c$, for every $n\in\mathbb{N}$, so $\mu F=0$. $\mathbf{Q}$
(b) Now suppose that $f:D\to[0,\infty]$ is a $\mu$-virtually measurable function, where $D\subseteq X$ is conegligible, so that $\int f$ is defined in $[0,\infty]$ (135F). Then (a) tells us that
$$\int f=\sup_{g\text{ is simple},\ g\le f\text{ a.e.}}\int g$$
(if either is finite, and therefore also if either is infinite)
$$=\sup_{g\text{ is simple},\ g\le f\text{ a.e.},\ \mu F<\infty}\int_Fg\ \le\ \sup_{\mu F<\infty}\int_Ff\ \le\ \int f,$$
so we have the equalities we seek.

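*213C Proposition Let $(X,\Sigma,\mu)$ be a complete locally determined measure space, and $\mu^*$ the outer measure derived from $\mu$ (132A-132B). Then the measure defined from $\mu^*$ by Carathéodory's method is $\mu$ itself.
proof Write $\check\mu$ for the measure defined by Carathéodory's method from $\mu^*$, and $\check\Sigma$ for its domain.
(a) If $E\in\Sigma$ and $A\subseteq X$ then $\mu^*(A\cap E)+\mu^*(A\setminus E)=\mu^*A$ (132Af), so $E\in\check\Sigma$. Now $\check\mu E=\mu^*E=\mu E$ (132Ac). Thus $\Sigma\subseteq\check\Sigma$ and $\mu=\check\mu\upharpoonright\Sigma$.
(b) Now suppose that $H\in\check\Sigma$. Let $E\in\Sigma$ be such that $\mu E<\infty$. Then $H\cap E\in\Sigma$. $\mathbf{P}$ Let $E_1$, $E_2\in\Sigma$ be measurable envelopes of $E\cap H$, $E\setminus H$ respectively, both included in $E$ (132Ee). Because $H\in\check\Sigma$,
$$\mu E_1+\mu E_2=\mu^*(E\cap H)+\mu^*(E\setminus H)=\mu^*E=\mu E.$$
As $E_1\cup E_2=E$,
$$\mu(E_1\cap E_2)=\mu E_1+\mu E_2-\mu E=0.$$
Now $E_1\setminus(E\cap H)\subseteq E_1\cap E_2$; because $\mu$ is complete, $E_1\setminus(E\cap H)$ and $E\cap H$ belong to $\Sigma$. $\mathbf{Q}$
As $E$ is arbitrary, and $\mu$ is locally determined, $H\in\Sigma$. As $H$ is arbitrary, $\check\Sigma=\Sigma$ and $\check\mu=\mu$.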

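213D C.l.d. versions: Proposition Let $(X,\Sigma,\mu)$ be a measure space. Write $(X,\hat\Sigma,\hat\mu)$ for its completion (212C) and $\Sigma^f$ for $\{E:E\in\Sigma,\ \mu E<\infty\}$. Set
$$\tilde\Sigma=\{H:H\subseteq X,\ H\cap E\in\hat\Sigma\text{ for every }E\in\Sigma^f\},$$
and for $H\in\tilde\Sigma$ set
$$\tilde\mu H=\sup\{\hat\mu(H\cap E):E\in\Sigma^f\}.$$
Then $(X,\tilde\Sigma,\tilde\mu)$ is a complete locally determined measure space.
proof (a) I check first that $\tilde\Sigma$ is a $\sigma$-algebra. $\mathbf{P}$ (i) $\emptyset\cap E=\emptyset\in\hat\Sigma$ for every $E\in\Sigma^f$, so $\emptyset\in\tilde\Sigma$. (ii) If $H\in\tilde\Sigma$ then
$$(X\setminus H)\cap E=E\setminus(E\cap H)\in\hat\Sigma$$
for every $E\in\Sigma^f$, so $X\setminus H\in\tilde\Sigma$. (iii) If $\langle H_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\tilde\Sigma$ with union $H$, then
$$H\cap E=\bigcup_{n\in\mathbb{N}}(H_n\cap E)\in\hat\Sigma$$
for every $E\in\Sigma^f$, so $H\in\tilde\Sigma$. $\mathbf{Q}$
(b) It is obvious that $\tilde\mu\emptyset=0$. If $\langle H_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\tilde\Sigma$ with union $H$, then
$$\tilde\mu H=\sup\{\hat\mu(H\cap E):E\in\Sigma^f\}=\sup\Bigl\{\sum_{n=0}^\infty\hat\mu(H_n\cap E):E\in\Sigma^f\Bigr\}\le\sum_{n=0}^\infty\tilde\mu H_n.$$
On the other hand, given $a<\sum_{n=0}^\infty\tilde\mu H_n$, there is an $m\in\mathbb{N}$ such that $a<\sum_{n=0}^m\tilde\mu H_n$; now we can find $E_0,\ldots,E_m\in\Sigma^f$ such that $a\le\sum_{n=0}^m\hat\mu(H_n\cap E_n)$. Set $E=\bigcup_{n\le m}E_n\in\Sigma^f$; then
$$\tilde\mu H\ge\hat\mu(H\cap E)=\sum_{n=0}^\infty\hat\mu(H_n\cap E)\ge\sum_{n=0}^m\hat\mu(H_n\cap E_n)\ge a.$$
As $a$ is arbitrary, $\tilde\mu H\ge\sum_{n=0}^\infty\tilde\mu H_n$ and $\tilde\mu H=\sum_{n=0}^\infty\tilde\mu H_n$.
(c) Thus $(X,\tilde\Sigma,\tilde\mu)$ is a measure space. To see that it is semi-finite, note first that $\hat\Sigma\subseteq\tilde\Sigma$ (because if $H\in\hat\Sigma$ then surely $H\cap E\in\hat\Sigma$ for every $E\in\Sigma^f$), and that $\tilde\mu H=\hat\mu H$ whenever $\hat\mu H<\infty$ (because then, by the definition in 212Ca, there is an $E\in\Sigma^f$ such that $H\subseteq E$, so that $\tilde\mu H=\hat\mu(H\cap E)=\hat\mu H$). Now suppose that $H\in\tilde\Sigma$ and that $\tilde\mu H=\infty$. There is surely an $E\in\Sigma^f$ such that $\hat\mu(H\cap E)>0$; but now $0<\hat\mu(H\cap E)<\infty$, so $0<\tilde\mu(H\cap E)<\infty$.
(d) Thus $(X,\tilde\Sigma,\tilde\mu)$ is a semi-finite measure space. To see that it is locally determined, let $H\subseteq X$ be such that $H\cap G\in\tilde\Sigma$ whenever $G\in\tilde\Sigma$ and $\tilde\mu G<\infty$. Then, in particular, we must have $H\cap E\in\tilde\Sigma$ for every $E\in\Sigma^f$. But this means in fact that $H\cap E=(H\cap E)\cap E\in\hat\Sigma$ for every $E\in\Sigma^f$, so that $H\in\tilde\Sigma$. As $H$ is arbitrary, $(X,\tilde\Sigma,\tilde\mu)$ is locally determined.
(e) To see that $(X,\tilde\Sigma,\tilde\mu)$ is complete, suppose that $A\subseteq H\in\tilde\Sigma$ and that $\tilde\mu H=0$. Then for every $E\in\Sigma^f$ we must have $\hat\mu(H\cap E)=0$. Because $(X,\hat\Sigma,\hat\mu)$ is complete, and $A\cap E\subseteq H\cap E$, $A\cap E\in\hat\Sigma$. As $E$ is arbitrary, $A\in\tilde\Sigma$.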

213E Definition For any measure space $(X,\Sigma,\mu)$, I will call $(X,\tilde\Sigma,\tilde\mu)$, as constructed in 213D, the c.l.d. version ('complete locally determined version') of $(X,\Sigma,\mu)$; and $\tilde\mu$ will be the c.l.d. version of $\mu$.
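
A very small example may make the construction of 213D concrete. The two-point space below is an assumption introduced here purely for illustration; it is not taken from the text, and the computation is only a sketch of how the definitions unwind.

```latex
% Illustrative assumption: a two-point space with one finite atom and one
% purely infinite atom.
Let $X=\{a,b\}$, $\Sigma=\mathcal{P}X$, $\mu\{a\}=1$, $\mu\{b\}=\infty$.
The only negligible set is $\emptyset$, so $\mu$ is already complete:
$\hat\Sigma=\Sigma$ and $\hat\mu=\mu$. The sets of finite measure are
\[
  \Sigma^f=\{\emptyset,\{a\}\}.
\]
Every $H\subseteq X$ meets each member of $\Sigma^f$ in a member of
$\hat\Sigma$, so $\tilde\Sigma=\mathcal{P}X$, and
\[
  \tilde\mu H=\sup\{\hat\mu(H\cap E):E\in\Sigma^f\}
  =\max\bigl(0,\ \mu(H\cap\{a\})\bigr)
  =\begin{cases}1&\text{if }a\in H,\\ 0&\text{if }a\notin H.\end{cases}
\]
% So \tilde\mu\{a\}=\mu\{a\}=1, while \tilde\mu\{b\}=0 and \tilde\mu X=1:
% the purely infinite atom \{b\} has become negligible, and sets of finite
% measure are unaffected.
```

Here $\tilde\mu$ agrees with $\mu$ on sets of finite measure but differs from it on $\{b\}$ and on $X$.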

213F Following the same pattern as in 212E-212G, I start with some elementary remarks to facilitate
manipulation of this construction.
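Proposition Let $(X,\Sigma,\mu)$ be any measure space and $(X,\tilde\Sigma,\tilde\mu)$ its c.l.d. version.
(a) $\Sigma\subseteq\tilde\Sigma$ and $\tilde\mu E=\mu E$ whenever $E\in\Sigma$ and $\mu E<\infty$; in fact, if $(X,\hat\Sigma,\hat\mu)$ is the completion of $(X,\Sigma,\mu)$, $\hat\Sigma\subseteq\tilde\Sigma$ and $\tilde\mu E=\hat\mu E$ whenever $\hat\mu E<\infty$.
(b) Writing $\tilde\mu^*$ and $\mu^*$ for the outer measures defined from $\tilde\mu$ and $\mu$ respectively, $\tilde\mu^*A\le\mu^*A$ for every $A\subseteq X$, with equality if $\mu^*A$ is finite. In particular, $\mu$-negligible sets are $\tilde\mu$-negligible; consequently, $\mu$-conegligible sets are $\tilde\mu$-conegligible.
(c) For every $H\in\tilde\Sigma$ there is an $E\in\Sigma$ such that $E\subseteq H$ and $\mu E=\tilde\mu H$; if $\tilde\mu H<\infty$ then $\tilde\mu(H\setminus E)=0$.
proof (a) This is already covered by remarks in the proof of 213D.
(b) If $\mu^*A=\infty$ then surely $\tilde\mu^*A\le\mu^*A$. If $\mu^*A<\infty$, take $E\in\Sigma$ such that $A\subseteq E$ and $\mu E=\mu^*A$ (132Aa). Then
$$\tilde\mu^*A\le\tilde\mu E=\mu E=\mu^*A.$$
On the other hand, if $A\subseteq H\in\tilde\Sigma$, then
$$\tilde\mu H\ge\hat\mu(H\cap E)\ge\hat\mu^*A=\mu^*A,$$
using 212Ea. So $\mu^*A\le\tilde\mu^*A$ and $\mu^*A=\tilde\mu^*A$.
(c) Write $\Sigma^f$ for $\{E:E\in\Sigma,\ \mu E<\infty\}$; then, by the definition in 213D, $\tilde\mu H=\sup\{\hat\mu(H\cap E):E\in\Sigma^f\}$. Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\Sigma^f$ such that $\tilde\mu H=\sup_{n\in\mathbb{N}}\hat\mu(H\cap E_n)$. For each $n\in\mathbb{N}$ there is an $F_n\in\Sigma$ such that $F_n\subseteq H\cap E_n$ and $\mu F_n=\hat\mu(H\cap E_n)$ (212C). Set $E=\bigcup_{n\in\mathbb{N}}F_n$. Then $E\in\Sigma$, $E\subseteq H$ and
$$\tilde\mu H=\sup_{n\in\mathbb{N}}\mu F_n\le\lim_{n\to\infty}\mu\Bigl(\bigcup_{i\le n}F_i\Bigr)=\mu E=\lim_{n\to\infty}\tilde\mu\Bigl(\bigcup_{i\le n}F_i\Bigr)\le\tilde\mu H,$$
so $\mu E=\tilde\mu H$, and if $\tilde\mu H<\infty$ then $\tilde\mu(H\setminus E)=0$.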

213G The next step is to look at functions which are measurable or integrable with respect to $\tilde\mu$.
Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $(X,\tilde\Sigma,\tilde\mu)$ its c.l.d. version.
(a) If a real-valued function $f$ defined on a subset of $X$ is $\mu$-virtually measurable, it is $\tilde\Sigma$-measurable.
(b) If a real-valued function is $\mu$-integrable, it is $\tilde\mu$-integrable with the same integral.
(c) If $f$ is a $\tilde\mu$-integrable real-valued function, there is a $\mu$-integrable real-valued function which is equal to $f$ $\tilde\mu$-almost everywhere.
proof Write $\Sigma^f$ for $\{E:E\in\Sigma,\ \mu E<\infty\}$. By 213Fa, $\mu$ and $\tilde\mu$ agree on $\Sigma^f$.
(a) By 212Fa, $f$ is $\hat\Sigma$-measurable, where $\hat\Sigma$ is the domain of the completion of $\mu$; but since $\hat\Sigma\subseteq\tilde\Sigma$, $f$ is $\tilde\Sigma$-measurable.
(b)(i) If $f$ is a $\mu$-simple function it is $\tilde\mu$-simple, and $\int f\,d\mu=\int f\,d\tilde\mu$, because $\tilde\mu E=\mu E$ for every $E\in\Sigma^f$.
(ii) If $f$ is a non-negative $\mu$-integrable function, there is a non-decreasing sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ of $\mu$-simple functions converging to $f$ $\mu$-almost everywhere; now (by 213Fb) $\mu$-negligible sets are $\tilde\mu$-negligible, so $\langle f_n\rangle_{n\in\mathbb{N}}$ converges to $f$ $\tilde\mu$-a.e. and (by B.Levi's theorem, 123A) $f$ is $\tilde\mu$-integrable, with
$$\int f\,d\tilde\mu=\lim_{n\to\infty}\int f_n\,d\tilde\mu=\lim_{n\to\infty}\int f_n\,d\mu=\int f\,d\mu.$$
(iii) In general, if $\int f\,d\mu$ is defined in $\mathbb{R}$, we have
$$\int f\,d\tilde\mu=\int f^+d\tilde\mu-\int f^-d\tilde\mu=\int f^+d\mu-\int f^-d\mu=\int f\,d\mu,$$
writing $f^+=f\vee0$, $f^-=(-f)\vee0$.
(c)(i) Let $f$ be a $\tilde\mu$-simple function. Express it as $\sum_{i=0}^na_i\chi H_i$ where $\tilde\mu H_i<\infty$ for each $i$. Choose $E_0,\ldots,E_n\in\Sigma$ such that $E_i\subseteq H_i$ and $\tilde\mu(H_i\setminus E_i)=0$ for each $i$ (using 213Fc above). Then $g=\sum_{i=0}^na_i\chi E_i$ is $\mu$-simple, $g=f$ $\tilde\mu$-a.e., and $\int g\,d\mu=\int f\,d\tilde\mu$.
(ii) Let $f$ be a non-negative $\tilde\mu$-integrable function. Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of $\tilde\mu$-simple functions converging $\tilde\mu$-almost everywhere to $f$. For each $n$, choose a $\mu$-simple function $g_n$ equal $\tilde\mu$-almost everywhere to $f_n$. Then $\{x:g_{n+1}(x)<g_n(x)\}$ belongs to $\Sigma^f$ and is $\tilde\mu$-negligible, therefore $\mu$-negligible. So $\langle g_n\rangle_{n\in\mathbb{N}}$ is non-decreasing $\mu$-almost everywhere. Because
$$\lim_{n\to\infty}\int g_n\,d\mu=\lim_{n\to\infty}\int f_n\,d\tilde\mu=\int f\,d\tilde\mu,$$
B.Levi's theorem tells us that $\langle g_n\rangle_{n\in\mathbb{N}}$ converges $\mu$-almost everywhere to a $\mu$-integrable function $g$; because $\mu$-negligible sets are $\tilde\mu$-negligible,
$$(X\setminus\operatorname{dom}f)\cup(X\setminus\operatorname{dom}g)\cup\bigcup_{n\in\mathbb{N}}\{x:f_n(x)\ne g_n(x)\}$$
$$\cup\ \{x:x\in\operatorname{dom}f,\ f(x)\ne\sup_{n\in\mathbb{N}}f_n(x)\}\ \cup\ \{x:x\in\operatorname{dom}g,\ g(x)\ne\sup_{n\in\mathbb{N}}g_n(x)\}$$
is $\tilde\mu$-negligible, and $f=g$ $\tilde\mu$-a.e.
(iii) If $f$ is $\tilde\mu$-integrable, express it as $f_1-f_2$ where $f_1$ and $f_2$ are $\tilde\mu$-integrable and non-negative; then there are $\mu$-integrable functions $g_1$, $g_2$ such that $f_1=g_1$, $f_2=g_2$ $\tilde\mu$-a.e., so that $g=g_1-g_2$ is $\mu$-integrable and equal to $f$ $\tilde\mu$-a.e.

213H Thirdly, I turn to the effect of the construction here on the other properties being considered in
this chapter.
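Proposition Let $(X,\Sigma,\mu)$ be a measure space and $(X,\hat\Sigma,\hat\mu)$ its completion, $(X,\tilde\Sigma,\tilde\mu)$ its c.l.d. version.
(a) If $(X,\Sigma,\mu)$ is a probability space, or totally finite, or $\sigma$-finite, or strictly localizable, so is $(X,\tilde\Sigma,\tilde\mu)$, and in all these cases $\tilde\mu=\hat\mu$;
(b) if $(X,\Sigma,\mu)$ is localizable, so is $(X,\tilde\Sigma,\tilde\mu)$, and for every $H\in\tilde\Sigma$ there is an $E\in\Sigma$ such that $\tilde\mu(E\triangle H)=0$;
(c) $(X,\Sigma,\mu)$ is semi-finite iff $\tilde\mu F=\mu F$ for every $F\in\Sigma$, and in this case $\int f\,d\tilde\mu=\int f\,d\mu$ whenever the latter is defined in $[-\infty,\infty]$;
(d) a set $H\in\tilde\Sigma$ is an atom for $\tilde\mu$ iff there is an atom $E$ for $\mu$ such that $\mu E<\infty$ and $\tilde\mu(H\triangle E)=0$;
(e) if $(X,\Sigma,\mu)$ is atomless or purely atomic, so is $(X,\tilde\Sigma,\tilde\mu)$;
(f) $(X,\Sigma,\mu)$ is complete and locally determined iff $\tilde\mu=\mu$.
proof (a)(i) I start by showing that if $(X,\Sigma,\mu)$ is strictly localizable, then $\tilde\mu=\hat\mu$. $\mathbf{P}$ Let $\langle X_i\rangle_{i\in I}$ be a decomposition of $X$ for $\mu$; then it is also a decomposition for $\hat\mu$ (212Gb). If $H\in\tilde\Sigma$, we shall have $H\cap X_i\in\hat\Sigma$ for every $i$, and therefore $H\in\hat\Sigma$; moreover,
$$\hat\mu H=\sum_{i\in I}\hat\mu(H\cap X_i)=\sup\Bigl\{\sum_{i\in J}\hat\mu(H\cap X_i):J\subseteq I\text{ is finite}\Bigr\}$$
$$\le\sup\{\hat\mu(H\cap E):E\in\Sigma,\ \mu E<\infty\}=\tilde\mu H\le\hat\mu H.$$
So $\tilde\mu H=\hat\mu H$ for every $H\in\tilde\Sigma$ and $\tilde\mu=\hat\mu$. $\mathbf{Q}$
(ii) Consequently, if $(X,\Sigma,\mu)$ is a probability space, or totally finite, or $\sigma$-finite, or strictly localizable, so is $(X,\tilde\Sigma,\tilde\mu)$, using 212Ga-212Gb to see that $(X,\hat\Sigma,\hat\mu)$ has the property involved.
(b) If $(X,\Sigma,\mu)$ is localizable, let $\mathcal{H}$ be any subset of $\tilde\Sigma$. Set
$$\mathcal{E}=\{E:E\in\Sigma,\ \exists\ H\in\mathcal{H},\ E\subseteq H\}.$$
Working in $(X,\Sigma,\mu)$, let $F$ be an essential supremum for $\mathcal{E}$.
(i) $\mathbf{?}$ Suppose, if possible, that there is an $H\in\mathcal{H}$ such that $\tilde\mu(H\setminus F)>0$. Then there is an $E\in\Sigma$ such that $E\subseteq H\setminus F$ and $\mu E=\tilde\mu(H\setminus F)>0$ (213Fc). This $E$ belongs to $\mathcal{E}$ and $\mu(E\setminus F)=\mu E>0$; which is impossible if $F$ is an essential supremum of $\mathcal{E}$. $\mathbf{X}$
(ii) Thus $\tilde\mu(H\setminus F)=0$ for every $H\in\mathcal{H}$. Now take any $G\in\tilde\Sigma$ such that $\tilde\mu(H\setminus G)=0$ for every $H\in\mathcal{H}$. Let $E_0\in\Sigma$ be such that $E_0\subseteq F\setminus G$ and $\mu E_0=\tilde\mu(F\setminus G)$; note that $F\setminus E_0\supseteq F\cap G$. If $E\in\mathcal{E}$, there is an $H\in\mathcal{H}$ such that $E\subseteq H$, so that
$$\mu(E\setminus(F\setminus E_0))\le\tilde\mu(H\setminus(F\cap G))\le\tilde\mu(H\setminus F)+\tilde\mu(H\setminus G)=0.$$
Because $F$ is an essential supremum for $\mathcal{E}$ in $\Sigma$,
$$0=\mu(F\setminus(F\setminus E_0))=\mu E_0=\tilde\mu(F\setminus G).$$
This shows that $F$ is an essential supremum for $\mathcal{H}$ in $\tilde\Sigma$. As $\mathcal{H}$ is arbitrary, $(X,\tilde\Sigma,\tilde\mu)$ is localizable.
(iii) The argument of (i)-(ii) shows in fact that if $\mathcal{H}\subseteq\tilde\Sigma$ then $\mathcal{H}$ has an essential supremum $F$ in $\tilde\Sigma$ such that $F$ actually belongs to $\Sigma$. Taking $\mathcal{H}=\{H\}$, we see that if $H\in\tilde\Sigma$ there is an $F\in\Sigma$ such that $\tilde\mu(H\triangle F)=0$.
(c) We already know that $\tilde\mu E\le\mu E$ for every $E\in\Sigma$, with equality if $\mu E<\infty$, by 213Fa.
(i) If $(X,\Sigma,\mu)$ is semi-finite, then for any $F\in\Sigma$ we have
$$\mu F=\sup\{\mu E:E\in\Sigma,\ E\subseteq F,\ \mu E<\infty\}=\sup\{\tilde\mu E:E\in\Sigma,\ E\subseteq F,\ \mu E<\infty\}\le\tilde\mu F\le\mu F,$$
so that $\tilde\mu F=\mu F$.
(ii) Suppose that $\tilde\mu F=\mu F$ for every $F\in\Sigma$. If $\mu F=\infty$, then there must be an $E\in\Sigma$ such that $\mu E<\infty$, $\hat\mu(F\cap E)>0$; in which case $F\cap E\in\Sigma$ and $0<\mu(F\cap E)<\infty$. As $F$ is arbitrary, $(X,\Sigma,\mu)$ is semi-finite.
(iii) If $f$ is non-negative and $\int f\,d\mu=\infty$, then $f$ is $\mu$-virtually measurable, therefore $\tilde\Sigma$-measurable (213Ga), and defined $\mu$-almost everywhere, therefore $\tilde\mu$-almost everywhere. Now
$$\int f\,d\tilde\mu=\sup\Bigl\{\int g\,d\tilde\mu:g\text{ is }\tilde\mu\text{-simple},\ 0\le g\le f\ \tilde\mu\text{-a.e.}\Bigr\}$$
$$\ge\sup\Bigl\{\int g\,d\mu:g\text{ is }\mu\text{-simple},\ 0\le g\le f\ \mu\text{-a.e.}\Bigr\}=\infty$$
by 213B. With 213Gb, this shows that $\int f\,d\tilde\mu=\int f\,d\mu$ whenever $f$ is non-negative and $\int f\,d\mu$ is defined in $[0,\infty]$. Applying this to the positive and negative parts of $f$, we see that $\int f\,d\tilde\mu=\int f\,d\mu$ whenever the latter is defined in $[-\infty,\infty]$.
(d)(i) If $H\in\tilde\Sigma$ is an atom for $\tilde\mu$, then (because $\tilde\mu$ is semi-finite) there is surely an $H'\in\tilde\Sigma$ such that $H'\subseteq H$ and $0<\tilde\mu H'<\infty$, and we must have $\tilde\mu(H\setminus H')=0$, so that $\tilde\mu H<\infty$. Accordingly there is an $E\in\Sigma$ such that $E\subseteq H$ and $\tilde\mu(H\setminus E)=0$ (213Fc above). We have $\mu E=\tilde\mu H>0$. If $F\in\Sigma$ and $F\subseteq E$, then either $\mu F=\tilde\mu F=0$ or $\mu(E\setminus F)=\tilde\mu(H\setminus F)=0$. Thus $E\in\Sigma$ is an atom for $\mu$ with $\tilde\mu(H\triangle E)=0$ and $\mu E=\tilde\mu H<\infty$.
(ii) If $H\in\tilde\Sigma$ and there is an atom $E$ for $\mu$ such that $\mu E<\infty$ and $\tilde\mu(H\triangle E)=0$, let $G\in\tilde\Sigma$ be a subset of $H$. We have
$$\tilde\mu G\le\tilde\mu H=\mu E<\infty,$$
so there is an $F\in\Sigma$ such that $F\subseteq G$ and $\tilde\mu(G\setminus F)=0$. Now either $\tilde\mu G=\mu(E\cap F)=0$ or $\tilde\mu(H\setminus G)=\mu(E\setminus F)=0$. This is true whenever $G\in\tilde\Sigma$ and $G\subseteq H$; also $\tilde\mu H=\mu E>0$. So $H$ is an atom for $\tilde\mu$.
(e) If $(X,\Sigma,\mu)$ is atomless, then $(X,\tilde\Sigma,\tilde\mu)$ must be atomless, by (d).
If $(X,\Sigma,\mu)$ is purely atomic and $H\in\tilde\Sigma$, $\tilde\mu H>0$, then there is an $E\in\Sigma$ such that $0<\tilde\mu(H\cap E)<\infty$. Let $E_1\in\Sigma$ be such that $E_1\subseteq H\cap E$ and $\mu E_1>0$. There is an atom $F$ for $\mu$ such that $F\subseteq E_1$; now $\mu F<\infty$ so $F$ is an atom for $\tilde\mu$, by (d). Also $F\subseteq H$. As $H$ is arbitrary, $(X,\tilde\Sigma,\tilde\mu)$ is purely atomic.
(f) If $\tilde\mu=\mu$, then of course $(X,\Sigma,\mu)$ must be complete and locally determined, because $(X,\tilde\Sigma,\tilde\mu)$ is. If $(X,\Sigma,\mu)$ is complete and locally determined, then $\hat\mu=\mu$ so (using the definition in 213D) $\tilde\Sigma=\Sigma$ and $\tilde\mu=\mu$, by (c) above.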

213I Locally determined negligible sets The following simple idea is occasionally useful.
Definition A measure space $(X,\Sigma,\mu)$ has locally determined negligible sets if for every non-negligible $A\subseteq X$ there is an $E\in\Sigma$ such that $\mu E<\infty$ and $A\cap E$ is not negligible.

213J Proposition If a measure space $(X,\Sigma,\mu)$ is either strictly localizable or complete and locally determined, it has locally determined negligible sets.
proof Let $A\subseteq X$ be a set such that $A\cap E$ is negligible whenever $\mu E<\infty$; I need to show that $A$ is negligible.
(i) If $\mu$ is strictly localizable, let $\langle X_i\rangle_{i\in I}$ be a decomposition of $X$. For each $i\in I$, $A\cap X_i$ is negligible, so there is a negligible $E_i\in\Sigma$ such that $A\cap X_i\subseteq E_i$. Set $E=\bigcup_{i\in I}E_i\cap X_i$. Then $\mu E=\sum_{i\in I}\mu(E_i\cap X_i)=0$ and $A\subseteq E$, so $A$ is negligible.
(ii) If $\mu$ is complete and locally determined, take any measurable set $E$ of finite measure. Then $A\cap E$ is negligible, therefore measurable; as $E$ is arbitrary, $A$ is measurable; as $\mu$ is semi-finite, $A$ is negligible.
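
For contrast, here is a sketch of a space which fails to have locally determined negligible sets. The two-point space is an assumption made for this illustration and is not part of the text; the point is only to show which hypothesis of 213J it violates.

```latex
% Illustrative assumption: the two-point space with one finite and one
% purely infinite atom.
Let $X=\{a,b\}$, $\Sigma=\mathcal{P}X$, $\mu\{a\}=1$, $\mu\{b\}=\infty$,
and take $A=\{b\}$. Then $\mu^*A=\mu A=\infty$, so $A$ is not negligible.
But the measurable sets of finite measure are just $\emptyset$ and $\{a\}$, and
\[
  A\cap\emptyset=A\cap\{a\}=\emptyset ,
\]
so $A\cap E$ is negligible whenever $\mu E<\infty$.
% Thus (X,\Sigma,\mu) does not have locally determined negligible sets.
% This is consistent with 213J: the space is complete, but it is not
% semi-finite (\{b\} has infinite measure and no subset of non-zero finite
% measure), so it is neither locally determined nor strictly localizable.
```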

213K Lemma If a measure space $(X,\Sigma,\mu)$ has locally determined negligible sets, and $\mathcal{E}\subseteq\Sigma$ has an essential supremum $H\in\Sigma$ in the sense of 211G, then $H\setminus\bigcup\mathcal{E}$ is negligible.
proof Set $A=H\setminus\bigcup\mathcal{E}$. Take any $F\in\Sigma$ such that $\mu F<\infty$. Then $F\cap A$ has a measurable envelope $V$ say (132Ee). If $E\in\mathcal{E}$, then
$$\mu(E\setminus(X\setminus V))=\mu(E\cap V)=\mu^*(E\cap F\cap A)=0,$$
so $H\cap V=H\setminus(X\setminus V)$ is negligible and $F\cap A$ is negligible. As $F$ is arbitrary and $\mu$ has locally determined negligible sets, $A$ is negligible, as claimed.

213L Proposition Let $(X,\Sigma,\mu)$ be a localizable measure space with locally determined negligible sets; for instance, $\mu$ might be either complete, locally determined and localizable or strictly localizable (213J, 211Ld). Then every subset $A$ of $X$ has a measurable envelope.
proof Set
$$\mathcal{E}=\{E:E\in\Sigma,\ \mu^*(A\cap E)=\mu E<\infty\}.$$
Let $G$ be an essential supremum for $\mathcal{E}$ in $\Sigma$.
(i) $A\setminus G$ is negligible. $\mathbf{P}$ Let $F$ be any set of finite measure for $\mu$. Let $E$ be a measurable envelope of $A\cap F$. Then $E\in\mathcal{E}$ so $E\setminus G$ is negligible. But $F\cap A\setminus G\subseteq E\setminus G$, so $F\cap A\setminus G$ is negligible. Because $\mu$ has locally determined negligible sets, this is enough to show that $A\setminus G$ is negligible. $\mathbf{Q}$
(ii) Let $E_0$ be a negligible measurable set including $A\setminus G$, and set $\tilde G=E_0\cup G$, so that $\tilde G\in\Sigma$, $A\subseteq\tilde G$ and $\mu(\tilde G\setminus G)=0$. $\mathbf{?}$ Suppose, if possible, that there is an $F\in\Sigma$ such that $\mu^*(A\cap F)<\mu(\tilde G\cap F)$. Let $F_1\subseteq F$ be a measurable envelope of $A\cap F$. Set $H=X\setminus(F\setminus F_1)$; then $A\subseteq H$. If $E\in\mathcal{E}$ then
$$\mu E=\mu^*(A\cap E)\le\mu(H\cap E),$$
so $E\setminus H$ is negligible; as $E$ is arbitrary, $G\setminus H$ is negligible and $\tilde G\setminus H$ is negligible. But $\tilde G\cap F\setminus F_1\subseteq\tilde G\setminus H$ and
$$\mu(\tilde G\cap F\setminus F_1)=\mu(\tilde G\cap F)-\mu^*(A\cap F)>0.\ \mathbf{X}$$
This shows that $\tilde G$ is a measurable envelope of $A$, as required.

213M Corollary (a) If $(X,\Sigma,\mu)$ is $\sigma$-finite, then every subset of $X$ has a measurable envelope for $\mu$.
(b) If $(X,\Sigma,\mu)$ is localizable, then every subset of $X$ has a measurable envelope for the c.l.d. version of $\mu$.
proof (a) Use 132Ee, or 213L and the fact that $\sigma$-finite spaces are strictly localizable (211Lc).
(b) Use 213L and the fact that the c.l.d. version of $\mu$ is localizable as well as being complete and locally determined (213Hb).

213N When we come to use the concept of localizability, it will frequently be through the following
characterization.
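Theorem Let $(X,\Sigma,\mu)$ be a localizable measure space. Suppose that $\Phi$ is a family of measurable real-valued functions, all defined on measurable subsets of $X$, such that whenever $f$, $g\in\Phi$ then $f=g$ almost everywhere on $\operatorname{dom}f\cap\operatorname{dom}g$. Then there is a measurable function $h:X\to\mathbb{R}$ such that every $f\in\Phi$ agrees with $h$ almost everywhere on $\operatorname{dom}f$.
proof For $q\in\mathbb{Q}$, $f\in\Phi$ set
$$E_{fq}=\{x:x\in\operatorname{dom}f,\ f(x)\ge q\}\in\Sigma.$$
For each $q\in\mathbb{Q}$, let $E_q$ be an essential supremum of $\{E_{fq}:f\in\Phi\}$ in $\Sigma$. Set
$$h^*(x)=\sup\{q:q\in\mathbb{Q},\ x\in E_q\}\in[-\infty,\infty]$$
for $x\in X$, taking $\sup\emptyset=-\infty$ if necessary.
If $f$, $g\in\Phi$ and $q\in\mathbb{Q}$, then
$$E_{fq}\setminus(X\setminus(\operatorname{dom}g\setminus E_{gq}))=E_{fq}\cap\operatorname{dom}g\setminus E_{gq}\subseteq\{x:x\in\operatorname{dom}f\cap\operatorname{dom}g,\ f(x)\ne g(x)\}$$
is negligible; as $f$ is arbitrary,
$$E_q\cap\operatorname{dom}g\setminus E_{gq}=E_q\setminus(X\setminus(\operatorname{dom}g\setminus E_{gq}))$$
is negligible. Also $E_{gq}\setminus E_q$ is negligible, so $E_{gq}\triangle(E_q\cap\operatorname{dom}g)$ is negligible. Set $H_g=\bigcup_{q\in\mathbb{Q}}E_{gq}\triangle(E_q\cap\operatorname{dom}g)$; then $H_g$ is negligible. But if $x\in\operatorname{dom}g\setminus H_g$, then, for every $q\in\mathbb{Q}$, $x\in E_q\iff x\in E_{gq}$; it follows that for such $x$, $h^*(x)=g(x)$. Thus $h^*=g$ almost everywhere on $\operatorname{dom}g$; and this is true for every $g\in\Phi$.
The function $h^*$ is not necessarily real-valued. But it is measurable, because
$$\{x:h^*(x)>a\}=\bigcup\{E_q:q\in\mathbb{Q},\ q>a\}\in\Sigma$$
for every real $a$. So if we modify it by setting
$$h(x)=h^*(x)\text{ if }h^*(x)\in\mathbb{R},\qquad h(x)=0\text{ if }h^*(x)\in\{-\infty,\infty\},$$
we shall get a measurable real-valued function $h:X\to\mathbb{R}$; and for any $g\in\Phi$, $h(x)$ will be equal to $g(x)$ at least whenever $h^*(x)=g(x)$, which is true for almost every $x\in\operatorname{dom}g$. Thus $h$ is a suitable function.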

213O There is an interesting and useful criterion for a space to be strictly localizable which I introduce
at this point, though it will be used rarely in this volume.
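Proposition Let $(X,\Sigma,\mu)$ be a complete locally determined space.
(a) Suppose that there is a disjoint family $\mathcal{E}\subseteq\Sigma$ such that ($\alpha$) $\mu E<\infty$ for every $E\in\mathcal{E}$ ($\beta$) whenever $F\in\Sigma$ and $\mu F>0$ then there is an $E\in\mathcal{E}$ such that $\mu(E\cap F)>0$. Then $(X,\Sigma,\mu)$ is strictly localizable, $\bigcup\mathcal{E}$ is conegligible, and $\mathcal{E}\cup\{X\setminus\bigcup\mathcal{E}\}$ is a decomposition of $X$.
(b) Suppose that $\langle X_i\rangle_{i\in I}$ is a partition of $X$ into measurable sets of finite measure such that whenever $E\in\Sigma$ and $\mu E>0$ there is an $i\in I$ such that $\mu(E\cap X_i)>0$. Then $(X,\Sigma,\mu)$ is strictly localizable, and $\langle X_i\rangle_{i\in I}$ is a decomposition of $X$.
proof (a)(i) The first thing to note is that if $F\in\Sigma$ and $\mu F<\infty$, there is a countable $\mathcal{E}'\subseteq\mathcal{E}$ such that $\mu(F\setminus\bigcup\mathcal{E}')=0$. $\mathbf{P}$ Set
$$\mathcal{E}'_n=\{E:E\in\mathcal{E},\ \mu(F\cap E)\ge2^{-n}\}\text{ for each }n\in\mathbb{N},$$
$$\mathcal{E}'=\bigcup_{n\in\mathbb{N}}\mathcal{E}'_n=\{E:E\in\mathcal{E},\ \mu(F\cap E)>0\}.$$
Because $\mathcal{E}$ is disjoint, we must have
$$\#(\mathcal{E}'_n)\le2^n\mu F$$
for every $n\in\mathbb{N}$, so that every $\mathcal{E}'_n$ is finite and $\mathcal{E}'$, being the union of a sequence of countable sets, is countable. Set $E'=\bigcup\mathcal{E}'$ and $F'=F\setminus E'$, so that both $E'$ and $F'$ belong to $\Sigma$. If $E\in\mathcal{E}'$, then $E\subseteq E'$ so $\mu(E\cap F')=\mu\emptyset=0$; if $E\in\mathcal{E}\setminus\mathcal{E}'$, then $\mu(E\cap F')=\mu(E\cap F)=0$. Thus $\mu(E\cap F')=0$ for every $E\in\mathcal{E}$. By the hypothesis ($\beta$) on $\mathcal{E}$, $\mu F'=0$, so $\mu(F\setminus\bigcup\mathcal{E}')=0$, as required. $\mathbf{Q}$
(ii) Now suppose that $H\subseteq X$ is such that $H\cap E\in\Sigma$ for every $E\in\mathcal{E}$. In this case $H\in\Sigma$. $\mathbf{P}$ Let $F\in\Sigma$ be such that $\mu F<\infty$. Let $\mathcal{E}'\subseteq\mathcal{E}$ be a countable set such that $\mu(F\setminus E')=0$, where $E'=\bigcup\mathcal{E}'$. Then $H\cap(F\setminus E')\in\Sigma$ because $(X,\Sigma,\mu)$ is complete. But also $H\cap E'=\bigcup_{E\in\mathcal{E}'}H\cap E\in\Sigma$. So
$$H\cap F=(H\cap(F\setminus E'))\cup(F\cap(H\cap E'))\in\Sigma.$$
As $F$ is arbitrary and $(X,\Sigma,\mu)$ is locally determined, $H\in\Sigma$. $\mathbf{Q}$
(iii) We find also that $\mu H=\sum_{E\in\mathcal{E}}\mu(H\cap E)$ for every $H\in\Sigma$. $\mathbf{P}$ ($\alpha$) Because $\mathcal{E}$ is disjoint, we must have $\sum_{E\in\mathcal{E}'}\mu(H\cap E)\le\mu H$ for every finite $\mathcal{E}'\subseteq\mathcal{E}$, so
$$\sum_{E\in\mathcal{E}}\mu(H\cap E)=\sup\Bigl\{\sum_{E\in\mathcal{E}'}\mu(H\cap E):\mathcal{E}'\subseteq\mathcal{E}\text{ is finite}\Bigr\}\le\mu H.$$
($\beta$) For the reverse inequality, consider first the case $\mu H<\infty$. By (i), there is a countable $\mathcal{E}'\subseteq\mathcal{E}$ such that $\mu(H\setminus\bigcup\mathcal{E}')=0$, so that
$$\mu H=\mu\Bigl(H\cap\bigcup\mathcal{E}'\Bigr)=\sum_{E\in\mathcal{E}'}\mu(H\cap E)\le\sum_{E\in\mathcal{E}}\mu(H\cap E).$$
($\gamma$) In general, because $(X,\Sigma,\mu)$ is semi-finite,
$$\mu H=\sup\{\mu F:F\subseteq H,\ \mu F<\infty\}\le\sup\Bigl\{\sum_{E\in\mathcal{E}}\mu(F\cap E):F\subseteq H,\ \mu F<\infty\Bigr\}\le\sum_{E\in\mathcal{E}}\mu(H\cap E).$$
So in all cases we have $\mu H\le\sum_{E\in\mathcal{E}}\mu(H\cap E)$, and the two are equal. $\mathbf{Q}$
(iv) In particular, setting $E_0=X\setminus\bigcup\mathcal{E}$, $E_0\in\Sigma$ and $\mu E_0=0$; that is, $\bigcup\mathcal{E}$ is conegligible. Consider $\mathcal{E}^*=\mathcal{E}\cup\{E_0\}$. This is a disjoint cover of $X$ by sets of finite measure (now using the hypothesis ($\alpha$) on $\mathcal{E}$). If $H\subseteq X$ is such that $H\cap E\in\Sigma$ for every $E\in\mathcal{E}^*$, then $H\in\Sigma$ and
$$\mu H=\sum_{E\in\mathcal{E}}\mu(H\cap E)=\sum_{E\in\mathcal{E}^*}\mu(H\cap E).$$
Thus $\mathcal{E}^*$ (or, if you prefer, the indexed family $\langle E\rangle_{E\in\mathcal{E}^*}$) is a decomposition witnessing that $(X,\Sigma,\mu)$ is strictly localizable.
(b) Apply (a) with $\mathcal{E}=\{X_i:i\in I\}$, noting that $E_0$ in (iv) is empty, so can be dropped.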
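
The criterion of 213O can be checked by hand in the simplest case. The sketch below applies it to counting measure; the choice of example and the family of singletons are assumptions made here for illustration, and it relies on the standard fact that counting measure on $\mathcal{P}X$ is complete and locally determined.

```latex
% Illustrative assumption: counting measure on an arbitrary (possibly
% uncountable) set, with the family of singletons as the disjoint family.
Let $X$ be any set, $\Sigma=\mathcal{P}X$ and $\mu$ counting measure on $X$.
Set $\mathcal{E}=\{\{x\}:x\in X\}$, a disjoint family in $\Sigma$.
\begin{itemize}
  \item[($\alpha$)] $\mu\{x\}=1<\infty$ for every $x\in X$.
  \item[($\beta$)] If $F\in\Sigma$ and $\mu F>0$, then $F\neq\emptyset$;
        taking any $x\in F$ gives $\mu(\{x\}\cap F)=1>0$.
\end{itemize}
Here $\bigcup\mathcal{E}=X$, so $X\setminus\bigcup\mathcal{E}=\emptyset$, and the
conclusion of 213O(a) reads
\[
  \mu H=\sum_{x\in X}\mu(H\cap\{x\})=\#(H)\ \text{ if $H$ is finite},\qquad
  \mu H=\infty\ \text{ otherwise},
\]
% which is just the definition of counting measure: the singletons form a
% decomposition of X, whatever the cardinality of X.
```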

213X Basic exercises (a) Let $(X,\Sigma,\mu)$ be any measure space, $\mu^*$ the outer measure defined from $\mu$, and $\check\mu$ the measure defined by Carathéodory's method from $\mu^*$; write $\check\Sigma$ for the domain of $\check\mu$. Show that (i) $\check\mu$ extends the completion $\hat\mu$ of $\mu$; (ii) if $H\subseteq X$ is such that $H\cap F\in\check\Sigma$ whenever $F\in\Sigma$ and $\mu F<\infty$, then $H\in\check\Sigma$; (iii) $(\check\mu)^*=\mu^*$, so that the integrable functions for $\check\mu$ and $\mu$ are the same (212Xb); (iv) if $\mu$ is strictly localizable then $\check\mu=\hat\mu$; (v) if $\mu$ is defined by Carathéodory's method from another outer measure, then $\mu=\check\mu$.

> (b) Let $\mu$ be counting measure restricted to the countable-cocountable $\sigma$-algebra of a set $X$ (211R, 211Ya). (i) Show that the c.l.d. version $\tilde\mu$ of $\mu$ is just counting measure on $X$. (ii) Show that $\check\mu$, as defined in 213Xa, is equal to $\tilde\mu$, and in particular strictly extends the completion of $\mu$.

(c) Let $(X,\Sigma,\mu)$ be any measure space. For $E\in\Sigma$ set
$$\mu_{sf}E=\sup\{\mu(E\cap F):F\in\Sigma,\ \mu F<\infty\}.$$
(i) Show that $(X,\Sigma,\mu_{sf})$ is a semi-finite measure space, and is equal to $(X,\Sigma,\mu)$ iff $(X,\Sigma,\mu)$ is semi-finite.
(ii) Show that a $\mu$-integrable real-valued function $f$ is $\mu_{sf}$-integrable, with the same integral.
(iii) Show that if $E\in\Sigma$ and $\mu_{sf}E<\infty$, then $E$ can be expressed as $E_1\cup E_2$ where $E_1$, $E_2\in\Sigma$, $\mu E_1=\mu_{sf}E_1$ and $\mu_{sf}E_2=0$.
(iv) Show that if $f$ is a $\mu_{sf}$-integrable real-valued function on $X$, it is equal $\mu_{sf}$-almost everywhere to a $\mu$-integrable function.
(v) Show that if $(X,\Sigma,\mu_{sf})$ is complete, so is $(X,\Sigma,\mu)$.
(vi) Show that $\mu$ and $\mu_{sf}$ have identical c.l.d. versions.

(d) Let $(X,\Sigma,\mu)$ be any measure space. Define $\check\mu$ as in 213Xa. Show that $(\check\mu)_{sf}$, as constructed in 213Xc, is precisely the c.l.d. version $\tilde\mu$ of $\mu$, so that $\check\mu=\tilde\mu$ iff $\check\mu$ is semi-finite.

(e) Let $(X,\Sigma,\mu)$ be a measure space. For $A\subseteq X$ set $\mu_*A=\sup\{\mu E:E\in\Sigma,\ \mu E<\infty,\ E\subseteq A\}$, as in 113Yh. (i) Show that the measure constructed from $\mu_*$ by the method of 113Yg is just the c.l.d. version $\tilde\mu$ of $\mu$. (ii) Show that $\tilde\mu_*=\mu_*$. (iii) Show that if $\nu$ is another measure on $X$, with domain $\mathrm{T}$, then $\tilde\mu=\tilde\nu$ iff $\mu_*=\nu_*$.

(f) Let $X$ be a set and $\theta$ an outer measure on $X$. Show that $\theta_{sf}$, defined by writing
$$\theta_{sf}A=\sup\{\theta B:B\subseteq A,\ \theta B<\infty\}$$
is also an outer measure on $X$. Show that the measures defined by Carathéodory's method from $\theta$, $\theta_{sf}$ have the same domains.

(g) Let $(X,\Sigma,\mu)$ be any measure space. Set
$$\mu^*_{sf}A=\sup\{\mu^*(A\cap E):E\in\Sigma,\ \mu E<\infty\}$$
for every $A\subseteq X$.
(i) Show that
$$\mu^*_{sf}A=\sup\{\mu^*B:B\subseteq A,\ \mu^*B<\infty\}$$
for every $A$.
(ii) Show that $\mu^*_{sf}$ is an outer measure.
(iii) Show that if $A\subseteq X$ and $\mu^*_{sf}A<\infty$, there is an $E\in\Sigma$ such that $\mu^*_{sf}A=\mu^*(A\cap E)=\mu E$, $\mu^*_{sf}(A\setminus E)=0$. (Hint: take a non-decreasing sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of measurable sets of finite measure such that $\mu^*_{sf}A=\lim_{n\to\infty}\mu^*(A\cap E_n)$, and let $E\subseteq\bigcup_{n\in\mathbb{N}}E_n$ be a measurable envelope of $A\cap\bigcup_{n\in\mathbb{N}}E_n$.)
(iv) Show that the measure defined from $\mu^*_{sf}$ by Carathéodory's method is precisely the c.l.d. version $\tilde\mu$ of $\mu$.
(v) Show that $\mu^*_{sf}=\tilde\mu^*$, so that if $\mu$ is complete and locally determined then $\mu^*_{sf}=\mu^*$.

> (h) Let $(X,\Sigma,\mu)$ be a strictly localizable measure space with a decomposition $\langle X_i\rangle_{i\in I}$. Show that $\mu^*A=\sum_{i\in I}\mu^*(A\cap X_i)$ for every $A\subseteq X$.


> (i) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space, and let $A\subseteq X$ be such that $\min(\mu^*(E\cap A),\mu^*(E\setminus A))<\mu E$ whenever $E\in\Sigma$ and $0<\mu E<\infty$. Show that $A\in\Sigma$. (Hint: given $\mu F<\infty$, consider the intersection $E$ of measurable envelopes of $F\cap A$, $F\setminus A$ to see that $\mu^*(F\cap A)+\mu^*(F\setminus A)=\mu F$.)

> (j) Let $(X,\Sigma,\mu)$ be a measure space, $\tilde\mu$ its c.l.d. version, and $\check\mu$ the measure defined by Carathéodory's method from $\mu^*$. (i) Show that the following are equiveridical: ($\alpha$) $\mu$ has locally determined negligible sets; ($\beta$) $\mu$ and $\tilde\mu$ have the same negligible sets; ($\gamma$) $\check\mu=\tilde\mu$. (ii) Show that in this case $\mu$ is semi-finite.

(k) Let $(X,\Sigma,\mu)$ be a measure space. Show that the following are equiveridical: (i) $(X,\Sigma,\mu)$ has locally determined negligible sets; (ii) the completion and c.l.d. version of $\mu$ have the same sets of finite measure; (iii) $\mu$ and $\tilde\mu$ have the same integrable functions; (iv) $\tilde\mu^*=\mu^*$; (v) the outer measure $\mu^*_{sf}$ of 213Xg is equal to $\mu^*$.

(l) Let us say that a measure space $(X,\Sigma,\mu)$ has the measurable envelope property if every subset of $X$ has a measurable envelope. (i) Show that a semi-finite space with the measurable envelope property has locally determined negligible sets. (ii) Show that a complete semi-finite space with the measurable envelope property is locally determined.

(m) Let $(X,\Sigma,\mu)$ be a semi-finite measure space, and suppose that it satisfies the conclusion of Theorem 213N. Show that it is localizable. (Hint: given $\mathcal{E}\subseteq\Sigma$, set $\mathcal{F}=\{F:F\in\Sigma,\ E\cap F\text{ is negligible for every }E\in\mathcal{E}\}$. Let $\Phi$ be the set of functions $f$ from subsets of $X$ to $\{0,1\}$ such that $f^{-1}[\{1\}]\in\mathcal{E}$ and $f^{-1}[\{0\}]\in\mathcal{F}$.)

(n) Let $(X,\Sigma,\mu)$ be a measure space. Show that its c.l.d. version is strictly localizable iff there is a disjoint family $\mathcal{E}\subseteq\Sigma$ such that $\mu E<\infty$ for every $E\in\mathcal{E}$ and whenever $F\in\Sigma$, $0<\mu F<\infty$ there is an $E\in\mathcal{E}$ such that $\mu(E\cap F)>0$.

(o) Show that the c.l.d. version of any point-supported measure is point-supported.

213Y Further exercises (a) Set $X=\mathbb{N}$, and for $A\subseteq X$ set
$$\theta A=\sqrt{\#(A)}\text{ if }A\text{ is finite},\qquad\theta A=\infty\text{ if }A\text{ is infinite}.$$
Show that $\theta$ is an outer measure on $X$, that $\theta A=\sup\{\theta B:B\subseteq A,\ \theta B<\infty\}$ for every $A\subseteq X$, but that the measure $\mu$ defined from $\theta$ by Carathéodory's method is not semi-finite. Show that if $\check\mu$ is the measure defined by Carathéodory's method from $\mu^*$ (213Xa), then $\check\mu\ne\mu$.

(b) Set $X=[0,1]\times\{0,1\}$, and let $\Sigma$ be the family of those subsets $E$ of $X$ such that
$$\{x:x\in[0,1],\ E[\{x\}]\ne\emptyset,\ E[\{x\}]\ne\{0,1\}\}$$
is countable, writing $E[\{x\}]=\{y:(x,y)\in E\}$ for each $x\in[0,1]$. Show that $\Sigma$ is a $\sigma$-algebra of subsets of $X$. For $E\in\Sigma$, set $\mu E=\#(\{x:(x,1)\in E\})$ if this is finite, $\infty$ otherwise. Show that $\mu$ is a complete semi-finite measure. Show that the measure $\check\mu$ defined from $\mu^*$ by Carathéodory's method (213Xa) is not semi-finite. Show that the domain of the c.l.d. version of $\mu$ is the whole of $\mathcal{P}X$.

(c) Set $X=\mathbb{N}$, and for $A\subseteq X$ set
$$\phi A=\#(A)^2\text{ if }A\text{ is finite},\qquad\phi A=\infty\text{ if }A\text{ is infinite}.$$
Show that $\phi$ satisfies the conditions of 113Yg, but that the measure defined from $\phi$ by the method of 113Yg is not semi-finite.

(d) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space. Suppose that $D\subseteq X$ and that $f:D\to\mathbb{R}$ is a function. Show that the following are equiveridical: (i) $f$ is measurable; (ii)
$$\mu^*\{x:x\in D\cap E,\ f(x)\le a\}+\mu^*\{x:x\in D\cap E,\ f(x)\ge b\}\le\mu E$$
whenever $a<b$ in $\mathbb{R}$, $E\in\Sigma$ and $\mu E<\infty$; (iii)
$$\min(\mu^*\{x:x\in D\cap E,\ f(x)\le a\},\ \mu^*\{x:x\in D\cap E,\ f(x)\ge b\})<\mu E$$
whenever $a<b$ in $\mathbb{R}$ and $0<\mu E<\infty$. (Hint: for (iii)$\Rightarrow$(i), show that if $E\subseteq X$ then
$$\mu^*\{x:x\in D\cap E,\ f(x)>a\}=\sup_{b>a}\mu^*\{x:x\in D\cap E,\ f(x)\ge b\},$$
and use 213Xi above.)

(e) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space and suppose that $\mathcal{E}\subseteq\Sigma$ is such that $\mu E<\infty$ for every $E\in\mathcal{E}$ and whenever $F\in\Sigma$ and $\mu F<\infty$ there is a countable $\mathcal{E}_0\subseteq\mathcal{E}$ such that $F\setminus\bigcup\mathcal{E}_0$, $F\cap\bigcup(\mathcal{E}\setminus\mathcal{E}_0)$ are negligible. Show that $(X,\Sigma,\mu)$ is strictly localizable.

213 Notes and comments I think it is fair to say that if the definition of measure space were re-written
to exclude all spaces which are not semi-finite, nothing significant would be lost from the theory. There are
solid reasons for not taking such a drastic step, starting with the fact that it would confuse everyone (if you
say to an unprepared audience 'let $(X,\Sigma,\mu)$ be a measure space', there is a danger that some will imagine
that you mean '$\sigma$-finite measure space', but very few will suppose that you mean 'semi-finite measure space').
But the whole point of measure theory is that we distinguish between sets by their measures, and if every
subset of E is either non-measurable, or negligible, or of infinite measure, the classification is too crude to
support most of the usual ideas, starting, of course, with ordinary integration.
Let us say that a measurable set E is purely infinite if E itself and all its non-negligible measurable
subsets have infinite measure. On the definition of the integral which I chose in Volume 1, every simple
function, and therefore every integrable function, must be zero almost everywhere on E. This means that the
whole theory of integration will ignore E entirely. Looking at the definition of c.l.d. version (213D-213E),
you will see that the c.l.d. version of the measure will render E negligible, as does the semi-finite version
described in 213Xc. These amendments do not, however, affect sets of finite measure, and consequently
leave integrable functions integrable, with the same integrals.
The strongest reason we have yet seen for admitting non-semi-finite spaces into consideration is that
Carathéodory's method does not always produce semi-finite spaces. (I give examples in 213Ya-213Yb; more
important ones are the Hausdorff measures of §§264-265 below.) In practice the right thing to do is often
to take the c.l.d. version of the measure produced by Carathéodory's construction.
It is a reasonable general philosophy, in measure theory, to say that we wish to measure as many sets,
and integrate as many functions, as we can manage in a canonical way; I mean, without making blatantly
arbitrary choices about the values we assign to our measure or integral. The revision of a measure to its
c.l.d. version is about as far as we can go with an arbitrary measure space in which we have no other
structure to guide our choices.
You will observe that $\tilde\mu$ is not as close to $\mu$ as the completion $\hat\mu$ of $\mu$ is; naturally so, because if $E \in \Sigma$
is purely infinite for $\mu$ then we have to choose between setting $\tilde\mu E = 0 \ne \mu E$ and finding some way of
fitting many sets of finite measure into $E$; which if $E$ is a singleton will be actually impossible, and in any
case would be an arbitrary process. However the integrable functions for $\tilde\mu$, while not always the same
as those for $\mu$ (since $\tilde\mu$ turns purely infinite sets into negligible ones, so that their characteristic functions
become integrable), are nearly the same, in the sense that any $\tilde\mu$-integrable function can be changed into a
$\mu$-integrable function by adjusting it on a $\tilde\mu$-negligible set. This corresponds, of course, to the fact that any
set of finite measure for $\tilde\mu$ is the symmetric difference of a set of finite measure for $\mu$ and a $\tilde\mu$-negligible set.
For sets of infinite measure this can fail, unless $\mu$ is localizable (213Hb, 213Xb).
If $(X, \Sigma, \mu)$ is semi-finite, or localizable, or strictly localizable, then of course it is correspondingly closer
to $(X, \tilde\Sigma, \tilde\mu)$, as detailed in 213Ha-c.
It is worth noting that while the measure $\check\mu$ obtained by Carathéodory's method directly from the outer
measure $\mu^*$ defined from $\mu$ may fail to be semi-finite, even when $\mu$ is (213Yb), a simple modification of $\mu^*$
(213Xg) yields the c.l.d. version $\tilde\mu$ of $\mu$, which can also be obtained from an appropriate inner measure
(213Xe). The measure $\check\mu$ is of course related in other ways to $\tilde\mu$; see 213Xd.

214 Subspaces
In §131 I described a construction for subspace measures on measurable subsets. It is now time to give
the generalization to subspace measures on arbitrary subsets of a measure space. The relationship between
this construction and the properties listed in §211 is not quite as straightforward as one might imagine, and
in this section I try to give a full account of what can be expected of subspaces in general. I think that
for the present volume only (i) general subspaces of $\sigma$-finite spaces and (ii) measurable subspaces of general
measure spaces will be needed in any essential way, and these do not give any difficulty; but in later volumes
we shall need the full theory.
I begin with a general construction for subspace measures (214A-214C), with an account of integration
with respect to a subspace measure (214E-214G); these (with 131E-131H) give a solid foundation for the
concept of integration over a subset (214D). I give this work in its full natural generality, which will
eventually be essential, but even for Lebesgue measure alone it is important to be aware of the ideas here.
I continue with answers to some obvious questions concerning subspace measures and the properties of
measure spaces so far considered, both for general subspaces (214I) and for measurable subspaces (214J),
and I mention a basic construction for assembling measure spaces side-by-side, the direct sums of 214K-214L.

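214A Proposition Let $(X, \Sigma, \mu)$ be a measure space, and $Y$ any subset of $X$. Let $\mu^*$ be the outer
measure defined from $\mu$ (132A-132B), and set $\Sigma_Y = \{E \cap Y : E \in \Sigma\}$; let $\mu_Y$ be the restriction of $\mu^*$ to $\Sigma_Y$.
Then $(Y, \Sigma_Y, \mu_Y)$ is a measure space.
proof (a) I have noted in 121A that $\Sigma_Y$ is a $\sigma$-algebra of subsets of $Y$.
(b) Of course $\mu_Y F \in [0, \infty]$ for every $F \in \Sigma_Y$.
(c) $\mu_Y \emptyset = \mu^* \emptyset = 0$.
(d) If $\langle F_n \rangle_{n \in \mathbb{N}}$ is a disjoint sequence in $\Sigma_Y$ with union $F$, then choose $E_n$, $E'_n$, $E \in \Sigma$ such that
$F_n = Y \cap E_n$ for each $n$, $F_n \subseteq E'_n$, $\mu_Y F_n = \mu E'_n$ for each $n$, $F \subseteq E$ and $\mu_Y F = \mu E$ (using 132Aa
repeatedly). Set $G_n = E_n \cap E'_n \cap E \setminus \bigcup_{m<n} E_m$ for each $n \in \mathbb{N}$; then $\langle G_n \rangle_{n \in \mathbb{N}}$ is disjoint and $F_n \subseteq G_n \subseteq E'_n$
for each $n$, so $\mu_Y F_n = \mu G_n$. Also $F \subseteq \bigcup_{n \in \mathbb{N}} G_n \subseteq E$, so
$$\mu_Y F = \mu\Bigl(\bigcup_{n \in \mathbb{N}} G_n\Bigr) = \sum_{n=0}^{\infty} \mu G_n = \sum_{n=0}^{\infty} \mu_Y F_n.$$
As $\langle F_n \rangle_{n \in \mathbb{N}}$ is arbitrary, $\mu_Y$ is a measure.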
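
As a finite illustration of the construction in 214A (not part of the original development), the following Python sketch builds $\Sigma_Y$ and $\mu_Y$ for a toy space in which $\Sigma$ is generated by three atoms. The space, the masses and all function names are assumptions made for the example; the formula used for $\mu^*$ is valid only because $\Sigma$ is generated by a finite partition.

```python
from itertools import chain, combinations

# Hypothetical toy space: X = {0,...,5}; Sigma is generated by three atoms.
ATOMS = {frozenset({0, 1}): 2.0, frozenset({2, 3}): 3.0, frozenset({4, 5}): 0.0}


def sigma_algebra(atoms=ATOMS):
    """All unions of atoms: the sigma-algebra generated by the partition."""
    atom_list = list(atoms)
    return {
        frozenset(chain.from_iterable(combo))
        for r in range(len(atom_list) + 1)
        for combo in combinations(atom_list, r)
    }


def outer(A, atoms=ATOMS):
    """mu*(A): the measure of the smallest measurable set including A,
    which here is the union of the atoms meeting A."""
    return sum(mass for atom, mass in atoms.items() if atom & A)


def subspace(Y, atoms=ATOMS):
    """Return Sigma_Y = {E & Y : E in Sigma} and mu_Y = mu* restricted to it."""
    Y = frozenset(Y)
    sigma_Y = {E & Y for E in sigma_algebra(atoms)}
    return sigma_Y, {F: outer(F, atoms) for F in sigma_Y}


sigma_Y, mu_Y = subspace({0, 2, 4})  # Y is not itself measurable
```

In this toy case $\mu^*$ is not additive on arbitrary disjoint subsets of $X$ (the points 0 and 1 each have outer measure 2, as does their union), but its restriction to $\Sigma_Y$ is additive, as the proposition asserts.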

214B Definition If $(X, \Sigma, \mu)$ is any measure space and $Y$ is any subset of $X$, then $\mu_Y$, defined as in
214A, is the subspace measure on $Y$.
It is worth noting the following.

214C Lemma Let $(X, \Sigma, \mu)$ be a measure space, $Y$ a subset of $X$, and $\mu_Y$ the subspace measure on $Y$,
with domain $\Sigma_Y$. Then
(a) for any $F \in \Sigma_Y$, there is an $E \in \Sigma$ such that $F = E \cap Y$ and $\mu E = \mu_Y F$;
(b) for any $A \subseteq Y$, $A$ is $\mu_Y$-negligible iff it is $\mu$-negligible;
(c)(i) if $A \subseteq X$ is $\mu$-conegligible, then $A \cap Y$ is $\mu_Y$-conegligible (ii) if $A \subseteq Y$ is $\mu_Y$-conegligible, then
$A \cup (X \setminus Y)$ is $\mu$-conegligible;
(d) $(\mu_Y)^*$, the outer measure on $Y$ defined from $\mu_Y$, agrees with $\mu^*$ on $\mathcal{P}Y$;
(e) if $Z \subseteq Y \subseteq X$, then $\Sigma_Z = (\Sigma_Y)_Z$, the subspace $\sigma$-algebra of subsets of $Z$ regarded as a subspace of
$(Y, \Sigma_Y)$, and $\mu_Z = (\mu_Y)_Z$ is the subspace measure on $Z$ regarded as a subspace of $(Y, \mu_Y)$;
(f) if $Y \in \Sigma$, then $\mu_Y$, as defined here, is exactly the subspace measure on $Y$ defined in 131A-131B; that
is, $\Sigma_Y = \Sigma \cap \mathcal{P}Y$ and $\mu_Y = \mu \upharpoonright \Sigma_Y$.
proof (a) By the definition of $\Sigma_Y$, there is an $E_0 \in \Sigma$ such that $F = E_0 \cap Y$. By 132Aa, there is an $E_1 \in \Sigma$
such that $F \subseteq E_1$ and $\mu^* F = \mu E_1$. Set $E = E_0 \cap E_1$; this serves.

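(b)(i) If $A$ is $\mu_Y$-negligible, there is a set $F \in \Sigma_Y$ such that $A \subseteq F$ and $\mu_Y F = 0$; now $\mu^* A \le \mu^* F = 0$
so $A$ is $\mu$-negligible, by 132Ad. (ii) If $A$ is $\mu$-negligible, there is an $E \in \Sigma$ such that $A \subseteq E$ and $\mu E = 0$;
now $A \subseteq E \cap Y \in \Sigma_Y$ and $\mu_Y(E \cap Y) = 0$, so $A$ is $\mu_Y$-negligible.
(c) If $A \subseteq X$ is $\mu$-conegligible, then $A \cap Y$ is $\mu_Y$-conegligible, because $Y \setminus A = Y \cap (X \setminus A)$ is
$\mu$-negligible, therefore $\mu_Y$-negligible. If $A \subseteq Y$ is $\mu_Y$-conegligible, then $A \cup (X \setminus Y)$ is $\mu$-conegligible because
$X \setminus (A \cup (X \setminus Y)) = Y \setminus A$ is $\mu_Y$-negligible, therefore $\mu$-negligible.
(d) Let $A \subseteq Y$. (i) If $A \subseteq E \in \Sigma$, then $A \subseteq E \cap Y \in \Sigma_Y$, so $\mu_Y^* A \le \mu_Y(E \cap Y) \le \mu E$; as $E$ is arbitrary,
$\mu_Y^* A \le \mu^* A$. (ii) If $A \subseteq F \in \Sigma_Y$, there is an $E \in \Sigma$ such that $F \subseteq E$ and $\mu_Y F = \mu^* F = \mu E$; now $A \subseteq E$
so $\mu^* A \le \mu E = \mu_Y F$. As $F$ is arbitrary, $\mu^* A \le \mu_Y^* A$.
(e) That $\Sigma_Z = (\Sigma_Y)_Z$ is immediate from the definition of $\Sigma_Y$, etc.; now
$$(\mu_Y)_Z = \mu_Y^* \upharpoonright \Sigma_Z = \mu^* \upharpoonright \Sigma_Z = \mu_Z$$
by (d).
(f) This is elementary, because $E \cap Y \in \Sigma$ and $\mu^*(E \cap Y) = \mu(E \cap Y)$ for every $E \in \Sigma$.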

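214D Integration over subsets: Definition Let $(X, \Sigma, \mu)$ be a measure space, $Y$ a subset of $X$
and $f$ a $[-\infty, \infty]$-valued function defined on a subset of $X$. By $\int_Y f$ (or $\int_Y f(x)\,\mu(dx)$, etc.) I mean
$\int (f \upharpoonright Y)\,d\mu_Y$, if this exists in $[-\infty, \infty]$, following the definitions of 214A-214B, 133A and 135F, and taking
$\operatorname{dom}(f \upharpoonright Y) = Y \cap \operatorname{dom} f$, $(f \upharpoonright Y)(x) = f(x)$ for $x \in Y \cap \operatorname{dom} f$. (Compare 131D.)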

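214E Proposition Let $(X, \Sigma, \mu)$ be a measure space, $Y \subseteq X$, and $f$ a $[-\infty, \infty]$-valued function defined
on a subset $\operatorname{dom} f$ of $X$.
(a) If $f$ is $\mu$-integrable, $f \upharpoonright Y$ is $\mu_Y$-integrable, and $\int_Y f \le \int f$ if $f$ is non-negative.
(b) If $\operatorname{dom} f \subseteq Y$ and $f$ is $\mu_Y$-integrable, then there is a $\mu$-integrable function $\tilde f$ on $X$, extending $f$, such
that $\int_F \tilde f = \int_{F \cap Y} f$ for every $F \in \Sigma$.
proof (a)(i) If $f$ is $\mu$-simple, it is expressible as $\sum_{i=0}^n a_i \chi E_i$, where $E_0, \ldots, E_n \in \Sigma$, $a_0, \ldots, a_n \in \mathbb{R}$ and
$\mu E_i < \infty$ for each $i$. Now $f \upharpoonright Y = \sum_{i=0}^n a_i \chi_Y(E_i \cap Y)$, where $\chi_Y(E_i \cap Y) = (\chi E_i) \upharpoonright Y$ is the characteristic
function of $E_i \cap Y$ regarded as a subset of $Y$; and each $E_i \cap Y$ belongs to $\Sigma_Y$, with $\mu_Y(E_i \cap Y) \le \mu E_i < \infty$,
so $f \upharpoonright Y : Y \to \mathbb{R}$ is $\mu_Y$-simple.
If $f : X \to \mathbb{R}$ is a non-negative simple function, it is expressible as $\sum_{i=0}^n a_i \chi E_i$ where $E_0, \ldots, E_n$ are
disjoint sets of finite measure (122Cb). Now $f \upharpoonright Y = \sum_{i=0}^n a_i \chi_Y(E_i \cap Y)$ and
$$\int (f \upharpoonright Y)\,d\mu_Y = \sum_{i=0}^n a_i \mu_Y(E_i \cap Y) \le \sum_{i=0}^n a_i \mu E_i = \int f\,d\mu$$
because $a_i \ge 0$ whenever $E_i \ne \emptyset$, so that $a_i \mu_Y(E_i \cap Y) \le a_i \mu E_i$ for every $i$.
(ii) If $f$ is a non-negative $\mu$-integrable function, there is an increasing sequence $\langle f_n \rangle_{n \in \mathbb{N}}$ of non-negative
$\mu$-simple functions converging to $f$ $\mu$-almost everywhere; now $\langle f_n \upharpoonright Y \rangle_{n \in \mathbb{N}}$ is an increasing sequence of
$\mu_Y$-simple functions increasing to $f \upharpoonright Y$ $\mu_Y$-a.e. (by 214Cb), and
$$\sup_{n \in \mathbb{N}} \int (f_n \upharpoonright Y)\,d\mu_Y \le \sup_{n \in \mathbb{N}} \int f_n\,d\mu = \int f\,d\mu < \infty,$$
so $\int (f \upharpoonright Y)\,d\mu_Y$ exists and is at most $\int f\,d\mu$.
(iii) Finally, if $f$ is any $\mu$-integrable real-valued function, it is expressible as $f_1 - f_2$ where $f_1$ and $f_2$
are non-negative $\mu$-integrable functions, so that $f \upharpoonright Y = (f_1 \upharpoonright Y) - (f_2 \upharpoonright Y)$ is $\mu_Y$-integrable.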
(b) Let us say that if $f$ is a $\mu_Y$-integrable function, then an enveloping extension of $f$ is a $\mu$-integrable
function $\tilde f$, extending $f$, real-valued on $X \setminus Y$, such that $\int_F \tilde f = \int_{F \cap Y} f$ for every $F \in \Sigma$.
(i) If $f$ is of the form $\chi H$, where $H \in \Sigma_Y$ and $\mu_Y H < \infty$, let $E_0 \in \Sigma$ be such that $H = Y \cap E_0$
and $E_1 \in \Sigma$ a measurable envelope for $H$ (132Ee); then $E = E_0 \cap E_1$ is a measurable envelope for $H$ and
$H = E \cap Y$. Set $\tilde f = \chi E$, regarded as a function from $X$ to $\{0, 1\}$. Then $\tilde f \upharpoonright Y = f$, and for any $F \in \Sigma$ we
have
$$\int_F \tilde f = \mu_F(E \cap F) = \mu(E \cap F) = \mu^*(H \cap F) = \mu_{Y \cap F}(H \cap F) = \int_{Y \cap F} f.$$
So $\tilde f$ is an enveloping extension of $f$.
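(ii) If $f$, $g$ are $\mu_Y$-integrable functions with enveloping extensions $\tilde f$, $\tilde g$, and $a, b \in \mathbb{R}$, then $a\tilde f + b\tilde g$
extends $af + bg$ and
$$\int_F a\tilde f + b\tilde g = a\int_F \tilde f + b\int_F \tilde g = a\int_{F \cap Y} f + b\int_{F \cap Y} g = \int_{F \cap Y} af + bg$$
for every $F \in \Sigma$, so $a\tilde f + b\tilde g$ is an enveloping extension of $af + bg$.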


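(iii) Putting (i) and (ii) together, we see that every $\mu_Y$-simple function $f$ has an enveloping extension.
(iv) Now suppose that $\langle f_n \rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence of non-negative $\mu_Y$-simple functions
converging $\mu_Y$-almost everywhere to a $\mu_Y$-integrable function $f$. For each $n \in \mathbb{N}$ let $\tilde f_n$ be an enveloping
extension of $f_n$. Then $\tilde f_n \le_{\text{a.e.}} \tilde f_{n+1}$. $\mathbf{P}$ If $F \in \Sigma$ then
$$\int_F \tilde f_n = \int_{F \cap Y} f_n \le \int_{F \cap Y} f_{n+1} = \int_F \tilde f_{n+1}.$$
So $\tilde f_n \le_{\text{a.e.}} \tilde f_{n+1}$, by 131Ha. $\mathbf{Q}$ Also
$$\lim_{n \to \infty} \int_F \tilde f_n = \lim_{n \to \infty} \int_{F \cap Y} f_n = \int_{F \cap Y} f$$
for every $F \in \Sigma$. Taking $F = X$ to begin with, B.Levi's theorem tells us that $h = \lim_{n \to \infty} \tilde f_n$ is defined (as
a real-valued function) $\mu$-almost everywhere; now letting $F$ vary, we have $\int_F h = \int_{F \cap Y} f$ for every $F \in \Sigma$,
because $h \upharpoonright F = \lim_{n \to \infty} \tilde f_n \upharpoonright F$ $\mu_F$-a.e. (I seem to be using 214Cb here.) Now $h \upharpoonright Y = f$ $\mu_Y$-a.e., by 214Cb
again. If we define $\tilde f$ by setting
$$\tilde f(x) = f(x) \text{ for } x \in \operatorname{dom} f, \quad h(x) \text{ for } x \in \operatorname{dom} h \setminus \operatorname{dom} f, \quad 0 \text{ for other } x \in X,$$
then $\tilde f$ is defined everywhere in $X$ and is equal to $h$ $\mu$-almost everywhere; so that if $F \in \Sigma$, $\tilde f \upharpoonright F$ will be
equal to $h \upharpoonright F$ $\mu_F$-almost everywhere, and
$$\int_F \tilde f = \int_F h = \int_{F \cap Y} f.$$
As $F$ is arbitrary, $\tilde f$ is an enveloping extension of $f$.
(v) Thus every non-negative $\mu_Y$-integrable function has an enveloping extension. Using (ii) again,
every $\mu_Y$-integrable function has an enveloping extension, as claimed.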

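214F Proposition Let $(X, \Sigma, \mu)$ be a measure space, $Y$ a subset of $X$, and $f$ a $[-\infty, \infty]$-valued function
such that $\int_X f$ is defined in $[-\infty, \infty]$. If either $Y$ is of full outer measure in $X$ or $f$ is zero almost everywhere
on $X \setminus Y$, then $\int_Y f$ is defined and equal to $\int_X f$.
proof (a) Suppose first that $f$ is non-negative, $\Sigma$-measurable and defined everywhere on $X$. In this case
$f \upharpoonright Y$ is $\Sigma_Y$-measurable. Set $F_{nk} = \{x : x \in X,\ f(x) \ge 2^{-n}k\}$ for $k, n \in \mathbb{N}$, $f_n = \sum_{k=1}^{4^n} 2^{-n} \chi F_{nk}$ for $n \in \mathbb{N}$,
so that $\langle f_n \rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence of real-valued measurable functions converging everywhere to
$f$, and $\int_X f = \lim_{n \to \infty} \int_X f_n$. For each $n \in \mathbb{N}$, $k \ge 1$,
$$\mu_Y(F_{nk} \cap Y) = \mu^*(F_{nk} \cap Y) = \mu F_{nk}$$
either because $F_{nk} \setminus Y$ is negligible or because $X$ is a measurable envelope of $Y$. So
$$\int_Y f = \lim_{n \to \infty} \int_Y f_n = \lim_{n \to \infty} \sum_{k=1}^{4^n} 2^{-n} \mu_Y(F_{nk} \cap Y) = \lim_{n \to \infty} \sum_{k=1}^{4^n} 2^{-n} \mu F_{nk} = \lim_{n \to \infty} \int_X f_n = \int_X f.$$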

(b) Now suppose that $f$ is non-negative, defined almost everywhere on $X$ and $\mu$-virtually measurable. In
this case there is a conegligible measurable set $E \subseteq \operatorname{dom} f$ such that $f \upharpoonright E$ is measurable. Set $\tilde f(x) = f(x)$ for
$x \in E$, $0$ for $x \in X \setminus E$; then $\tilde f$ satisfies the conditions of (a) and $f = \tilde f$ $\mu$-a.e. Accordingly $f \upharpoonright Y = \tilde f \upharpoonright Y$ $\mu_Y$-a.e.
(214Cc), and
$$\int_Y f = \int_Y \tilde f = \int_X \tilde f = \int_X f.$$
(c) Finally, for the general case, we can apply (b) to the positive and negative parts $f^+$, $f^-$ of $f$ to get
$$\int_Y f = \int_Y f^+ - \int_Y f^- = \int_X f^+ - \int_X f^- = \int_X f.$$

214G Corollary Let $(X, \Sigma, \mu)$ be a measure space, $Y$ a subset of $X$, and $E \in \Sigma$ a measurable envelope
of $Y$. If $f$ is a $[-\infty, \infty]$-valued function such that $\int_E f$ is defined in $[-\infty, \infty]$, then $\int_Y f$ is defined and equal
to $\int_E f$.
proof By 214Ce, we can identify the subspace measure $\mu_Y$ with the subspace measure $(\mu_E)_Y$ induced by
the subspace measure on $E$. Now, regarded as a subspace of $E$, $Y$ is of full outer measure, so 214F gives
the result.
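
The inequality of 214Ea and the equality of 214F can be checked by hand on a finite space. The sketch below is an illustration under stated assumptions, not part of the original text: $\Sigma$ is generated by three hypothetical atoms, $f \ge 0$ is constant on each atom, and the names `integral` and `integral_over` are invented for the example. It relies on the fact that, in such a space, $\mu_Y(A \cap Y) = \mu^*(A \cap Y) = \mu A$ whenever the atom $A$ meets $Y$.

```python
# Hypothetical toy space: atoms of Sigma with their measures, and a function
# f >= 0 that is constant on each atom (so f is Sigma-measurable).
ATOMS = [(frozenset({0, 1}), 2.0), (frozenset({2, 3}), 3.0), (frozenset({4, 5}), 1.5)]
F_VALUES = [1.0, 2.0, 4.0]  # value of f on each atom, in the same order


def integral(values=F_VALUES, atoms=ATOMS):
    """Integral of f over X with respect to mu."""
    return sum(v * mass for v, (_, mass) in zip(values, atoms))


def integral_over(Y, values=F_VALUES, atoms=ATOMS):
    """Integral of f|Y with respect to mu_Y.  The atoms of Sigma_Y are the
    non-empty traces A & Y, and mu_Y(A & Y) = mu*(A & Y) = mu(A)."""
    Y = frozenset(Y)
    return sum(v * mass for v, (atom, mass) in zip(values, atoms) if atom & Y)


full = integral_over({0, 2, 4})   # Y meets every atom: full outer measure
partial = integral_over({0, 2})   # Y misses the atom {4, 5}
```

Here `full` agrees with `integral()`, as 214F predicts for a set of full outer measure, while `partial` is strictly smaller, consistent with 214Ea.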

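214H Subspaces and Carathéodory's method The following technical results will occasionally be
useful.
Lemma Let $X$ be a set, $Y \subseteq X$ a subset, and $\theta$ an outer measure on $X$.
(a) $\theta_Y = \theta \upharpoonright \mathcal{P}Y$ is an outer measure on $Y$.
(b) Let $\mu$, $\nu$ be the measures on $X$, $Y$ defined by Carathéodory's method from the outer measures $\theta$, $\theta_Y$,
and $\Sigma$, $\mathrm{T}$ their domains; let $\mu_Y$ be the subspace measure on $Y$ induced by $\mu$, and $\Sigma_Y$ its domain. Then
(i) $\Sigma_Y \subseteq \mathrm{T}$ and $\nu F \le \mu_Y F$ for every $F \in \Sigma_Y$;
(ii) if $Y \in \Sigma$ then $\nu = \mu_Y$;
(iii) if $\theta = \mu^*$ (that is, $\theta$ is regular) then $\nu$ extends $\mu_Y$;
(iv) if $\theta = \mu^*$ and $\theta Y < \infty$ then $\nu = \mu_Y$.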
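proof (a) You have only to read the definition of outer measure (113A).
(b)(i) Suppose that $F \in \Sigma_Y$. Then it is of the form $E \cap Y$ where $E \in \Sigma$. If $A \subseteq Y$, then
$$\theta_Y(A \cap F) + \theta_Y(A \setminus F) = \theta(A \cap F) + \theta(A \setminus F) = \theta(A \cap E) + \theta(A \setminus E) = \theta A = \theta_Y A,$$
so $F \in \mathrm{T}$. Now
$$\nu F = \theta_Y F = \theta F \le \mu^* F = \mu_Y F.$$
(ii) Suppose that $F \in \mathrm{T}$. If $A \subseteq X$, then
$$\begin{aligned}
\theta A &= \theta(A \cap Y) + \theta(A \setminus Y) = \theta_Y(A \cap Y) + \theta(A \setminus Y) \\
&= \theta_Y(A \cap Y \cap F) + \theta_Y(A \cap Y \setminus F) + \theta(A \setminus Y) \\
&= \theta(A \cap F) + \theta(A \cap Y \setminus F) + \theta(A \setminus Y) \\
&= \theta(A \cap F) + \theta((A \setminus F) \cap Y) + \theta((A \setminus F) \setminus Y) = \theta(A \cap F) + \theta(A \setminus F);
\end{aligned}$$
as $A$ is arbitrary, $F \in \Sigma$ and therefore $F \in \Sigma_Y$. Also
$$\mu_Y F = \mu F = \theta F = \theta_Y F = \nu F.$$
Putting this together with (i), we see that $\mu_Y$ and $\nu$ are identical.
(iii) Let $F \in \Sigma_Y$. Then $F \in \mathrm{T}$, by (i). Now $\nu F = \theta F = \mu^* F = \mu_Y F$. As $F$ is arbitrary, $\nu$ extends $\mu_Y$.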
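(iv) Now suppose that $F \in \mathrm{T}$. Because $\mu^* Y = \theta Y < \infty$, we have measurable envelopes $E_1$, $E_2$ of $F$
and $Y \setminus F$ for $\mu$ (132Ee). Then
$$\begin{aligned}
\theta Y &= \theta_Y Y = \theta_Y F + \theta_Y(Y \setminus F) = \theta F + \theta(Y \setminus F) \\
&= \mu^* F + \mu^*(Y \setminus F) = \mu E_1 + \mu E_2 \ge \mu(E_1 \cup E_2) = \theta(E_1 \cup E_2) \ge \theta Y,
\end{aligned}$$
so $\mu E_1 + \mu E_2 = \mu(E_1 \cup E_2)$ and
$$\mu(E_1 \cap E_2) = \mu E_1 + \mu E_2 - \mu(E_1 \cup E_2) = 0.$$
As $\mu$ is complete (212A) and $E_1 \cap Y \setminus F \subseteq E_1 \cap E_2$ is $\mu$-negligible, therefore belongs to $\Sigma$,
$F = Y \cap (E_1 \setminus (E_1 \cap Y \setminus F))$ belongs to $\Sigma_Y$. Thus $\mathrm{T} \subseteq \Sigma_Y$; putting this together with (iii), we see that $\nu = \mu_Y$.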

214I I now turn to the relationships between subspace measures and the classification of measure spaces
developed in this chapter.
Theorem Let $(X, \Sigma, \mu)$ be a measure space and $Y$ a subset of $X$. Let $\mu_Y$ be the subspace measure on $Y$
and $\Sigma_Y$ its domain.
(a) If $(X, \Sigma, \mu)$ is complete, or totally finite, or $\sigma$-finite, or strictly localizable, so is $(Y, \Sigma_Y, \mu_Y)$. If $\langle X_i \rangle_{i \in I}$
is a decomposition of $X$ for $\mu$, then $\langle X_i \cap Y \rangle_{i \in I}$ is a decomposition of $Y$ for $\mu_Y$.
(b) If $(X, \Sigma, \mu)$ has locally determined negligible sets, then $\mu_Y$ is semi-finite.
(c) If $(X, \Sigma, \mu)$ is complete and locally determined, then $(Y, \Sigma_Y, \mu_Y)$ is complete and semi-finite.
(d) If $(X, \Sigma, \mu)$ is complete, locally determined and localizable then so is $(Y, \Sigma_Y, \mu_Y)$.
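proof (a)(i) Suppose that $(X, \Sigma, \mu)$ is complete. If $A \subseteq U \in \Sigma_Y$ and $\mu_Y U = 0$, there is an $E \in \Sigma$ such that
$U = E \cap Y$ and $\mu E = \mu_Y U = 0$; now $A \subseteq E$ so $A \in \Sigma$ and $A = A \cap Y \in \Sigma_Y$.
(ii) $\mu_Y Y = \mu^* Y \le \mu X$, so if $\mu$ is totally finite so is $\mu_Y$.
(iii) If $\langle X_n \rangle_{n \in \mathbb{N}}$ is a sequence of sets of finite measure for $\mu$ which covers $X$, then $\langle X_n \cap Y \rangle_{n \in \mathbb{N}}$ is a
sequence of sets of finite measure for $\mu_Y$ which covers $Y$. So $(Y, \Sigma_Y, \mu_Y)$ is $\sigma$-finite if $(X, \Sigma, \mu)$ is.
(iv) Suppose that $\langle X_i \rangle_{i \in I}$ is a decomposition of $X$ for $\mu$. Then $\langle X_i \cap Y \rangle_{i \in I}$ is a decomposition of $Y$
for $\mu_Y$. $\mathbf{P}$ Because $\mu_Y(X_i \cap Y) \le \mu X_i < \infty$ for each $i$, $\langle X_i \cap Y \rangle_{i \in I}$ is a disjoint cover of $Y$ by sets of finite
measure. Suppose that $U \subseteq Y$ is such that $U_i = U \cap X_i \cap Y \in \Sigma_Y$ for every $i$. For each $i \in I$, choose $E_i \in \Sigma$
such that $U_i = E_i \cap Y$ and $\mu E_i = \mu_Y U_i$; we may of course suppose that $E_i \subseteq X_i$. Set $E = \bigcup_{i \in I} E_i$. Then
$E \cap X_i = E_i$ for every $i$, so $E \in \Sigma$ and $\mu E = \sum_{i \in I} \mu E_i$. Now $U = E \cap Y$ so $U \in \Sigma_Y$ and
$$\mu_Y U \le \mu E = \sum_{i \in I} \mu E_i = \sum_{i \in I} \mu_Y U_i.$$
On the other hand, $\mu_Y U$ is surely greater than or equal to $\sum_{i \in I} \mu_Y U_i = \sup_{J \subseteq I \text{ is finite}} \sum_{i \in J} \mu_Y U_i$, so they
are equal. As $U$ is arbitrary, $\langle X_i \cap Y \rangle_{i \in I}$ is a decomposition of $Y$ for $\mu_Y$. $\mathbf{Q}$
Consequently $(Y, \Sigma_Y, \mu_Y)$ is strictly localizable if $(X, \Sigma, \mu)$ is.
(b) Take $U \in \Sigma_Y$ such that $\mu_Y U > 0$. Then there is an $E \in \Sigma$ such that $\mu E < \infty$ and $\mu^*(E \cap U) > 0$.
$\mathbf{P}$ $\mathbf{?}$ Otherwise, $E \cap U$ is $\mu$-negligible whenever $\mu E < \infty$; because $\mu$ has locally determined negligible sets,
$U$ is $\mu$-negligible and $\mu_Y U = \mu^* U = 0$. $\mathbf{X}$ $\mathbf{Q}$ Now $E \cap U \in \Sigma_Y$ and
$$0 < \mu^*(E \cap U) = \mu_Y(E \cap U) \le \mu E < \infty.$$
(c) By (a), $\mu_Y$ is complete; by (b) and 213J, it is semi-finite.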


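(d) By (c), $\mu_Y$ is complete and semi-finite. To see that it is locally determined, take any $U \subseteq Y$ such
that $U \cap V \in \Sigma_Y$ whenever $V \in \Sigma_Y$ and $\mu_Y V < \infty$. By 213L, there is a measurable envelope $E$ of $U$ for $\mu$;
of course $E \cap Y \in \Sigma_Y$.
I claim that $\mu(E \cap Y \setminus U) = 0$. $\mathbf{P}$ Take any $F \in \Sigma$ with $\mu F < \infty$. Then $F \cap U \in \Sigma_Y$, so
$$\mu_Y(F \cap E \cap Y) \le \mu(F \cap E) = \mu^*(F \cap U) = \mu_Y(F \cap U) \le \mu_Y(F \cap E \cap Y);$$
thus $\mu_Y(F \cap E \cap Y) = \mu_Y(F \cap U)$ and
$$\mu^*(F \cap E \cap Y \setminus U) = \mu_Y(F \cap E \cap Y \setminus U) = 0.$$
Because $\mu$ is complete, $\mu(F \cap E \cap Y \setminus U) = 0$; because $\mu$ is locally determined and $F$ is arbitrary,
$\mu(E \cap Y \setminus U) = 0$. $\mathbf{Q}$ But this means that $E \cap Y \setminus U \in \Sigma_Y$ and $U \in \Sigma_Y$. As $U$ is arbitrary, $\mu_Y$ is locally determined.
To see that $\mu_Y$ is localizable, let $\mathcal{U}$ be any family in $\Sigma_Y$. Set
$$\mathcal{E} = \{E : E \in \Sigma,\ \mu E < \infty,\ \mu E = \mu^*(E \cap U) \text{ for some } U \in \mathcal{U}\},$$
and let $G \in \Sigma$ be an essential supremum for $\mathcal{E}$ in $\Sigma$. I claim that $G \cap Y$ is an essential supremum for $\mathcal{U}$ in
$\Sigma_Y$. $\mathbf{P}$ (i) $\mathbf{?}$ If $U \in \mathcal{U}$ and $U \setminus (G \cap Y)$ is not negligible, then (because $\mu_Y$ is semi-finite) there is a $V \in \Sigma_Y$
such that $V \subseteq U \setminus G$ and $0 < \mu_Y V < \infty$. Now there is an $E \in \Sigma$ such that $V \subseteq E$ and $\mu E = \mu^* V$. We
have $\mu^*(E \cap U) \ge \mu^* V = \mu E$, so $E \in \mathcal{E}$ and $E \setminus G$ must be negligible; but $V \subseteq E \setminus G$ is not negligible. $\mathbf{X}$
Thus $U \setminus (G \cap Y)$ is negligible for every $U \in \mathcal{U}$. (ii) If $W \in \Sigma_Y$ is such that $U \setminus W$ is negligible for every
$U \in \mathcal{U}$, express $W$ as $H \cap Y$ where $H \in \Sigma$. If $E \in \mathcal{E}$, there is a $U \in \mathcal{U}$ such that $\mu E = \mu^*(E \cap U)$; now
$\mu^*(E \cap U \setminus W) = 0$, so $\mu E = \mu^*(E \cap U \cap W) \le \mu(E \cap H)$ and $E \setminus H$ is negligible. As $E$ is arbitrary, $H$ is
an essential upper bound for $\mathcal{E}$ and $G \setminus H$ is negligible; but this means that $G \cap Y \setminus W$ is negligible. As $W$
is arbitrary, $G \cap Y$ is an essential supremum for $\mathcal{U}$. $\mathbf{Q}$
As $\mathcal{U}$ is arbitrary, $\mu_Y$ is localizable.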

214J Measurable subspaces: Proposition Let $(X, \Sigma, \mu)$ be a measure space.
(a) Let $E \in \Sigma$ and let $\mu_E$ be the subspace measure, with $\Sigma_E$ its domain. If $(X, \Sigma, \mu)$ is complete, or totally
finite, or $\sigma$-finite, or strictly localizable, or semi-finite, or localizable, or locally determined, or atomless, or
purely atomic, so is $(E, \Sigma_E, \mu_E)$.
(b) Suppose that $\langle X_i \rangle_{i \in I}$ is a disjoint cover of $X$ by measurable sets (not necessarily of finite measure)
such that
$$\Sigma = \{E : E \subseteq X,\ E \cap X_i \in \Sigma\ \forall\, i \in I\},$$
$$\mu E = \sum_{i \in I} \mu(E \cap X_i) \text{ for every } E \in \Sigma.$$
Then $(X, \Sigma, \mu)$ is complete, or strictly localizable, or semi-finite, or localizable, or locally determined, or
atomless, or purely atomic, iff $(X_i, \Sigma_{X_i}, \mu_{X_i})$ has that property for every $i \in I$.
proof I really think that if you have read attentively up to this point, you ought to find this easy. If you
are in any doubt, this makes a very suitable set of sixteen exercises to do.
214K Direct sums Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be any indexed family of measure spaces. Set
$X = \bigcup_{i \in I} (X_i \times \{i\})$; for $E \subseteq X$, $i \in I$ set $E_i = \{x : (x, i) \in E\}$. Write
$$\Sigma = \{E : E \subseteq X,\ E_i \in \Sigma_i\ \forall\, i \in I\},$$
$$\mu E = \sum_{i \in I} \mu_i E_i \text{ for every } E \in \Sigma.$$
Then it is easy to check that $(X, \Sigma, \mu)$ is a measure space; I will call it the direct sum of the family
$\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$. Note that if $(X, \Sigma, \mu)$ is any decomposable measure space, with decomposition $\langle X_i \rangle_{i \in I}$,
then we have a natural isomorphism between $(X, \Sigma, \mu)$ and the direct sum $(X', \Sigma', \mu') = \bigoplus_{i \in I} (X_i, \Sigma_{X_i}, \mu_{X_i})$
of the subspace measures, if we match $(x, i) \in X'$ with $x \in X$ for every $i \in I$, $x \in X_i$.
For some of the elementary properties (to put it plainly, I know of no properties which are not elementary)
of direct sums, see 214L and 214Xi-214Xl.

214L Proposition Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of measure spaces, with direct sum $(X, \Sigma, \mu)$. Let $f$
be a real-valued function defined on a subset of $X$. For each $i \in I$, set $f_i(x) = f(x, i)$ whenever $(x, i) \in \operatorname{dom} f$.
(a) $f$ is measurable iff $f_i$ is measurable for every $i \in I$.
(b) If $f$ is non-negative, then $\int f\,d\mu = \sum_{i \in I} \int f_i\,d\mu_i$ if either is defined in $[0, \infty]$.
proof (a) For $a \in \mathbb{R}$, set $F_a = \{(x, i) : (x, i) \in \operatorname{dom} f,\ f(x, i) \ge a\}$. (i) If $f$ is measurable, $i \in I$ and $a \in \mathbb{R}$,
then there is an $E \in \Sigma$ such that $F_a = E \cap \operatorname{dom} f$; now
$$\{x : f_i(x) \ge a\} = \operatorname{dom} f_i \cap \{x : (x, i) \in E\}$$
belongs to the subspace $\sigma$-algebra on $\operatorname{dom} f_i$ induced by $\Sigma_i$. As $a$ is arbitrary, $f_i$ is measurable. (ii) If every
$f_i$ is measurable and $a \in \mathbb{R}$, then for each $i \in I$ there is an $E_i \in \Sigma_i$ such that $\{x : (x, i) \in F_a\} = E_i \cap \operatorname{dom} f_i$;
setting $E = \{(x, i) : i \in I,\ x \in E_i\}$, $F_a = \operatorname{dom} f \cap E$ belongs to the subspace $\sigma$-algebra on $\operatorname{dom} f$. As $a$ is
arbitrary, $f$ is measurable.
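(b)(i) Suppose first that $f$ is measurable and defined everywhere. Set $F_{nk} = \{(x, i) : (x, i) \in X,\ f(x, i) \ge 2^{-n}k\}$
for $k, n \in \mathbb{N}$, $g_n = \sum_{k=1}^{4^n} 2^{-n} \chi F_{nk}$ for $n \in \mathbb{N}$, $F_{nki} = \{x : (x, i) \in F_{nk}\}$ for $k, n \in \mathbb{N}$ and $i \in I$,
$g_{ni}(x) = g_n(x, i)$ for $i \in I$, $x \in X_i$. Then
$$\begin{aligned}
\int f\,d\mu = \lim_{n \to \infty} \int g_n\,d\mu &= \sup_{n \in \mathbb{N}} \sum_{k=1}^{4^n} 2^{-n} \mu F_{nk} \\
&= \sup_{n \in \mathbb{N}} \sum_{k=1}^{4^n} \sum_{i \in I} 2^{-n} \mu_i F_{nki} \\
&= \sum_{i \in I} \sup_{n \in \mathbb{N}} \sum_{k=1}^{4^n} 2^{-n} \mu_i F_{nki} \\
&= \sum_{i \in I} \sup_{n \in \mathbb{N}} \int g_{ni}\,d\mu_i = \sum_{i \in I} \int f_i\,d\mu_i.
\end{aligned}$$
(ii) Generally, if $\int f\,d\mu$ is defined, there are a measurable $g : X \to [0, \infty[$ and a conegligible measurable
set $E \subseteq \operatorname{dom} f$ such that $g = f$ on $E$. Now $E_i = \{x : (x, i) \in E\}$ belongs to $\Sigma_i$ for each $i$, and
$\sum_{i \in I} \mu_i(X_i \setminus E_i) = \mu(X \setminus E) = 0$, so $E_i$ is $\mu_i$-conegligible for every $i$. Setting $g_i(x) = g(x, i)$ for $x \in X_i$, (i)
tells us that
$$\sum_{i \in I} \int f_i\,d\mu_i = \sum_{i \in I} \int g_i\,d\mu_i = \int g\,d\mu = \int f\,d\mu.$$
(iii) On the other hand, if $\int f_i\,d\mu_i$ is defined for every $i \in I$, then for each $i \in I$ we can find a measurable
function $g_i : X_i \to [0, \infty[$ and a $\mu_i$-conegligible measurable set $E_i \subseteq \operatorname{dom} f_i$ such that $g_i = f_i$ on $E_i$. Setting
$g(x, i) = g_i(x)$ for $i \in I$, $x \in X_i$, (a) tells us that $g$ is measurable, while $g = f$ on $\{(x, i) : i \in I,\ x \in E_i\}$,
which is conegligible (by the calculation in (ii) just above); so
$$\int f\,d\mu = \int g\,d\mu = \sum_{i \in I} \int g_i\,d\mu_i = \sum_{i \in I} \int f_i\,d\mu_i,$$
again using (i) for the middle step.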
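
For finite summands in which every subset is measurable, the identity of 214Lb reduces to a rearrangement of a finite sum. The following Python sketch is an illustration only: the two summands, their point masses and the function `f` are hypothetical, and exact rational arithmetic is used so that the two sides can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical summands: finite sets with point masses, every subset measurable.
SPACES = {
    "a": {0: Fraction(1, 2), 1: Fraction(1, 3)},
    "b": {0: Fraction(2), 5: Fraction(1, 4), 7: Fraction(0)},
}


def direct_sum(spaces):
    """Point masses on X = union of X_i x {i}, so that mu E = sum_i mu_i E_i."""
    return {(x, i): w for i, space in spaces.items() for x, w in space.items()}


def integrate(f, weights):
    """Integral of f against a measure given by point masses."""
    return sum((f(p) * w for p, w in weights.items()), Fraction(0))


def f(point):
    """An arbitrary non-negative function on the direct sum."""
    x, i = point
    return x * x + (1 if i == "a" else 3)


whole = integrate(f, direct_sum(SPACES))
pieces = sum(
    (integrate(lambda x, i=i: f((x, i)), space) for i, space in SPACES.items()),
    Fraction(0),
)
```

Here `whole` plays the part of $\int f\,d\mu$ and `pieces` that of $\sum_{i \in I} \int f_i\,d\mu_i$.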

214M Corollary Let $(X, \Sigma, \mu)$ be a measure space with a decomposition $\langle X_i \rangle_{i \in I}$. If $f$ is a real-valued
function defined on a subset of $X$, then
(a) $f$ is measurable iff $f \upharpoonright X_i$ is measurable for every $i \in I$,
(b) if $f \ge 0$, then $\int f = \sum_{i \in I} \int_{X_i} f$ if either is defined in $[0, \infty]$.
proof Apply 214L to the direct sum of $\langle (X_i, \Sigma_{X_i}, \mu_{X_i}) \rangle_{i \in I}$, identified with $(X, \Sigma, \mu)$ as in 214K.

214X Basic exercises (a) Let $(X, \Sigma, \mu)$ be a localizable measure space. Show that there is an $E \in \Sigma$
such that the subspace measure $\mu_E$ is purely atomic and $\mu_{X \setminus E}$ is atomless.

(b) Let $(X, \Sigma, \mu)$ be a measure space, and let $\hat\mu$ be the completion of $\mu$. Show that for any $Y \subseteq X$, the
subspace measure $\hat\mu_Y$ on $Y$ defined from $\hat\mu$ is equal to the completion of the subspace measure $\mu_Y$. (Cf.
214Cb.)

(c) Let $X$ be a set, $\theta$ a regular outer measure on $X$, and $Y$ a subset of $X$. Let $\mu$ be the measure on $X$
defined by Carathéodory's method from $\theta$, $\mu_Y$ the subspace measure on $Y$, and $\nu$ the measure on $Y$ defined
by Carathéodory's method from $\theta \upharpoonright \mathcal{P}Y$. (i) Show that $\nu$ is an extension of $\mu_Y$. (ii) Show that if $F \in \operatorname{dom} \nu$
and $\nu F < \infty$ then $F \in \operatorname{dom} \mu_Y$. (iii) Show that if $\mu_Y$ is locally determined (in particular, if $\mu$ is either strictly
localizable or complete, locally determined and localizable) then $\nu = \mu_Y$.

(d) Let $(X, \Sigma, \mu)$ be a localizable measure space, and $Y$ a subset of $X$ such that the subspace measure
$\mu_Y$ is semi-finite. Show that $\mu_Y$ is localizable.

(e) Let $(X, \Sigma, \mu)$ be a measure space, and $Y$ a subset of $X$ such that the subspace measure $\mu_Y$ is semi-finite.
Show that if $\mu$ is atomless or purely atomic, so is $\mu_Y$.

(f) Let $(X, \Sigma, \mu)$ be a localizable measure space, and $Y$ any subset of $X$. Show that the c.l.d. version of
the subspace measure on $Y$ is localizable.

(g) Let $(X, \Sigma, \mu)$ be a measure space with locally determined negligible sets in the sense of 213I, and $Y$
a subset of $X$, with its subspace measure $\mu_Y$.
(i) Show that $\mu_Y$ has locally determined negligible sets; in particular, it is semi-finite.
(ii) Show that if $\mu$ is localizable, so is $\mu_Y$.
(iii) Show that a set $U \subseteq Y$ is an atom for $\mu_Y$ iff it is expressible as $F \cap Y$ where $F$ is an atom for $\mu$
and $\mu^*(F \cap Y) > 0$.
(iv) Show that if $\mu$ is purely atomic, so is $\mu_Y$.
(v) Show that if $\mu$ is atomless, so is $\mu_Y$.

> (h) Let $(X, \Sigma, \mu)$ be a measure space. Show that $(X, \Sigma, \mu)$ has locally determined negligible sets iff the
subspace measure $\mu_Y$ is semi-finite for every $Y \subseteq X$.

> (i) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of measure spaces, with direct sum $(X, \Sigma, \mu)$ (214K). Set
$X'_i = X_i \times \{i\} \subseteq X$ for each $i \in I$. Show that $X'_i$, with the subspace measure, is isomorphic to $(X_i, \Sigma_i, \mu_i)$. Under
what circumstances is $\langle X'_i \rangle_{i \in I}$ a decomposition of $X$? Show that $\mu$ is complete, or strictly localizable, or
localizable, or locally determined, or semi-finite, or atomless, or purely atomic iff every $\mu_i$ is. Show that a
measure space is strictly localizable iff it is isomorphic to a direct sum of totally finite spaces.

> (j) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of measure spaces, and $(X, \Sigma, \mu)$ their direct sum. Show that the
completion of $(X, \Sigma, \mu)$ can be identified with the direct sum of the completions of the $(X_i, \Sigma_i, \mu_i)$, and that
the c.l.d. version of $(X, \Sigma, \mu)$ can be identified with the direct sum of the c.l.d. versions of the $(X_i, \Sigma_i, \mu_i)$.

(k) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of measure spaces. Show that their direct sum has locally determined
negligible sets iff every $\mu_i$ has.

(l) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of measure spaces, and $(X, \Sigma, \mu)$ their direct sum. Show that $(X, \Sigma, \mu)$
has the measurable envelope property (213Xl) iff every $(X_i, \Sigma_i, \mu_i)$ has.

214 Notes and comments I take the first part of the section, down to 214H, slowly and carefully, because
while none of the arguments are deep (214Eb is the longest) the patterns formed by the results are not
always easy to predict. There is a counter-example to a tempting extension of 214H/214Xc in 216Xb.
The message of the second part of the section (214I-214K) is that subspaces inherit many, but not all,
of the properties of a measure space; and in particular there is a difficulty with semi-finiteness, unless we
have locally determined negligible sets (214Xh). (I give an example in 216Xa.) Of course 213Hb shows that
if we start with a localizable space, we can convert it into a complete locally determined localizable space
without doing great violence to the structure of the space, so the difficulty is ordinarily superable.

215 $\sigma$-finite spaces and the principle of exhaustion


I interpolate a short section to deal with some useful facts which might get lost if buried in one of the
longer sections of this chapter. The great majority of the applications of measure theory involve $\sigma$-finite
spaces, to the point that many authors skim over any others. I myself prefer to signal the importance of
such concepts by explicitly stating just which theorems apply only to the restricted class of spaces. But
undoubtedly some facts about $\sigma$-finite spaces need to be grasped early on. In 215B I give a list of properties
characterizing $\sigma$-finite spaces. Some of these make better sense in the light of the principle of exhaustion
(215A). I take the opportunity to include a fundamental fact about atomless measure spaces (215D).

215A The principle of exhaustion The following is an example of the use of one of the most important
methods in measure theory.
Lemma Let $(X, \Sigma, \mu)$ be any measure space and $\mathcal{E} \subseteq \Sigma$ a non-empty set such that $\sup_{n \in \mathbb{N}} \mu F_n$ is finite for
every non-decreasing sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{E}$.

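(a) There is a non-decreasing sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{E}$ such that, for every $E \in \Sigma$, either there is an $n \in \mathbb{N}$
such that $E \cup F_n$ is not included in any member of $\mathcal{E}$ or, setting $F = \bigcup_{n \in \mathbb{N}} F_n$,
$$\lim_{n \to \infty} \mu(E \setminus F_n) = \mu(E \setminus F) = 0.$$
In particular, if $E \in \mathcal{E}$ and $E \supseteq F$, then $E \setminus F$ is negligible.
(b) If $\mathcal{E}$ is upwards-directed, then there is a non-decreasing sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{E}$ such that, setting
$F = \bigcup_{n \in \mathbb{N}} F_n$, $\mu F = \sup_{E \in \mathcal{E}} \mu E$ and $E \setminus F$ is negligible for every $E \in \mathcal{E}$, so that $F$ is an essential supremum
of $\mathcal{E}$ in $\Sigma$ in the sense of 211G.
(c) If the union of any non-decreasing sequence in $\mathcal{E}$ belongs to $\mathcal{E}$, then there is an $F \in \mathcal{E}$ such that $E \setminus F$
is negligible whenever $E \in \mathcal{E}$ and $F \subseteq E$.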
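proof (a) Choose $\langle F_n \rangle_{n \in \mathbb{N}}$, $\langle \mathcal{E}_n \rangle_{n \in \mathbb{N}}$ and $\langle u_n \rangle_{n \in \mathbb{N}}$ inductively, as follows. Take $F_0$ to be any member of $\mathcal{E}$.
Given $F_n \in \mathcal{E}$, set $\mathcal{E}_n = \{E : F_n \subseteq E \in \mathcal{E}\}$ and $u_n = \sup\{\mu E : E \in \mathcal{E}_n\}$ in $[0, \infty]$, and choose $F_{n+1} \in \mathcal{E}_n$
such that $\mu F_{n+1} \ge \min(n, u_n - 2^{-n})$; continue.
Observe that this construction yields a non-decreasing sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{E}$. Since $\mathcal{E}_{n+1} \subseteq \mathcal{E}_n$ for every
$n$, $\langle u_n \rangle_{n \in \mathbb{N}}$ is non-increasing, and has a limit $u$ in $[0, \infty]$. Since $\min(n, u - 2^{-n}) \le \mu F_{n+1} \le u_n$ for every $n$,
$\lim_{n \to \infty} \mu F_n = u$. Our hypothesis on $\mathcal{E}$ now tells us that $u$ is finite.
If $E \in \Sigma$ is such that for every $n \in \mathbb{N}$ there is an $E_n \in \mathcal{E}$ such that $E \cup F_n \subseteq E_n$, then $E_n \in \mathcal{E}_n$, so
$$\mu F_n \le \mu(E \cup F_n) \le \mu E_n \le u_n$$
for every $n$, and $\lim_{n \to \infty} \mu(E \cup F_n) = u$. But this means that
$$\mu(E \setminus F) \le \lim_{n \to \infty} \mu(E \setminus F_n) = \lim_{n \to \infty} \bigl(\mu(E \cup F_n) - \mu F_n\bigr) = 0,$$
as stated. In particular, this is so if $E \in \mathcal{E}$ and $E \supseteq F$.
(b) Take $\langle F_n \rangle_{n \in \mathbb{N}}$ from (a). If $E \in \mathcal{E}$, then (because $\mathcal{E}$ is upwards-directed) $E \cup F_n$ is included in some
member of $\mathcal{E}$ for every $n \in \mathbb{N}$; so we must have the second alternative of (a), and $E \setminus F$ is negligible. It
follows that
$$\sup_{E \in \mathcal{E}} \mu E \le \mu F = \lim_{n \to \infty} \mu F_n \le \sup_{E \in \mathcal{E}} \mu E,$$
so $\mu F = \sup_{E \in \mathcal{E}} \mu E$.
If $G$ is any measurable set such that $E \setminus G$ is negligible for every $E \in \mathcal{E}$, then $F_n \setminus G$ is negligible for
every $n$, so that $F \setminus G$ is negligible; thus $F$ is an essential supremum for $\mathcal{E}$.
(c) Again take $\langle F_n \rangle_{n \in \mathbb{N}}$ from (a), and set $F = \bigcup_{n \in \mathbb{N}} F_n$. Our hypothesis now is that $F \in \mathcal{E}$, so has both
the properties declared.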
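
The inductive choice in part (a) of the proof can be traced on a finite family, where every supremum $u_n$ is attained and the construction stops after finitely many steps. The Python sketch below is an illustration under those assumptions, with counting measure and a hypothetical family; the function name `exhaust` is invented for the example. Choosing a member of $\mathcal{E}_n$ of maximal measure certainly satisfies the requirement $\mu F_{n+1} \ge \min(n, u_n - 2^{-n})$.

```python
def exhaust(family, mu=len):
    """Return the chain F_0, F_1, ... built as in the proof of (a), for a
    finite family of finite sets (so every supremum u_n is attained)."""
    family = [frozenset(E) for E in family]
    F = family[0]                               # F_0: any member of the family
    chain = [F]
    while True:
        above = [E for E in family if F <= E]   # the class E_n
        best = max(above, key=mu)               # mu(best) = u_n
        if mu(best) == mu(F):                   # nothing above F is bigger
            return chain
        F = best
        chain.append(F)


FAMILY = [{1}, {1, 2}, {3}, {1, 2, 4}, {3, 5, 6, 7}]
chain = exhaust(FAMILY)
F = chain[-1]
```

In this example the final set $F$ is $\{1, 2, 4\}$. The larger set $\{3, 5, 6, 7\}$ is not absorbed, which matches the first alternative of (a): its union with $F$ is not included in any member of the family.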

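215B $\sigma$-finite spaces are so important that I think it is worth spelling out the following facts.
Proposition Let $(X, \Sigma, \mu)$ be a semi-finite measure space. Write $\mathcal{N}$ for the family of $\mu$-negligible sets and
$\Sigma^f$ for the family of measurable sets of finite measure. Then the following are equiveridical:
(i) $(X, \Sigma, \mu)$ is $\sigma$-finite;
(ii) every disjoint family in $\Sigma^f \setminus \mathcal{N}$ is countable;
(iii) every disjoint family in $\Sigma \setminus \mathcal{N}$ is countable;
(iv) for every $\mathcal{E} \subseteq \Sigma$ there is a countable set $\mathcal{E}_0 \subseteq \mathcal{E}$ such that $E \setminus \bigcup \mathcal{E}_0$ is negligible for every $E \in \mathcal{E}$;
(v) for every non-empty upwards-directed $\mathcal{E} \subseteq \Sigma$ there is a non-decreasing sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{E}$ such
that $E \setminus \bigcup_{n \in \mathbb{N}} F_n$ is negligible for every $E \in \mathcal{E}$;
(vi) for every non-empty $\mathcal{E} \subseteq \Sigma$, there is a non-decreasing sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{E}$ such that $E \setminus \bigcup_{n \in \mathbb{N}} F_n$
is negligible whenever $E \in \mathcal{E}$ and $E \supseteq F_n$ for every $n \in \mathbb{N}$;
(vii) either $\mu X = 0$ or there is a probability measure $\nu$ on $X$ with the same domain and the same negligible
sets as $\mu$;
(viii) there is a measurable integrable function $f : X \to \,]0, 1]$;
(ix) either $\mu X = 0$ or there is a measurable function $f : X \to \,]0, \infty[$ such that $\int f\,d\mu = 1$.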
proof (i)$\Rightarrow$(vii) and (viii) If $\mu X = 0$, (vii) is trivial and we can take $f = \chi X$ in (viii). Otherwise, let
$\langle E_n \rangle_{n \in \mathbb{N}}$ be a disjoint sequence in $\Sigma^f$ covering $X$. Then it is easy to see that there is a sequence $\langle \alpha_n \rangle_{n \in \mathbb{N}}$ of
strictly positive real numbers such that $\sum_{n=0}^{\infty} \alpha_n \mu E_n = 1$. Set $\nu E = \sum_{n=0}^{\infty} \alpha_n \mu(E \cap E_n)$ for $E \in \Sigma$; then
$\nu$ is a probability measure with domain $\Sigma$ and the same negligible sets as $\mu$. Also $f = \sum_{n=0}^{\infty} \min(1, \alpha_n) \chi E_n$
is a strictly positive measurable integrable function.
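(vii)$\Rightarrow$(vi) and (v) Assume (vii), and let $\mathcal{E}$ be a non-empty family of measurable sets. If $\mu X = 0$
then (vi) and (v) are certainly true. Otherwise, let $\nu$ be a probability measure with domain $\Sigma$ and the
same negligible sets as $\mu$. Since $\sup_{E \in \mathcal{E}} \nu E \le 1$ is finite, we can apply 215Aa to find a non-decreasing
sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{E}$ such that $E \setminus \bigcup_{n \in \mathbb{N}} F_n$ is negligible whenever $E \in \mathcal{E}$ includes $\bigcup_{n \in \mathbb{N}} F_n$; and if $\mathcal{E}$ is
upwards-directed, $E \setminus \bigcup_{n \in \mathbb{N}} F_n$ will be negligible for every $E \in \mathcal{E}$, as in 215Ab.
(vi)$\Rightarrow$(iv) Assume (vi), and let $\mathcal{E}$ be any subset of $\Sigma$. Set
$$\mathcal{H} = \Bigl\{\bigcup \mathcal{E}_0 : \mathcal{E}_0 \subseteq \mathcal{E} \text{ is countable}\Bigr\}.$$
By (vi), there is a sequence $\langle H_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{H}$ such that $H \setminus \bigcup_{n \in \mathbb{N}} H_n$ is negligible whenever $H \in \mathcal{H}$ and $H \supseteq H_n$
for every $n \in \mathbb{N}$. Now we can express each $H_n$ as $\bigcup \mathcal{E}'_n$, where $\mathcal{E}'_n \subseteq \mathcal{E}$ is countable; setting $\mathcal{E}_0 = \bigcup_{n \in \mathbb{N}} \mathcal{E}'_n$,
$\mathcal{E}_0$ is countable. If $E \in \mathcal{E}$, then $E \cup \bigcup_{n \in \mathbb{N}} H_n = \bigcup(\{E\} \cup \mathcal{E}_0)$ belongs to $\mathcal{H}$ and includes every $H_n$, so that
$E \setminus \bigcup \mathcal{E}_0 = E \setminus \bigcup_{n \in \mathbb{N}} H_n$ is negligible. So $\mathcal{E}_0$ has the property we need, and (iv) is true.
(iv)$\Rightarrow$(iii) Assume (iv). If $\mathcal{E}$ is a disjoint family in $\Sigma \setminus \mathcal{N}$, take a countable $\mathcal{E}_0 \subseteq \mathcal{E}$ such that $E \setminus \bigcup \mathcal{E}_0$ is
negligible for every $E \in \mathcal{E}$. Then $E = E \setminus \bigcup \mathcal{E}_0$ is negligible for every $E \in \mathcal{E} \setminus \mathcal{E}_0$; but this just means that
$\mathcal{E} \setminus \mathcal{E}_0$ is empty, so that $\mathcal{E} = \mathcal{E}_0$ is countable.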
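(iii)$\Rightarrow$(ii) is trivial.
(ii)$\Rightarrow$(i) Assume (ii). Let $\mathcal{P}$ be the set of all disjoint subsets of $\Sigma^f \setminus \mathcal{N}$, ordered by $\subseteq$. Then $\mathcal{P}$ is a
partially ordered set, not empty (as $\emptyset \in \mathcal{P}$), and if $\mathcal{Q} \subseteq \mathcal{P}$ is non-empty and totally ordered then it has an
upper bound in $\mathcal{P}$. $\mathbf{P}$ Set $\mathcal{E} = \bigcup \mathcal{Q}$, the union of all the disjoint families belonging to $\mathcal{Q}$. If $E \in \mathcal{E}$ then
$E \in \mathcal{C}$ for some $\mathcal{C} \in \mathcal{Q}$, so $E \in \Sigma^f \setminus \mathcal{N}$. If $E, F \in \mathcal{E}$ and $E \ne F$, then there are $\mathcal{C}, \mathcal{D} \in \mathcal{Q}$ such that $E \in \mathcal{C}$,
$F \in \mathcal{D}$; now $\mathcal{Q}$ is totally ordered, so one of $\mathcal{C}$, $\mathcal{D}$ is larger than the other, and in either case $\mathcal{C} \cup \mathcal{D}$ is a member
of $\mathcal{Q}$ containing both $E$ and $F$. But since any member of $\mathcal{Q}$ is a disjoint collection of sets, $E \cap F = \emptyset$. As $E$
and $F$ are arbitrary, $\mathcal{E}$ is a disjoint family of sets and belongs to $\mathcal{P}$. And of course $\mathcal{C} \subseteq \mathcal{E}$ for every $\mathcal{C} \in \mathcal{Q}$,
so $\mathcal{E}$ is an upper bound for $\mathcal{Q}$ in $\mathcal{P}$. $\mathbf{Q}$
By Zorn's Lemma (2A1M), $\mathcal{P}$ has a maximal element $\mathcal{E}$ say. By (ii), $\mathcal{E}$ must be countable, so $\bigcup \mathcal{E} \in \Sigma$.
Now $H = X \setminus \bigcup \mathcal{E}$ is negligible. $\mathbf{P}$ $\mathbf{?}$ Suppose, if possible, otherwise. Because $(X, \Sigma, \mu)$ is semi-finite, there is
a set $G$ of finite measure such that $G \subseteq H$ and $\mu G > 0$, that is, $G \in \Sigma^f \setminus \mathcal{N}$ and $G \cap E = \emptyset$ for every $E \in \mathcal{E}$.
But this means that $\{G\} \cup \mathcal{E}$ is a member of $\mathcal{P}$ strictly larger than $\mathcal{E}$, which is supposed to be impossible.
$\mathbf{X}$ $\mathbf{Q}$
Let $\langle X_n \rangle_{n \in \mathbb{N}}$ be a sequence running over $\mathcal{E} \cup \{H\}$. Then $\langle X_n \rangle_{n \in \mathbb{N}}$ is a cover of $X$ by a sequence of
measurable sets of finite measure, so $(X, \Sigma, \mu)$ is $\sigma$-finite.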
(v)$\Rightarrow$(i) If (v) is true, then we have a sequence $\langle E_n \rangle_{n \in \mathbb{N}}$ in $\Sigma^f$ such that $E \setminus \bigcup_{n \in \mathbb{N}} E_n$ is negligible for
every $E \in \Sigma^f$. Because $\mu$ is semi-finite, $X \setminus \bigcup_{n \in \mathbb{N}} E_n$ must be negligible, so $X$ is covered by a countable
family of sets of finite measure and $\mu$ is $\sigma$-finite.
(viii)$\Rightarrow$(ix) If $\mu X = 0$ this is trivial. Otherwise, if $f$ is a strictly positive measurable integrable function,
then $c = \int f > 0$ (122Rc), so $\frac{1}{c} f$ is a strictly positive measurable function with integral 1.
(ix)$\Rightarrow$(i) If $f : X \to \,]0, \infty[$ is measurable and integrable, $\langle \{x : f(x) \ge 2^{-n}\} \rangle_{n \in \mathbb{N}}$ is a sequence of sets of
finite measure covering $X$.
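
The step (i)$\Rightarrow$(vii) says only that suitable weights are easy to find; one concrete choice, valid when every piece has strictly positive finite measure, is $\alpha_n = 2^{-(n+1)}/\mu E_n$. The Python sketch below checks this choice with exact arithmetic. The measures assigned to the pieces and all function names are assumptions made for the illustration; pieces of measure zero would need separate treatment.

```python
from fractions import Fraction


def mu_piece(n):
    """Hypothetical measures mu(E_n) of the disjoint pieces: here n + 1."""
    return Fraction(n + 1)


def alpha(n):
    """One admissible weight: alpha_n = 2^-(n+1) / mu(E_n), with mu(E_n) > 0."""
    return Fraction(1, 2 ** (n + 1)) / mu_piece(n)


def nu_of_union(N):
    """nu(E_0 u ... u E_{N-1}) = sum over n < N of alpha_n * mu(E_n)."""
    return sum((alpha(n) * mu_piece(n) for n in range(N)), Fraction(0))


def integral_of_f(N):
    """Partial integral of f = sum_n min(1, alpha_n) * chi(E_n)."""
    return sum((min(1, alpha(n)) * mu_piece(n) for n in range(N)), Fraction(0))
```

The partial sums `nu_of_union(N)` equal $1 - 2^{-N}$, so $\nu X = 1$ in the limit, and the partial integrals of $f$ never exceed 1, so $f$ is integrable.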

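215C Corollary Let $(X, \Sigma, \mu)$ be a $\sigma$-finite measure space, and suppose that $\mathcal{E} \subseteq \Sigma$ is any non-empty
set.
(a) There is a non-decreasing sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{E}$ such that, for every $E \in \Sigma$, either there is an $n \in \mathbb{N}$
such that $E \cup F_n$ is not included in any member of $\mathcal{E}$ or $E \setminus \bigcup_{n \in \mathbb{N}} F_n$ is negligible.
(b) If $\mathcal{E}$ is upwards-directed, then there is a non-decreasing sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{E}$ such that $\bigcup_{n \in \mathbb{N}} F_n$ is
an essential supremum of $\mathcal{E}$ in $\Sigma$.
(c) If the union of any non-decreasing sequence in $\mathcal{E}$ belongs to $\mathcal{E}$, then there is an $F \in \mathcal{E}$ such that $E \setminus F$
is negligible whenever $E \in \mathcal{E}$ and $F \subseteq E$.
proof By 215B, there is a totally finite measure $\nu$ on $X$ with the same measurable sets and the same
negligible sets as $\mu$. Since $\sup_{E \in \mathcal{E}} \nu E$ is finite, we can apply 215A to $\nu$ to obtain the results.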

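215D As a further example of the use of the principle of exhaustion, I give a fundamental fact about
atomless measure spaces.
Proposition Let $(X, \Sigma, \mu)$ be an atomless measure space. If $E \in \Sigma$ and $0 \le \alpha \le \mu E < \infty$, there is an
$F \in \Sigma$ such that $F \subseteq E$ and $\mu F = \alpha$.
proof (a) We need to know that if $G \in \Sigma$ is non-negligible and $n \in \mathbb{N}$, then there is an $H \subseteq G$ such that
$0 < \mu H \le 2^{-n} \mu G$. PP Induce on $n$. For $n = 0$ this is trivial. For the inductive step to $n + 1$, use the
inductive hypothesis to find $H \subseteq G$ such that $0 < \mu H \le 2^{-n} \mu G$. Because $\mu$ is atomless, there is an $H' \subseteq H$
such that $\mu H'$, $\mu(H \setminus H')$ are both defined and non-zero. Now at least one of them has measure less than
or equal to $\frac{1}{2} \mu H$, so gives us a subset of $G$ of non-zero measure less than or equal to $2^{-n-1} \mu G$. QQ
It follows that if $G \in \Sigma$ has non-zero finite measure and $\epsilon > 0$, there is a measurable set $H \subseteq G$ such that
$0 < \mu H \le \epsilon$.
(b) Let $\mathcal{H}$ be the family of all those $H \in \Sigma$ such that $H \subseteq E$ and $\mu H \le \alpha$. If $\langle H_n \rangle_{n \in \mathbb{N}}$ is any non-
decreasing sequence in $\mathcal{H}$, then $\mu(\bigcup_{n \in \mathbb{N}} H_n) = \lim_{n \to \infty} \mu H_n \le \alpha$, so $\bigcup_{n \in \mathbb{N}} H_n \in \mathcal{H}$. So 215Ac tells us that
there is an $F \in \mathcal{H}$ such that $H \setminus F$ is negligible whenever $H \in \mathcal{H}$ and $F \subseteq H$. ?? Suppose, if possible, that
$\mu F < \alpha$. By (a), there is an $H \subseteq E \setminus F$ such that $0 < \mu H \le \alpha - \mu F$. But in this case $H \cup F \in \mathcal{H}$ and
$\mu((H \cup F) \setminus F) > 0$, which is impossible. XX
So we have found an appropriate set $F$.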
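
For Lebesgue measure on the line, the conclusion of 215D can be seen directly, without any exhaustion argument: if E is a finite union of intervals, the function sending t to the measure of E ∩ ]−∞, t] is continuous and non-decreasing, so it takes every value between 0 and the measure of E. The following Python sketch finds such a set F by bisection. It is an illustration under that special assumption only, not the method of the proof above, and the function names (`measure`, `truncate`, `subset_of_measure`) are hypothetical.

```python
def measure(intervals):
    """Total length of a list of pairwise disjoint intervals (a, b)."""
    return sum(b - a for a, b in intervals)

def truncate(intervals, t):
    """The part of E lying to the left of t, for E a disjoint union of intervals."""
    return [(a, min(b, t)) for a, b in intervals if a < t]

def subset_of_measure(intervals, alpha, tol=1e-12):
    """Return F included in E with measure alpha (up to tol).

    Assumes E is a non-empty list of disjoint intervals and
    0 <= alpha <= measure(E).
    """
    if not 0 <= alpha <= measure(intervals):
        raise ValueError("need 0 <= alpha <= measure of E")
    lo = min(a for a, _ in intervals)
    hi = max(b for _, b in intervals)
    # Invariant: measure(truncate(E, hi)) >= alpha.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if measure(truncate(intervals, mid)) < alpha:
            lo = mid
        else:
            hi = mid
    return truncate(intervals, hi)

E = [(0.0, 1.0), (2.0, 2.5), (4.0, 6.0)]   # total measure 3.5
F = subset_of_measure(E, 2.0)
print(F, measure(F))
```

In an abstract atomless space there is no such ordering of the points to cut along, which is why the proof has to build F from small pieces instead.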

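215X Basic exercises (a) Let $(X, \Sigma, \mu)$ be a measure space and $\Phi$ a non-empty set of $\mu$-integrable
real-valued functions from $X$ to $\mathbb{R}$. Suppose that $\sup_{n \in \mathbb{N}} \int f_n$ is finite for every sequence $\langle f_n \rangle_{n \in \mathbb{N}}$ in $\Phi$ such
that $f_n \le_{\text{a.e.}} f_{n+1}$ for every $n$. Show that there is a sequence $\langle f_n \rangle_{n \in \mathbb{N}}$ in $\Phi$ such that $f_n \le_{\text{a.e.}} f_{n+1}$ for every
$n$ and, for every integrable real-valued function $f$ on $X$, either $f \le_{\text{a.e.}} \sup_{n \in \mathbb{N}} f_n$ or there is an $n \in \mathbb{N}$ such
that no member of $\Phi$ is greater than or equal to $\max(f, f_n)$ almost everywhere.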

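> (b) Let $(X, \Sigma, \mu)$ be a measure space. (i) Suppose that $\mathcal{E}$ is a non-empty upwards-directed subset of
$\Sigma$ such that $c = \sup_{E \in \mathcal{E}} \mu E$ is finite. Show that $E \setminus \bigcup_{n \in \mathbb{N}} F_n$ is negligible whenever $E \in \mathcal{E}$ and $\langle F_n \rangle_{n \in \mathbb{N}}$
is a sequence in $\mathcal{E}$ such that $\lim_{n \to \infty} \mu F_n = c$. (ii) Let $\Phi$ be a non-empty set of integrable functions on $X$
which is upwards-directed in the sense that for all $f$, $g \in \Phi$ there is an $h \in \Phi$ such that $\max(f, g) \le_{\text{a.e.}} h$,
and suppose that $c = \sup_{f \in \Phi} \int f$ is finite. Show that $f \le_{\text{a.e.}} \sup_{n \in \mathbb{N}} f_n$ whenever $f \in \Phi$ and $\langle f_n \rangle_{n \in \mathbb{N}}$ is a
sequence in $\Phi$ such that $\lim_{n \to \infty} \int f_n = c$.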

(c) Use 215A to shorten the proof of 211Ld.

(d) Give an example of a (non-semi-finite) measure space $(X, \Sigma, \mu)$ satisfying conditions (ii)-(iv) of 215B,
but not (i).

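> (e) Let $(X, \Sigma, \mu)$ be an atomless $\sigma$-finite measure space. Show that for any $\epsilon > 0$ there is a disjoint
sequence $\langle E_n \rangle_{n \in \mathbb{N}}$ of measurable sets with measure at most $\epsilon$ such that $X = \bigcup_{n \in \mathbb{N}} E_n$.

(f) Let $(X, \Sigma, \mu)$ be an atomless strictly localizable measure space. Show that for any $\epsilon > 0$ there is a
decomposition $\langle X_i \rangle_{i \in I}$ of $X$ such that $\mu X_i \le \epsilon$ for every $i \in I$.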

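215Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a $\sigma$-finite measure space and $\langle f_{mn} \rangle_{m,n \in \mathbb{N}}$, $\langle f_m \rangle_{m \in \mathbb{N}}$, $f$
measurable real-valued functions defined almost everywhere on $X$ and such that $\langle f_{mn} \rangle_{n \in \mathbb{N}} \to f_m$ a.e. for
each $m$ and $\langle f_m \rangle_{m \in \mathbb{N}} \to f$ a.e. Show that there is a strictly increasing sequence $\langle n_m \rangle_{m \in \mathbb{N}}$ in $\mathbb{N}$ such that
$\langle f_{m,n_m} \rangle_{m \in \mathbb{N}} \to f$ a.e. (Compare 134Yb.)

(b) Let $(X, \Sigma, \mu)$ be a $\sigma$-finite measure space. Let $\langle f_n \rangle_{n \in \mathbb{N}}$ be a sequence of measurable real-valued
functions such that $f = \lim_{n \to \infty} f_n$ is defined almost everywhere on $X$. Show that there is a non-decreasing
sequence $\langle X_k \rangle_{k \in \mathbb{N}}$ of measurable subsets of $X$ such that $\bigcup_{k \in \mathbb{N}} X_k$ is conegligible in $X$ and $\langle f_n \rangle_{n \in \mathbb{N}} \to f$
uniformly on every $X_k$, in the sense that for any $\epsilon > 0$ there is an $m \in \mathbb{N}$ such that $|f_j(x) - f(x)|$ is defined
and less than or equal to $\epsilon$ whenever $j \ge m$, $x \in X_k$.
(This is a version of Egorov's theorem.)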

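(c) Let $(X, \Sigma, \mu)$ be a totally finite measure space and $\langle f_n \rangle_{n \in \mathbb{N}}$, $f$ measurable real-valued functions defined
almost everywhere on $X$. Show that $\langle f_n \rangle_{n \in \mathbb{N}} \to f$ a.e. iff there is a sequence $\langle \epsilon_n \rangle_{n \in \mathbb{N}}$ of strictly positive real
numbers, converging to 0, such that
$\lim_{n \to \infty} \mu\bigl(\bigcup_{k \ge n} \{x : x \in \mathrm{dom} f_k \cap \mathrm{dom} f, |f_k(x) - f(x)| \ge \epsilon_n\}\bigr) = 0$.

(d) Find a direct proof of (v)$\Rightarrow$(vi) in 215B. (Hint: given $\mathcal{E} \subseteq \Sigma$, use Zorn's Lemma to find a maximal
totally ordered $\mathcal{E}' \subseteq \mathcal{E}$ such that $E \triangle F \notin \mathcal{N}$ for any distinct $E$, $F \in \mathcal{E}'$, and apply (v) to $\mathcal{E}'$.)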

215 Notes and comments The common ground of 215A, 215B(vi), 215C and 215Xa is actually one of the
most fundamental ideas in measure theory. It appears in such various forms that it is often easier to prove
an application from first principles than to explain how it can be reduced to the versions here. But I will
try henceforth to signal such applications as they arise, calling the method (the proof of 215Aa or 215Xa)
the 'principle of exhaustion'. One point which is perhaps worth noting here is the inductive construction
of the sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ in the proof of 215Aa. Each $F_{n+1}$ is chosen after the preceding one. It is this
which makes it possible, in the proof of 215B(vii)$\Rightarrow$(vi), to extract a suitable sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ directly.
In many applications (starting with what is surely the most important one in the elementary theory, the
Radon-Nikodým theorem of §232, or with part (i) of the proof of 211Ld), this refinement is not needed;
we are dealing with an upwards-directed set, as in 215B(v), and can choose the whole sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ at
once, no term interacting with any other, as in 215Xb. The axiom of 'dependent choice', which asserts that
we can construct sequences term-by-term, is known to be stronger than the axiom of 'countable choice',
which asserts only that we can choose countably many objects simultaneously.
In 215B I try to indicate the most characteristic properties of $\sigma$-finiteness; in particular, the properties
which distinguish $\sigma$-finite measures from other strictly localizable measures. This result is in a way more
abstract than the manipulations in the rest of the section. Note that it makes an essential use of the axiom
of choice in the form of Zorn's Lemma. I spent a paragraph in 134C commenting on the distinction between
'countable choice', which is needed for anything which looks like the standard theory of Lebesgue measure,
and the full axiom of choice, which is relatively little used in the elementary theory. The implication (ii)$\Rightarrow$(i)
of 215B is one of the points where we do need something beyond countable choice. (I should perhaps remark
that the whole theory of non-$\sigma$-finite measure spaces looks very odd without the general axiom of choice.)
Note also that in 215B the proofs of (i)$\Rightarrow$(vii) and (vii)$\Rightarrow$(vi) are the only points where anything so vulgar
as a number appears. The conditions (iii), (iv), (v) and (vi) are linked in ways that have nothing to do with
measure theory, and involve only the structure $(X, \Sigma, \mathcal{N})$. (See 215Yd here, and 316D-316E in Volume
3.) There are similar conditions relating to measurable functions rather than measurable sets; for a fairly
abstract example, see 241Yd.
In 215Ya-215Yc are three more standard theorems on almost-everywhere-convergent sequences which
depend on $\sigma$- or total finiteness.

216 Examples
It is common practice (and, in my view, good practice) in books on pure mathematics to provide
discriminating examples; I mean that whenever we are given a list of new concepts, we expect to be provided
with examples to show that we have a fair picture of the relationships between them, and in particular that
we are not being kept ignorant of some startling implication. Concerning the concepts listed in 211A-211K,
we have ten different properties which some, but not all, measure spaces possess, giving a conceivable total
of $2^{10}$ different types of measure space, classified according to which of these ten properties they have. The
list of basic relationships in 211L reduces these 1024 possibilities to 72. Observing that a space can be
simultaneously atomless and purely atomic only when the measure of the whole space is 0, we find ourselves
with 56 possibilities, being two trivial cases with $\mu X = 0$ (because such a measure may or may not be
complete) together with $9 \times 2 \times 3$ cases, corresponding to the nine classes
probability spaces,
spaces which are totally finite, but not probability spaces,

spaces which are $\sigma$-finite, but not totally finite,
spaces which are strictly localizable, but not $\sigma$-finite,
spaces which are localizable and locally determined, but not strictly localizable,
spaces which are localizable, but not locally determined,
spaces which are locally determined, but not localizable,
spaces which are semi-finite, but neither locally determined nor localizable,
spaces which are not semi-finite;
the two classes
spaces which are complete,
spaces which are not complete;
and the three classes
spaces which are atomless, not of measure 0,
spaces which are purely atomic, not of measure 0,
spaces which are neither atomless nor purely atomic.
I do not propose to give a complete set of fifty-six examples, particularly as rather fewer than fifty-six
different ideas are required. However, I do think that for a proper understanding of abstract measure spaces
it is necessary to have seen realizations of some of the critical combinations of properties. I therefore take
a few paragraphs to describe three special examples to add to those of 211M-211R.
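
The arithmetic in the count above can be checked mechanically. The sketch below simply enumerates the nine, two and three classes just listed (the string labels are ad hoc abbreviations) and adds the two trivial cases of measure zero. It does not verify the reduction from 1024 to 72, which depends on the relationships in 211L.

```python
from itertools import product

size_classes = [
    "probability",
    "totally finite, not probability",
    "sigma-finite, not totally finite",
    "strictly localizable, not sigma-finite",
    "localizable and locally determined, not strictly localizable",
    "localizable, not locally determined",
    "locally determined, not localizable",
    "semi-finite, neither locally determined nor localizable",
    "not semi-finite",
]
completeness = ["complete", "not complete"]
atoms = ["atomless", "purely atomic", "neither"]

# Every combination of one class from each list, for spaces of non-zero measure.
nontrivial = list(product(size_classes, completeness, atoms))
# The two trivial cases: measure zero, complete or not.
trivial = [("measure zero", c) for c in completeness]

total = len(nontrivial) + len(trivial)
print(2 ** 10, len(nontrivial), total)   # 1024 54 56
```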

216A Lebesgue measure Before turning to the new ideas, let me mention Lebesgue measure again.
As remarked in 211M, 211P and 211Q,
(a) Lebesgue measure $\mu$ on $\mathbb{R}$ is complete, atomless and $\sigma$-finite, therefore strictly localizable,
localizable and locally determined.
(b) The subspace measure $\mu_{[0,1]}$ on $[0, 1]$ is a complete, atomless probability measure.
(c) The restriction $\mu \restriction \mathcal{B}$ of $\mu$ to the algebra $\mathcal{B}$ of Borel sets in $\mathbb{R}$ is atomless, $\sigma$-finite and not complete.

216B I now embark on the description of three counter-examples; meaning spaces built specifically
for the purpose of showing that there are no unexpected implications among the ten properties under
consideration here. Even by the standards of this chapter these must be regarded as dispensable by the
student who wants to get on with the real business of understanding the big theorems of the subject. Neither
the existence of these examples, nor the techniques needed in constructing them, are vital for anything else
we shall look at before Volume 5. But if you are going to take abstract measure theory seriously at all,
sooner or later you will need to form some kind of mental picture of the nature of the spaces possessing
the different properties here, and a minimal requirement of such a picture is that it should include the
discriminations witnessed by these examples.

*216C A complete, localizable, non-locally-determined space The first example hardly needs
an idea beyond what we already have, but it does call for more manipulations than it seems fair to set as
an exercise, and may therefore be useful as a demonstration of technique.
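(a) Let $I$ be any uncountable set, and set $X = \{0, 1\} \times I$. For $E \subseteq X$, $y \in \{0, 1\}$ set $E[\{y\}] = \{i : (y, i) \in E\} \subseteq I$. Set
$\Sigma = \{E : E \subseteq X, E[\{0\}] \triangle E[\{1\}] \text{ is countable}\}$.
Then $\Sigma$ is a $\sigma$-subalgebra of subsets of $X$. PP (i) $\emptyset[\{0\}] \triangle \emptyset[\{1\}] = \emptyset$ is countable, so $\emptyset \in \Sigma$. (ii) If $E \in \Sigma$
then
$(X \setminus E)[\{0\}] \triangle (X \setminus E)[\{1\}] = E[\{0\}] \triangle E[\{1\}]$
is countable. (iii) If $\langle E_n \rangle_{n \in \mathbb{N}}$ is a sequence in $\Sigma$ and $E = \bigcup_{n \in \mathbb{N}} E_n$, then
$E[\{0\}] \triangle E[\{1\}] \subseteq \bigcup_{n \in \mathbb{N}} E_n[\{0\}] \triangle E_n[\{1\}]$
is countable. QQ
For $E \in \Sigma$, set $\mu E = \#(E[\{0\}])$ if this is finite, $\infty$ otherwise; then $(X, \Sigma, \mu)$ is a measure space.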
(b) $(X, \Sigma, \mu)$ is complete. PP If $A \subseteq E \in \Sigma$ and $\mu E = 0$, then $(0, i) \notin E$ for every $i$. So
$A[\{0\}] \triangle A[\{1\}] = A[\{1\}] \subseteq E[\{1\}] = E[\{1\}] \triangle E[\{0\}]$
must be countable, and $A \in \Sigma$. QQ
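(c) $(X, \Sigma, \mu)$ is semi-finite. PP If $E \in \Sigma$ and $\mu E > 0$, there is an $i \in I$ such that $(0, i) \in E$; now
$F = \{(0, i)\} \subseteq E$ and $\mu F = 1$. QQ
(d) $(X, \Sigma, \mu)$ is localizable. PP Let $\mathcal{E}$ be any subset of $\Sigma$. Set
$J = \bigcup_{E \in \mathcal{E}} E[\{0\}]$, $G = \{0, 1\} \times J$.
Then $G \in \Sigma$. If $H \in \Sigma$, then
$\mu(E \setminus H) = 0$ for every $E \in \mathcal{E}$
$\iff E[\{0\}] \subseteq H[\{0\}]$ for every $E \in \mathcal{E}$
$\iff (0, i) \in H$ for every $i \in J$
$\iff \mu(G \setminus H) = 0$.
Thus $G$ is an essential supremum for $\mathcal{E}$ in $\Sigma$; as $\mathcal{E}$ is arbitrary, $\mu$ is localizable. QQ
(e) $(X, \Sigma, \mu)$ is not locally determined. PP Consider $H = \{0\} \times I$. Then $H \notin \Sigma$ because $H[\{0\}] \triangle H[\{1\}] = I$
is uncountable. But let $E \in \Sigma$ be any set such that $\mu E < \infty$. Then
$(E \cap H)[\{0\}] \triangle (E \cap H)[\{1\}] = (E \cap H)[\{0\}] \subseteq E[\{0\}]$
is finite, so $E \cap H \in \Sigma$. As $E$ is arbitrary, $H$ witnesses that $\mu$ is not locally determined. QQ
(f) $(X, \Sigma, \mu)$ is purely atomic. PP Let $E \in \Sigma$ be any set of non-zero measure. Let $i \in I$ be such that
$(0, i) \in E$. Then $F = \{(0, i)\}$ is a set of measure 1, included in $E$; because $F$ is a singleton
set, it must be an atom for $\mu$; as $E$ is arbitrary, $\mu$ is purely atomic. QQ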
(g) Thus the construction here yields a complete, localizable, purely atomic, non-locally-determined
space.

*216D A complete, locally determined space which is not localizable The next construction
requires a little set theory. We need two sets $I$, $J$ such that $I$ is uncountable (more strictly, $I$ cannot be
expressed as the union of countably many countable sets), $I \subseteq J$ and $J$ cannot be expressed as $\bigcup_{i \in I} K_i$
where every $K_i$ is countable. The most natural way of doing this, subject to the axiom of choice, is to take
$I = \omega_1$, the first uncountable ordinal, and $J$ to be $\omega_2$, the first ordinal from which there is no injection into
$\omega_1$ (see 2A1Fc); but in case you prefer other formulations (e.g., $I = \{\{x\} : x \in \mathbb{R}\}$ and $J = \mathcal{P}\mathbb{R}$), I will write
the following argument in terms of $I$ and $J$, and you can pick your own pair.
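(a) Let $\mathrm{T}$ be the countable-cocountable $\sigma$-algebra of $J$ and $\nu$ the countable-cocountable measure on $J$
(211R). Set $X = J \times J$ and for $E \subseteq X$ set
$E[\{\xi\}] = \{\eta : (\xi, \eta) \in E\}$, $E^{-1}[\{\xi\}] = \{\eta : (\eta, \xi) \in E\}$
for every $\xi \in J$. Set
$\Sigma = \{E : E[\{\xi\}] \text{ and } E^{-1}[\{\xi\}] \text{ belong to } \mathrm{T} \text{ for every } \xi \in J\}$,
$\mu E = \sum_{\xi \in J} \nu E[\{\xi\}] + \sum_{\xi \in J} \nu E^{-1}[\{\xi\}]$
for every $E \in \Sigma$. It is easy to check that $\Sigma$ is a $\sigma$-algebra and that $\mu$ is a measure.
(b) $(X, \Sigma, \mu)$ is complete. PP If $A \subseteq E \in \Sigma$ and $\mu E = 0$, then all the sets $E[\{\xi\}]$, $E^{-1}[\{\xi\}]$ are countable,
so the same is true of all the sets $A[\{\xi\}]$, $A^{-1}[\{\xi\}]$, and $A \in \Sigma$. QQ
(d) $(X, \Sigma, \mu)$ is semi-finite. PP For each $\zeta \in J$, set
$G_\zeta = \{\zeta\} \times J$, $\tilde{G}_\zeta = J \times \{\zeta\}$.
Then all the sections $G_\zeta[\{\xi\}]$, $G_\zeta^{-1}[\{\xi\}]$, $\tilde{G}_\zeta[\{\xi\}]$, $\tilde{G}_\zeta^{-1}[\{\xi\}]$ are either $J$ or $\emptyset$ or $\{\zeta\}$, so belong to $\mathrm{T}$, and all
the $G_\zeta$, $\tilde{G}_\zeta$ belong to $\Sigma$, with $\mu$-measure 1.
Suppose that $E \in \Sigma$ is a set of strictly positive measure. Then there must be some $\xi \in J$ such that
$0 < \nu E[\{\xi\}] + \nu E^{-1}[\{\xi\}] = \mu(E \cap G_\xi) + \mu(E \cap \tilde{G}_\xi) < \infty$,
and one of the sets $E \cap G_\xi$, $E \cap \tilde{G}_\xi$ is a set of non-zero finite measure included in $E$. QQ
(e) $(X, \Sigma, \mu)$ is locally determined. PP Suppose that $H \subseteq X$ is such that $H \cap E \in \Sigma$ whenever $E \in \Sigma$ and
$\mu E < \infty$. Then, in particular, $H \cap G_\zeta$ and $H \cap \tilde{G}_\zeta$ belong to $\Sigma$, so
$H[\{\zeta\}] = (H \cap G_\zeta)[\{\zeta\}] \in \mathrm{T}$,
$H^{-1}[\{\zeta\}] = (H \cap \tilde{G}_\zeta)^{-1}[\{\zeta\}] \in \mathrm{T}$,
for every $\zeta \in J$. This shows that $H \in \Sigma$. As $H$ is arbitrary, $\mu$ is locally determined. QQ
(f) $(X, \Sigma, \mu)$ is not localizable. PP Set $\mathcal{E} = \{G_\zeta : \zeta \in J\}$. ?? Suppose, if possible, that $G \in \Sigma$ is an
essential supremum for $\mathcal{E}$. Then
$\nu(J \setminus G[\{\xi\}]) = \mu(G_\xi \setminus G) = 0$
and $J \setminus G[\{\xi\}]$ is countable, for every $\xi \in J$. Consequently $J \ne \bigcup_{\xi \in I} (J \setminus G[\{\xi\}])$, and there is an $\eta$ belonging
to $J \setminus \bigcup_{\xi \in I} (J \setminus G[\{\xi\}]) = \bigcap_{\xi \in I} G[\{\xi\}]$. This means just that $(\xi, \eta) \in G$ for every $\xi \in I$, that is, that
$I \subseteq G^{-1}[\{\eta\}]$. Accordingly $G^{-1}[\{\eta\}]$ is uncountable, so that $\nu G^{-1}[\{\eta\}] = \mu(G \cap \tilde{G}_\eta) = 1$. But observe that
$\mu(G_\xi \cap \tilde{G}_\eta) = \mu\{(\xi, \eta)\} = 0$ for every $\xi \in J$. This means that, setting $H = X \setminus \tilde{G}_\eta$, $E \setminus H$ is negligible, for
every $E \in \mathcal{E}$; so that we must have $0 = \mu(G \setminus H) = \mu(G \cap \tilde{G}_\eta) = 1$, which is absurd. XX
Thus $\mathcal{E}$ has no essential supremum in $\Sigma$, and $\mu$ cannot be localizable. QQ
(g) $(X, \Sigma, \mu)$ is purely atomic. PP If $E \in \Sigma$ has non-zero measure, there must be some $\xi \in J$ such that
one of $E[\{\xi\}]$, $E^{-1}[\{\xi\}]$ is not countable; that is, such that one of $E \cap G_\xi$, $E \cap \tilde{G}_\xi$ is not negligible. But if
now $H \in \Sigma$ and $H \subseteq E \cap G_\xi$, either $H[\{\xi\}]$ is countable, and $\mu H = 0$, or $J \setminus H[\{\xi\}]$ is countable, and $\mu(G_\xi \setminus H) = 0$;
similarly, if $H \subseteq E \cap \tilde{G}_\xi$, one of $\mu H$, $\mu(\tilde{G}_\xi \setminus H)$ must be 0, according to whether $H^{-1}[\{\xi\}]$ is countable or
not. Thus $E \cap G_\xi$, $E \cap \tilde{G}_\xi$, if not negligible, must be atoms, and $E$ must include an atom. As $E$ is arbitrary,
$\mu$ is purely atomic. QQ
(h) Thus $(X, \Sigma, \mu)$ is complete, locally determined and purely atomic, but is not localizable.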

*216E A complete, locally determined, localizable space which is not strictly localizable For
the last, and most interesting, construction, we need a non-trivial result in infinitary combinatorics, which
I have written out in 2A1P: if $I$ is any set, and $\langle f_\alpha \rangle_{\alpha \in A}$ is a family in $\{0, 1\}^I$, the set of functions from $I$
to $\{0, 1\}$, with $\#(A)$ strictly greater than $\mathfrak{c}$, the cardinal of the continuum, and if $\langle K_\alpha \rangle_{\alpha \in A}$ is any family of
countable subsets of $I$, then there must be distinct $\alpha$, $\beta \in A$ such that $f_\alpha$ and $f_\beta$ agree on $K_\alpha \cap K_\beta$.
Armed with this fact, I proceed as follows.
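(a) Let $C$ be any set of cardinal greater than $\mathfrak{c}$. Set $I = \mathcal{P}C$, the set of subsets of $C$, and write $X = \{0, 1\}^I$.
For $\gamma \in C$, define $f_\gamma \in X$ by saying that $f_\gamma(\Gamma) = 1$ if $\gamma \in \Gamma \subseteq C$ and $f_\gamma(\Gamma) = 0$ if $\gamma \notin \Gamma \subseteq C$. Let $\mathcal{K}$ be the
family of countable subsets of $I$, and for $K \in \mathcal{K}$, $\gamma \in C$ set
$F_{\gamma K} = \{x : x \in X, x \restriction K = f_\gamma \restriction K\} \subseteq X$.
Let
$\Sigma_\gamma = \{E : E \subseteq X$, either there is a $K \in \mathcal{K}$ such that $F_{\gamma K} \subseteq E$
or there is a $K \in \mathcal{K}$ such that $F_{\gamma K} \subseteq X \setminus E\}$.
Then $\Sigma_\gamma$ is a $\sigma$-algebra of subsets of $X$. PP (i) $F_{\gamma\emptyset} \subseteq X \setminus \emptyset$ so $\emptyset \in \Sigma_\gamma$. (ii) The definition of $\Sigma_\gamma$ is symmetric
between $E$ and $X \setminus E$, so $X \setminus E \in \Sigma_\gamma$ whenever $E \in \Sigma_\gamma$. (iii) Let $\langle E_n \rangle_{n \in \mathbb{N}}$ be a sequence in $\Sigma_\gamma$, with union
$E$. ($\alpha$) If there are $n \in \mathbb{N}$, $K \in \mathcal{K}$ such that $F_{\gamma K} \subseteq E_n$, then $F_{\gamma K} \subseteq E$, so $E \in \Sigma_\gamma$. ($\beta$) Otherwise, there is
for each $n \in \mathbb{N}$ a $K_n \in \mathcal{K}$ such that $F_{\gamma, K_n} \subseteq X \setminus E_n$. Set $K = \bigcup_{n \in \mathbb{N}} K_n \in \mathcal{K}$. Then
$F_{\gamma K} = \{x : x \restriction K = f_\gamma \restriction K\} = \{x : x \restriction K_n = f_\gamma \restriction K_n \text{ for every } n \in \mathbb{N}\}$
$= \bigcap_{n \in \mathbb{N}} F_{\gamma, K_n} \subseteq \bigcap_{n \in \mathbb{N}} (X \setminus E_n) = X \setminus E$,
so again $E \in \Sigma_\gamma$. As $\langle E_n \rangle_{n \in \mathbb{N}}$ is arbitrary, $\Sigma_\gamma$ is a $\sigma$-algebra. QQ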
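(b) Set
$\Sigma = \bigcap_{\gamma \in C} \Sigma_\gamma$;
then $\Sigma$, being an intersection of $\sigma$-algebras, is a $\sigma$-algebra of subsets of $X$ (see 111Ga). Define $\mu : \Sigma \to [0, \infty]$
by setting
$\mu E = \#(\{\gamma : f_\gamma \in E\})$ if this is finite,
$= \infty$ otherwise;
then $\mu$ is a measure.
(c) It will be convenient later to know something about the sets
$G_D = \{x : x \in X, x(D) = 1\}$
for $D \subseteq C$. In particular, every $G_D$ belongs to $\Sigma$. PP If $\gamma \in D$, then $f_\gamma(D) = 1$ so $G_D = F_{\gamma, \{D\}} \in \Sigma_\gamma$. If
$\gamma \in C \setminus D$, then $f_\gamma(D) = 0$ so $G_D = X \setminus F_{\gamma, \{D\}} \in \Sigma_\gamma$. QQ Also, of course, $\{\gamma : f_\gamma \in G_D\} = D$.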
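(d) $(X, \Sigma, \mu)$ is complete. PP Suppose that $A \subseteq E \in \Sigma$ and that $\mu E = 0$. For every $\gamma \in C$, $E \in \Sigma_\gamma$ and
$f_\gamma \notin E$, so $F_{\gamma K} \not\subseteq E$ for any $K \in \mathcal{K}$ and there is a $K \in \mathcal{K}$ such that
$F_{\gamma K} \subseteq X \setminus E \subseteq X \setminus A$.
Thus $A \in \Sigma_\gamma$; as $\gamma$ is arbitrary, $A \in \Sigma$. As $A$ is arbitrary, $\mu$ is complete. QQ
(e) $(X, \Sigma, \mu)$ is semi-finite. PP Let $E \in \Sigma$ be a set of positive measure. Then there must be some
$\gamma \in C$ such that $f_\gamma \in E$. Consider $E' = E \cap G_{\{\gamma\}}$. As $f_\gamma \in E'$, $\mu E' \ge 1 > 0$. On the other hand,
$\mu G_{\{\gamma\}} = \#(\{\delta : \delta \in \{\gamma\}\}) = 1$, so $\mu E' = 1$. As $E$ is arbitrary, $\mu$ is semi-finite. QQ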
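(f) $(X, \Sigma, \mu)$ is localizable. PP Let $\mathcal{E}$ be any subset of $\Sigma$. Set $D = \{\gamma : \gamma \in C, f_\gamma \in \bigcup\mathcal{E}\}$. Consider $G_D$.
For $H \in \Sigma$,
$\mu(E \setminus H) = 0$ for every $E \in \mathcal{E}$
$\iff f_\gamma \notin E \setminus H$ for every $E \in \mathcal{E}$, $\gamma \in C$
$\iff f_\gamma \in H$ for every $\gamma \in D$
$\iff f_\gamma \notin G_D \setminus H$ for every $\gamma \in C$
$\iff \mu(G_D \setminus H) = 0$.
Thus $G_D$ is an essential supremum for $\mathcal{E}$ in $\Sigma$. As $\mathcal{E}$ is arbitrary, $\mu$ is localizable. QQ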
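(g) $(X, \Sigma, \mu)$ is not strictly localizable. PP?? Suppose, if possible, that $\langle X_j \rangle_{j \in J}$ is a decomposition of
$(X, \Sigma, \mu)$. Set $J' = \{j : j \in J, \mu X_j > 0\}$. For each $j \in J'$, the set $C_j = \{\gamma : f_\gamma \in X_j\}$ must be finite and
non-empty. Moreover, for each $\gamma \in C$, there must be some $j \in J$ such that $\mu(G_{\{\gamma\}} \cap X_j) > 0$, and in this
case $j \in J'$ and $\gamma \in C_j$. Thus $C = \bigcup_{j \in J'} C_j$. Because $\#(C) > \mathfrak{c}$, $\#(J') > \mathfrak{c}$ (2A1Ld).
For each $j \in J'$, choose $\gamma_j \in C_j$. Then
$f_{\gamma_j} \in X_j \in \Sigma \subseteq \Sigma_{\gamma_j}$,
so there must be a $K_j \in \mathcal{K}$ such that $F_{\gamma_j, K_j} \subseteq X_j$.
At this point I finally turn to the result cited at the start of this example. Because $\#(J') > \mathfrak{c}$, there must
be distinct $j$, $k \in J'$ such that $f_{\gamma_j}$ and $f_{\gamma_k}$ agree on $K_j \cap K_k$. We may therefore define $x \in X$ by saying that
$x(\delta) = f_{\gamma_j}(\delta)$ if $\delta \in K_j$,
$= f_{\gamma_k}(\delta)$ if $\delta \in K_k$,
$= 0$ if $\delta \in I \setminus (K_j \cup K_k)$.
Now
$x \in F_{\gamma_j, K_j} \cap F_{\gamma_k, K_k} \subseteq X_j \cap X_k$,
and $X_j \cap X_k \ne \emptyset$; contradicting the assumption that the $X_j$ formed a decomposition of $X$. XXQQ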
(h) $(X, \Sigma, \mu)$ is purely atomic. PP If $E \in \Sigma$ and $\mu E > 0$, then (as remarked in (e) above) there is a $\gamma \in C$
such that $\mu(E \cap G_{\{\gamma\}}) = 1$; now $E \cap G_{\{\gamma\}}$ must be an atom. QQ

(i) Accordingly $(X, \Sigma, \mu)$ is a complete, locally determined, localizable, purely atomic measure space
which is not strictly localizable.

216X Basic exercises (a) In the construction of 216C, show that the subspace measure on $\{1\} \times I$ is
not semi-finite.

(b) Suppose, in 216D, that $I = \omega_1$. (i) Show that the set $\{(\xi, \eta) : \xi \le \eta < \omega_1\}$ is measurable for the
measure constructed by Carathéodory's method from $\mu^* \restriction \mathcal{P}(I \times I)$, but not for the subspace measure on
$I \times I$. (ii) Hence, or otherwise, show that the subspace measure on $I \times I$ is not locally determined.

(c) In 216Ya, 252Yr and 252Yt below, I indicate how to construct atomless versions of 216C, 216D and
216E, that is, atomless complete measure spaces of which the first is localizable but not locally determined,
the second is locally determined but not localizable, and the third is locally determined and localizable
but not strictly localizable. Show how direct sums of these, together with counting measure and the examples
described in this chapter, can be assembled to provide all 56 examples called for by the discussion in the
introduction to this section.

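216Y Further exercises (a) Let $\lambda$ be Lebesgue measure on $[0, 1]$, and $\Lambda$ its domain. Set $Y = [0, 1] \times \{0, 1\}$
and write
$\mathrm{T} = \{F : F \subseteq Y, F^{-1}[\{0\}] \in \Lambda\}$,
$\nu F = \lambda\{z : (z, 0) \in F\}$ for every $F \in \mathrm{T}$.
Set
$\mathrm{T}_0 = \{F : F \in \mathrm{T}, F^{-1}[\{0\}] \triangle F^{-1}[\{1\}] \text{ is } \lambda\text{-negligible}\}$.
Let $I$ be an uncountable set. Set $X = Y \times I$,
$\Sigma = \{E : E \subseteq X, E^{-1}[\{i\}] \in \mathrm{T} \text{ for every } i \in I, \{i : E^{-1}[\{i\}] \notin \mathrm{T}_0\} \text{ is countable}\}$,
$\mu E = \sum_{i \in I} \nu E^{-1}[\{i\}]$ for $E \in \Sigma$.
(i) Show that $(Y, \mathrm{T}, \nu)$ and $(Y, \mathrm{T}_0, \nu \restriction \mathrm{T}_0)$ are complete probability spaces, and that for every $F \in \mathrm{T}$ there is
an $F' \in \mathrm{T}_0$ such that $\nu(F \triangle F') = 0$. (ii) Show that $(X, \Sigma, \mu)$ is an atomless complete localizable measure
space which is not locally determined.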

(b) Define a measure $\mu$ on $X = \omega_2 \times \omega_2$ as follows. Take $\Sigma$ to be the $\sigma$-algebra of subsets of $X$ generated
by
$\{A \times \omega_2 : A \subseteq \omega_2\} \cup \{\omega_2 \times \alpha : \alpha < \omega_2\}$.
For $E \in \Sigma$ set
$W(E) = \{\xi : \xi < \omega_2, \sup E[\{\xi\}] = \omega_2\}$,
and set $\mu E = \#(W(E))$ if this is finite, $\infty$ otherwise. Show that $\mu$ is a measure on $X$, is localizable and
locally determined, but does not have locally determined negligible sets. Find a subspace $Y$ of $X$ such that
the subspace measure on $Y$ is not semi-finite.

(c) Show that in the space described in 216E every set has a measurable envelope, but that this is not
true in the spaces of 216C and 216D.

216 Notes and comments The examples 216C-216E are designed to form, with Lebesgue measure, a basis
for constructing a complete set of examples for the concepts listed in 211A-211K. One does not really expect
to encounter these phenomena in applications, but a clear understanding of the possibilities demonstrated
by these examples is part of a proper appreciation of their rarity. Of course, if we add further properties
to our list, for instance the property of having locally determined negligible sets (213I), or the property
that every subset should have a measurable envelope (213Xl), then there are further positive results to
complement 211L, and more examples to hunt for, like 216Yb. But it is time, perhaps past time, that we
returned to the classical theorems which apply to the measure spaces at the centre of the subject.

Chapter 22
The fundamental theorem of calculus
In this chapter I address one of the most important properties of the Lebesgue integral. Given an
integrable function $f : [a, b] \to \mathbb{R}$, we can form its indefinite integral $F(x) = \int_a^x f(t)\,dt$ for $x \in [a, b]$. Two
questions immediately present themselves. (i) Can we expect to have the derivative $F'$ of $F$ equal to $f$?
(ii) Can we identify which functions $F$ will appear as indefinite integrals? Reasonably satisfactory answers
may be found for both of these questions: $F' = f$ almost everywhere (222E) and indefinite integrals are the
absolutely continuous functions (225E). In the course of dealing with them, we need to develop a variety
of techniques which lead to many striking results both in the theory of Lebesgue measure and in other,
apparently unrelated, topics in real analysis.
The first step is Vitali's theorem (§221), a remarkable argument (it is more a method than a theorem)
which uses the geometric nature of the real line to extract disjoint subfamilies from collections of intervals.
It is the foundation stone not only of the results in §222 but of all geometric measure theory, that is, measure
theory on spaces with a geometric structure. I use it here to show that monotonic functions are differentiable
almost everywhere (222A). Following this, Fatou's Lemma and Lebesgue's Dominated Convergence Theorem
are enough to show that the derivative of an indefinite integral is almost everywhere equal to the integrand.
We now find that some innocent-looking manipulations of this fact take us surprisingly far; I present these
in §223.
I begin the second half of the chapter with a discussion of functions of bounded variation, that is,
expressible as the difference of bounded monotonic functions (§224). This is one of the least measure-theoretic
sections in the volume; only in 224I and 224J are measure and integration even mentioned. But this material
is needed for Chapter 28 as well as for the next section, and is also one of the basic topics of twentieth-century
real analysis. §225 deals with the characterization of indefinite integrals as the absolutely continuous
functions. In fact this is now quite easy; it helps to call on Vitali's theorem again, but everything else is a
straightforward application of methods previously used. The second half of the section introduces some new
ideas in an attempt to give a deeper intuition into the essential nature of absolutely continuous functions.
§226 returns to functions of bounded variation and their decomposition into saltus and absolutely
continuous and singular parts, the first two being relatively manageable and the last looking something like the
Cantor function.

221 Vitali's theorem in $\mathbb{R}$


I give the first theorem of this chapter a section to itself. It occupies a position between measure theory
and geometry (it is, indeed, one of the fundamental results of geometric measure theory), and its proof
involves both the measure and the geometry of the real line.

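221A Vitali's theorem Let $A$ be a bounded subset of $\mathbb{R}$ and $\mathcal{I}$ a family of non-singleton closed intervals
in $\mathbb{R}$ such that every point of $A$ belongs to arbitrarily short members of $\mathcal{I}$. Then there is a countable set
$\mathcal{I}_0 \subseteq \mathcal{I}$ such that (i) $\mathcal{I}_0$ is disjoint, that is, $I \cap I' = \emptyset$ for all distinct $I$, $I' \in \mathcal{I}_0$ (ii) $\mu(A \setminus \bigcup\mathcal{I}_0) = 0$, where
$\mu$ is Lebesgue measure on $\mathbb{R}$.
proof (a) If there is a finite disjoint set $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $A \subseteq \bigcup\mathcal{I}_0$ (including the possibility that $A = \mathcal{I}_0 = \emptyset$),
we can stop. So let us suppose henceforth that there is no such $\mathcal{I}_0$.
Let $\mu^*$ be Lebesgue outer measure on $\mathbb{R}$. Suppose that $|x| < M$ for every $x \in A$, and set
$\mathcal{I}' = \{I : I \in \mathcal{I}, I \subseteq [-M, M]\}$.
(b) In this case, if $\mathcal{I}_0$ is any finite disjoint subset of $\mathcal{I}'$, there is a $J \in \mathcal{I}'$ which is disjoint from any
member of $\mathcal{I}_0$. PP Take $x \in A \setminus \bigcup\mathcal{I}_0$. Now there is a $\delta > 0$ such that $[x - \delta, x + \delta]$ does not meet any member
of $\mathcal{I}_0$, and as $|x| < M$ we can suppose that $[x - \delta, x + \delta] \subseteq [-M, M]$. Let $J$ be a member of $\mathcal{I}$, containing
$x$, and of length at most $\delta$; then $J \in \mathcal{I}'$ and $J \cap \bigcup\mathcal{I}_0 = \emptyset$. QQ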

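(c) We can now choose a sequence $\langle \gamma_n \rangle_{n \in \mathbb{N}}$ of real numbers and a disjoint sequence $\langle I_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{I}'$
inductively, as follows. Given $\langle I_j \rangle_{j<n}$ (if $n = 0$, this is the empty sequence, with no members), with $I_j \in \mathcal{I}'$ for
each $j < n$, and $I_j \cap I_k = \emptyset$ for $j < k < n$, set
$\mathcal{J}_n = \{I : I \in \mathcal{I}', I \cap I_j = \emptyset \text{ for every } j < n\}$.
We know from (b) that $\mathcal{J}_n \ne \emptyset$. Set
$\gamma_n = \sup\{\mu I : I \in \mathcal{J}_n\}$;
then $0 < \gamma_n \le 2M$. We may therefore choose a set $I_n \in \mathcal{J}_n$ such that $\mu I_n \ge \frac{1}{2}\gamma_n$, and this continues the
induction.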
(e) Because the $I_n$ are disjoint Lebesgue measurable subsets of $[-M, M]$, we have
$\sum_{n=0}^\infty \gamma_n \le 2 \sum_{n=0}^\infty \mu I_n \le 4M < \infty$,
and $\lim_{n \to \infty} \gamma_n = 0$. Now define $I_n'$ to be the closed interval with the same midpoint as $I_n$ but five times
the length, so that it projects past each end of $I_n$ by at least $\gamma_n$. I claim that, for any $n$,
$A \subseteq \bigcup_{j<n} I_j \cup \bigcup_{j \ge n} I_j'$.
PP?? Suppose, if possible, otherwise. Take any $x$ belonging to $A \setminus (\bigcup_{j<n} I_j \cup \bigcup_{j \ge n} I_j')$. Let $\delta > 0$ be such
that
$[x - \delta, x + \delta] \subseteq [-M, M] \setminus \bigcup_{j<n} I_j$,
and let $J \in \mathcal{I}$ be such that
$x \in J \subseteq [x - \delta, x + \delta]$.
Then
$\mu J > 0 = \lim_{m \to \infty} \gamma_m$;
let $m$ be the least integer greater than or equal to $n$ such that $\gamma_m < \mu J$. In this case $J$ cannot belong to
$\mathcal{J}_m$, so there must be some $k < m$ such that $J \cap I_k \ne \emptyset$, because certainly $J \in \mathcal{I}'$. By the choice of $\delta$, $k$
cannot be less than $n$, so $n \le k < m$, and $\gamma_k \ge \mu J$. In this case, the distance from $x$ to the nearest endpoint
of $I_k$ is at most $\mu J \le \gamma_k$. But the ends of $I_k'$ project beyond the ends of $I_k$ by at least $\gamma_k$, so $x \in I_k'$; which
contradicts the choice of $x$. XXQQ
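(f) It follows that
$\mu^*(A \setminus \bigcup_{j<n} I_j) \le \mu(\bigcup_{j \ge n} I_j') \le \sum_{j=n}^\infty \mu I_j' \le 5 \sum_{j=n}^\infty \mu I_j$.
As
$\sum_{j=0}^\infty \mu I_j \le 2M < \infty$,
we must have
$\lim_{n \to \infty} \mu^*(A \setminus \bigcup_{j<n} I_j) = 0$,
and
$\mu(A \setminus \bigcup_{j \in \mathbb{N}} I_j) = \mu^*(A \setminus \bigcup_{j \in \mathbb{N}} I_j) \le \inf_{n \in \mathbb{N}} \mu^*(A \setminus \bigcup_{j<n} I_j) = 0$.
Thus in this case we may set $\mathcal{I}_0 = \{I_n : n \in \mathbb{N}\}$ to obtain a countable disjoint family in $\mathcal{I}$ with
$\mu(A \setminus \bigcup\mathcal{I}_0) = 0$.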
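
The greedy choice in parts (c)-(f) of the proof can be imitated for a finite family of intervals. The Python sketch below is only an illustration under simplifying assumptions: the family is finite and generated at random, so the supremum $\gamma_n$ is attained and the code takes an interval of maximal length at each step, and the loop stops when no candidates are left. The names `meets`, `greedy_disjoint` and `union_length` are hypothetical.

```python
import random

def meets(i, j):
    """True if the closed intervals i = (a, b) and j = (c, d) intersect."""
    return i[0] <= j[1] and j[0] <= i[1]

def greedy_disjoint(family):
    """Repeatedly take a longest interval disjoint from those already chosen."""
    chosen = []
    candidates = list(family)          # plays the role of J_0
    while candidates:
        best = max(candidates, key=lambda i: i[1] - i[0])   # length gamma_n
        chosen.append(best)
        # Keep only intervals disjoint from every choice so far: the next J_n.
        candidates = [i for i in candidates if not meets(i, best)]
    return chosen

def union_length(intervals):
    """Lebesgue measure of a finite union of closed intervals."""
    total, right = 0.0, float("-inf")
    for a, b in sorted(intervals):
        if b > right:
            total += b - max(a, right)
            right = b
    return total

rng = random.Random(0)
family = []
for _ in range(2000):
    length = 2.0 ** -rng.randint(2, 10)
    left = rng.uniform(0.0, 1.0 - length)
    family.append((left, left + length))

chosen = greedy_disjoint(family)
covered = union_length(chosen)
print(len(chosen), covered, union_length(family))
```

The selected family is disjoint, and it is maximal in the sense that every member of the original family meets one of the chosen intervals. How much of the union remains uncovered depends on how fine the family is, which is where the hypothesis of arbitrarily short members enters the theorem.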

221B Remarks (a) I have expressed this theorem in the form 'there is a countable set $\mathcal{I}_0 \subseteq \mathcal{I}$ such
that . . . ' in an attempt to find a concise way of expressing the three possibilities
(i) $A = \mathcal{I} = \emptyset$, so that we must take $\mathcal{I}_0 = \emptyset$;
(ii) there are disjoint $I_0, \ldots, I_n \in \mathcal{I}$ such that $A \subseteq I_0 \cup \ldots \cup I_n$, so that we can take $\mathcal{I}_0 =
\{I_0, \ldots, I_n\}$;
(iii) there is a disjoint sequence $\langle I_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{I}$ such that $\mu(A \setminus \bigcup_{n \in \mathbb{N}} I_n) = 0$, so that we can
take $\mathcal{I}_0 = \{I_n : n \in \mathbb{N}\}$.
Of course many applications, like the proof of 221A itself, will use forms of these three alternatives.

(b) The actual theorem here, as stated, will be used in the next section. But quite as important as the
statement of the theorem is the principle of its proof. The $I_n$ are chosen 'greedily', that is, when we come
to choose $I_n$ we look at the family $\mathcal{J}_n$ of possible intervals, given the choices $I_0, \ldots, I_{n-1}$ already made,
and choose an $I_n \in \mathcal{J}_n$ which is 'about' as big as it could be. The supremum of the possibilities for $\mu I_n$
is $\gamma_n$; but since we do not know that there is any $I \in \mathcal{J}_n$ such that $\mu I = \gamma_n$, we must settle for a little
less. I follow the standard formula in taking $\mu I_n \ge \frac{1}{2}\gamma_n$, but of course I could have taken $\mu I_n \ge \frac{99}{100}\gamma_n$, or
$\mu I_n \ge (1 - 2^{-n})\gamma_n$, if that had helped later on. The remarkable thing is that this works; we can choose the
$I_n$ without foresight and without considering their interrelationships (for that matter, without examining
the set $A$) beyond the minimal requirement that $I_n \cap I_j = \emptyset$ for $j < n$, and even this arbitrary and casual
procedure yields a suitable sequence.
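
The 'casual' rule can be tried out on a finite family. In the Python sketch below, each step picks at random any remaining interval of length at least half the current maximum, and the code then checks that every member of the family lies inside the five-times enlargement of some chosen interval, which is the finite counterpart of the claim in part (e) of the proof of 221A. The random family, the tolerance `eps` and the names `enlarge` and `casual_greedy` are assumptions made for the illustration.

```python
import random

def meets(i, j):
    """True if the closed intervals i and j intersect."""
    return i[0] <= j[1] and j[0] <= i[1]

def length(i):
    return i[1] - i[0]

def enlarge(i, factor=5.0):
    """Closed interval with the same midpoint as i and `factor` times the length."""
    mid, half = (i[0] + i[1]) / 2, factor * length(i) / 2
    return (mid - half, mid + half)

def casual_greedy(family, rng):
    """At each step pick any candidate of length >= gamma_n / 2, not the longest."""
    chosen, candidates = [], list(family)
    while candidates:
        gamma = max(length(i) for i in candidates)
        pick = rng.choice([i for i in candidates if length(i) >= gamma / 2])
        chosen.append(pick)
        candidates = [i for i in candidates if not meets(i, pick)]
    return chosen

rng = random.Random(1)
family = []
for _ in range(1000):
    size = rng.uniform(0.001, 0.2)
    left = rng.uniform(-1.0, 1.0 - size)
    family.append((left, left + size))

chosen = casual_greedy(family, rng)
big = [enlarge(i) for i in chosen]
eps = 1e-12   # guards against floating-point rounding at the endpoints
inside = all(
    any(b[0] - eps <= j[0] and j[1] <= b[1] + eps for b in big) for j in family
)
print(len(chosen), inside)
```

The check succeeds for the reason given in the proof: an interval discarded at step $k$ meets $I_k$ and has length at most $\gamma_k \le 2\mu I_k$, so it cannot reach beyond the enlarged interval $I_k'$.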

(c) I have stated the theorem in terms of bounded sets A and closed intervals, which is adequate for
our needs, but very small changes in the proof suffice to deal with arbitrary (non-singleton) intervals, and
another refinement handles unbounded sets A. (See 221Ya.)

221X Basic exercises (a) Let $\alpha \in \,]0, 1[$. Suppose, in part (c) of the proof of 221A, we take $\mu I_n \ge \alpha\gamma_n$
for each $n \in \mathbb{N}$, rather than $\mu I_n \ge \frac{1}{2}\gamma_n$. What will be the appropriate constant to take in place of 5 in
defining the sets $I_j'$ of part (e)?

221Y Further exercises (a) Let $A$ be a subset of $\mathbb{R}$ and $\mathcal{I}$ a family of non-singleton intervals in
$\mathbb{R}$ such that every point of $A$ belongs to arbitrarily short members of $\mathcal{I}$. Show that there is a countable
disjoint set $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $A \setminus \bigcup\mathcal{I}_0$ is Lebesgue negligible. (Hint: apply 221A to the sets $A \cap \,]n, n + 1[$,
$\{\bar{I} : I \in \mathcal{I}, I \subseteq \,]n, n + 1[\}$, writing $\bar{I}$ for the closed interval with the same endpoints as $I$.)

(b) Let $\mathcal{J}$ be any family of non-singleton intervals in $\mathbb{R}$. Show that $\bigcup\mathcal{J}$ is Lebesgue measurable. (Hint:
apply (a) to $A = \bigcup\mathcal{J}$ and the family $\mathcal{I}$ of non-singleton subintervals of members of $\mathcal{J}$.)

(c) Let $(X, \rho)$ be a metric space, $A$ a subset of $X$, and $\mathcal{I}$ a family of closed balls of non-zero radius in $X$
such that every point of $A$ belongs to arbitrarily small members of $\mathcal{I}$. (I say here that a set is a 'closed ball
of non-zero radius' if it is expressible in the form $B(x, \delta) = \{y : \rho(y, x) \le \delta\}$ where $x \in X$ and $\delta > 0$. Of
course it is possible for such a ball to be a singleton $\{x\}$.) Show that either $A$ can be covered by a finite
disjoint family in $\mathcal{I}$ or there is a disjoint sequence $\langle B(x_n, \delta_n) \rangle_{n \in \mathbb{N}}$ in $\mathcal{I}$ such that
$A \subseteq \bigcup_{m \le n} B(x_m, \delta_m) \cup \bigcup_{m > n} B(x_m, 5\delta_m)$ for every $n \in \mathbb{N}$
or there is a disjoint sequence $\langle B(x_n, \delta_n) \rangle_{n \in \mathbb{N}}$ in $\mathcal{I}$ such that $\inf_{n \in \mathbb{N}} \delta_n > 0$.

(d) Give an example of a family $\mathcal{I}$ of open intervals such that every point of $\mathbb{R}$ belongs to arbitrarily
small members of $\mathcal{I}$, but if $\langle I_n\rangle_{n\in\mathbb{N}}$ is any disjoint sequence in $\mathcal{I}$, and for each $n \in \mathbb{N}$ we write $I'_n$ for
the closed interval with the same centre as $I_n$ and ten times the length, then there is an $n$ such that
$]0,1[\ \not\subseteq \bigcup_{m<n} I_m \cup \bigcup_{m\ge n} I'_m$.
(e) (i) Show that if $\mathcal{I}$ is a finite family of intervals in $\mathbb{R}$ there are $\mathcal{I}_0, \mathcal{I}_1 \subseteq \mathcal{I}$ such that $\bigcup(\mathcal{I}_0 \cup \mathcal{I}_1) = \bigcup\mathcal{I}$
and both $\mathcal{I}_0$ and $\mathcal{I}_1$ are disjoint families. (Hint: induce on $\#(\mathcal{I})$.) (ii) Suppose that $\mathcal{I}$ is a family of
non-singleton intervals, of length at most 1, covering a bounded set $A \subseteq \mathbb{R}$, and that $\epsilon > 0$. Show that there
is a disjoint subfamily $\mathcal{I}_0$ of $\mathcal{I}$ such that $\mu^*(A \setminus \bigcup\mathcal{I}_0) \le \frac12\mu^*A + \epsilon$. (Hint: replacing each member of $\mathcal{I}$ by
a slightly longer one with rational endpoints, reduce to the case in which $\mathcal{I}$ is countable and thence to the
case in which $\mathcal{I}$ is finite; now use (i).) (iii) Use (ii) to prove Vitali's theorem. (I learnt this argument from
J.Aldaz.)

221 Notes and comments I have headed this section "Vitali's theorem in $\mathbb{R}$" because there is an
$r$-dimensional version, which will appear in Chapter 26 below. There is an anomaly in the position of this
theorem. It is an indispensable element of the proofs of some of the most important theorems in measure
theory; on the other hand, the ideas involved in its own proof are not used elsewhere in the elementary
theory. I have therefore myself sometimes omitted the proof when teaching this material, and would not
reproach any student who left it to one side for the moment. At some stage, of course, any measure theorist
must master the method, not just for the sake of completeness, but in order to gain an intuition for possible
variations. I must emphasize that it is the principle of the proof, rather than its details, which is important,
because there are innumerable forms of Vitali's theorem. (I offer some variations in the exercises above
and in §261 below, and there are many others which are important in more advanced work.) This principle
is, I suppose, that
    (i) we choose the $I_n$ greedily, according to some more or less natural criterion applicable to
    each $I_n$ as we come to choose it, without attempting to look ahead;
    (ii) we prove that their sizes tend to zero, even though we seemed to do nothing to ensure that
    they would (but note the shift from $\mathcal{I}$ to $\mathcal{I}'$ in part (a) of the proof of 221A, which is exactly
    what is needed to make this step work);
    (iii) we check that for a suitable definition of $I'_n$, enlarging $I_n$, we shall have
    $A \subseteq \bigcup_{m<n} I_m \cup \bigcup_{m\ge n} I'_m$ for every $n$, while $\sum_{n=0}^{\infty} \mu I'_n < \infty$.
In a way, we have to count ourselves lucky every time this works. The reason for studying as many variations
as possible of a technique of this kind is to learn to guess when there is a sporting chance of being lucky.
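
The greedy principle can be tried out on a finite family of closed intervals. The following Python sketch is an illustration only, under simplifying assumptions that are not part of 221A: the family is finite (so the question of sizes tending to zero does not arise), and at each step it takes the longest interval still available, which certainly has length at least half the supremum of the available lengths. The names `greedy_disjoint` and `enlarge` are hypothetical.

```python
from fractions import Fraction
import random

def greedy_disjoint(intervals):
    """Greedily pick pairwise disjoint closed intervals, longest available first."""
    chosen = []
    # Scanning in order of decreasing length, each accepted interval is the
    # longest one still disjoint from all earlier choices.
    for a, b in sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True):
        if all(b < c or d < a for c, d in chosen):
            chosen.append((a, b))
    return chosen

def enlarge(interval, factor=5):
    """Closed interval with the same midpoint and `factor` times the length."""
    a, b = interval
    mid, half = (a + b) / 2, (b - a) / 2
    return (mid - factor * half, mid + factor * half)

# A random finite family of closed intervals with rational endpoints.
random.seed(0)
family = []
for _ in range(200):
    left = Fraction(random.randint(0, 1000), 1000)
    length = Fraction(random.randint(1, 100), 1000)
    family.append((left, left + length))

chosen = greedy_disjoint(family)
enlarged = [enlarge(iv) for iv in chosen]
# Every member of the family lies inside some enlarged chosen interval.
covered = all(any(c <= a and b <= d for c, d in enlarged) for a, b in family)
print(len(chosen), covered)
```

In this finite setting every rejected interval meets an earlier, at least equally long, chosen interval, which is why the enlargements cover the whole family; the infinite case needs the extra care described in (ii) and (iii).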

222 Differentiating an indefinite integral


I come now to the first of the two questions mentioned in the introduction to this chapter: if $f$ is an
integrable function on $[a,b]$, what is $\frac{d}{dx}\int_a^x f$? It turns out that this derivative exists and is equal to $f$
almost everywhere (222E). The argument is based on a striking property of monotonic functions: they are
differentiable almost everywhere (222A), and we can bound the integrals of their derivatives (222C).

222A Theorem Let $I \subseteq \mathbb{R}$ be an interval and $f : I \to \mathbb{R}$ a monotonic function. Then $f$ is differentiable
almost everywhere on $I$.
Remark If I seem to be speaking of a measure on $\mathbb{R}$ without naming it, as here, I mean Lebesgue measure.
proof As usual, write $\mu^*$ for Lebesgue outer measure on $\mathbb{R}$, $\mu$ for Lebesgue measure.
(a) To begin with (down to the end of (c) below), let us suppose that $f$ is non-decreasing and $I$ is a
bounded open interval on which $f$ is bounded; say $|f(x)| \le M$ for $x \in I$. For any closed subinterval $J = [a,b]$
of $I$, write $f^*(J)$ for the open interval $]f(a), f(b)[$. For $x \in I$, write
$$\overline{D}f(x) = \limsup_{h\to 0} \tfrac1h (f(x+h) - f(x)), \quad \underline{D}f(x) = \liminf_{h\to 0} \tfrac1h (f(x+h) - f(x)),$$
allowing the value $\infty$ in both cases. Then $f$ is differentiable at $x$ iff $\overline{D}f(x) = \underline{D}f(x) \in \mathbb{R}$. Because surely
$\overline{D}f(x) \ge \underline{D}f(x) \ge 0$, $f$ will be differentiable at $x$ iff $\overline{D}f(x)$ is finite and $\overline{D}f(x) \le \underline{D}f(x)$.
I therefore have to show that the sets
$$\{x : x \in I,\ \overline{D}f(x) = \infty\}, \quad \{x : x \in I,\ \overline{D}f(x) > \underline{D}f(x)\}$$
are negligible.
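(b) Let us take $A = \{x : x \in I,\ \overline{D}f(x) = \infty\}$ first. Fix an integer $m \ge 1$ for the moment, and set
$$A_m = \{x : x \in I,\ \overline{D}f(x) > m\} \supseteq A.$$
Let $\mathcal{I}$ be the family of non-singleton closed intervals $[a,b] \subseteq I$ such that $f(b) - f(a) \ge m(b-a)$; then
$\mu f^*(J) \ge m\mu J$ for every $J \in \mathcal{I}$. If $x \in A_m$, then for any $\delta > 0$ we have an $h$ with $0 < |h| \le \delta$ and
$\frac1h (f(x+h) - f(x)) > m$, so that
$$[x, x+h] \in \mathcal{I} \text{ if } h > 0, \quad [x+h, x] \in \mathcal{I} \text{ if } h < 0;$$
thus every member of $A_m$ belongs to arbitrarily small intervals in $\mathcal{I}$. By Vitali's theorem (221A), there is
a countable disjoint set $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $\mu(A_m \setminus \bigcup\mathcal{I}_0) = 0$. Now, because $f$ is non-decreasing, $\langle f^*(J)\rangle_{J\in\mathcal{I}_0}$
is disjoint, and all the $f^*(J)$ are included in $[-M, M]$, so $\sum_{J\in\mathcal{I}_0} \mu f^*(J) \le 2M$ and $\sum_{J\in\mathcal{I}_0} \mu J \le 2M/m$.
Because $A_m \setminus \bigcup\mathcal{I}_0$ is negligible,
$$\mu^*A \le \mu^*A_m \le 2M/m.$$
As $m$ is arbitrary, $\mu^*A = 0$ and $A$ is negligible.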
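(c) Now consider $B = \{x : x \in I,\ \overline{D}f(x) > \underline{D}f(x)\}$. For $q, q' \in \mathbb{Q}$ with $0 \le q < q'$, set
$$B_{qq'} = \{x : x \in I,\ \underline{D}f(x) < q,\ \overline{D}f(x) > q'\}.$$
Fix such $q$, $q'$ for the moment, and write $\gamma = \mu^*B_{qq'}$. Take any $\epsilon > 0$, and let $G$ be an open set including
$B_{qq'}$ such that $\mu G \le \gamma + \epsilon$ (134Fa). Let $\mathcal{J}$ be the set of non-singleton closed intervals $[a,b] \subseteq I \cap G$ such
that $f(b) - f(a) \le q(b-a)$; this time $\mu f^*(J) \le q\mu J$ for $J \in \mathcal{J}$. Then every member of $B_{qq'}$ is included in
arbitrarily small members of $\mathcal{J}$, so there is a countable disjoint $\mathcal{J}_0 \subseteq \mathcal{J}$ such that $B_{qq'} \setminus \bigcup\mathcal{J}_0$ is negligible.
Let $L$ be the set of endpoints of members of $\mathcal{J}_0$; then $L$ is a countable union of doubleton sets, so is countable,
therefore negligible. Set
$$C = B_{qq'} \cap \bigcup\mathcal{J}_0 \setminus L;$$
then $\mu^*C = \gamma$. Let $\mathcal{I}$ be the set of non-singleton closed intervals $J = [a,b]$ such that (i) $J$ is included in one
of the members of $\mathcal{J}_0$ (ii) $f(b) - f(a) \ge q'(b-a)$; now $\mu f^*(J) \ge q'\mu J$ for every $J \in \mathcal{I}$. Once again, because
every member of $C$ is an interior point of some member of $\mathcal{J}_0$, every point of $C$ belongs to arbitrarily small
members of $\mathcal{I}$; so there is a countable disjoint $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $\mu(C \setminus \bigcup\mathcal{I}_0) = 0$.
As in (b) above,
$$q'\gamma \le q'\mu(\bigcup\mathcal{I}_0) = \sum_{I\in\mathcal{I}_0} q'\mu I \le \sum_{I\in\mathcal{I}_0} \mu f^*(I) = \mu(\bigcup_{I\in\mathcal{I}_0} f^*(I)).$$
On the other hand,
$$\mu(\bigcup_{J\in\mathcal{J}_0} f^*(J)) = \sum_{J\in\mathcal{J}_0} \mu f^*(J) \le q\sum_{J\in\mathcal{J}_0} \mu J = q\mu(\bigcup\mathcal{J}_0) \le q\mu(\bigcup\mathcal{J}) \le q\mu G \le q(\gamma + \epsilon).$$
But $\bigcup_{I\in\mathcal{I}_0} f^*(I) \subseteq \bigcup_{J\in\mathcal{J}_0} f^*(J)$, because every member of $\mathcal{I}_0$ is included in a member of $\mathcal{J}_0$, so $q'\gamma \le q(\gamma + \epsilon)$
and $\gamma \le \epsilon q/(q' - q)$. As $\epsilon$ is arbitrary, $\gamma = 0$.
Thus every $B_{qq'}$ is negligible. Consequently $B = \bigcup_{q,q'\in\mathbb{Q},\,0\le q<q'} B_{qq'}$ is negligible.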
(d) This deals with the case of a bounded open interval on which $f$ is bounded and non-decreasing. Still
for non-decreasing $f$, but for an arbitrary interval $I$, observe that $K = \{(q,q') : q, q' \in I \cap \mathbb{Q},\ q < q'\}$ is
countable and that $I \setminus \bigcup_{(q,q')\in K} \,]q,q'[$ has at most two points (the endpoints of $I$, if any), so is negligible. If
we write $S$ for the set of points of $I$ at which $f$ is not differentiable, then from (a)-(c) we see that $S \cap \,]q,q'[$
is negligible for every $(q,q') \in K$, so that $S \cap \bigcup_{(q,q')\in K} \,]q,q'[$ is negligible and $S$ is negligible.

(e) Thus we are done if $f$ is non-decreasing. For non-increasing $f$, apply the above to $-f$, which is
differentiable at exactly the same points as $f$.

222B Remarks (a) I note that in the above argument I am using such formulae as $\sum_{J\in\mathcal{I}_0} \mu f^*(J)$.
This is because Vitali's theorem leaves it open whether the families $\mathcal{I}_0$ will be finite or infinite. The sum
must be interpreted along the lines laid down in 112Bd in Volume 1; generally, $\sum_{k\in K} a_k$, where $K$ is an
arbitrary set and every $a_k \ge 0$, is to be $\sup_{L\subseteq K \text{ is finite}} \sum_{k\in L} a_k$, with the convention that $\sum_{k\in\emptyset} a_k = 0$.
Now, in this context, if $(X, \Sigma, \mu)$ is a measure space, $K$ is a countable set, and $\langle E_k\rangle_{k\in K}$ is a family in $\Sigma$,
$$\mu(\bigcup_{k\in K} E_k) \le \sum_{k\in K} \mu E_k,$$
with equality if $\langle E_k\rangle_{k\in K}$ is disjoint. P If $K = \emptyset$, this is trivial. Otherwise, let $n \mapsto k_n : \mathbb{N} \to K$ be a
surjection, and set
$$K_n = \{k_i : i \le n\}, \quad G_n = \bigcup_{i\le n} E_{k_i} = \bigcup_{k\in K_n} E_k$$
for each $n \in \mathbb{N}$. Then $\langle G_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence with union $E = \bigcup_{k\in K} E_k$, so
$$\mu E = \lim_{n\to\infty} \mu G_n = \sup_{n\in\mathbb{N}} \mu G_n;$$
and
$$\mu G_n \le \sum_{k\in K_n} \mu E_k \le \sum_{k\in K} \mu E_k$$
for every $n$, so $\mu E \le \sum_{k\in K} \mu E_k$. If the $E_k$ are disjoint, then $\mu G_n$ is precisely $\sum_{k\in K_n} \mu E_k$ for each $n$; but
as $\langle K_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of sets with union $K$, every finite subset of $K$ is included in some
$K_n$, and
$$\sum_{k\in K} \mu E_k = \sup_{n\in\mathbb{N}} \sum_{k\in K_n} \mu E_k = \sup_{n\in\mathbb{N}} \mu G_n = \mu E,$$
as required. Q

(b) Some readers will prefer to re-index sets regularly, so that all the sums they need to look at will be
of the form $\sum_{i=0}^{n}$ or $\sum_{i=0}^{\infty}$. In effect, that is what I did in Volume 1, in the proof of 114Da/115Da, when
showing that Lebesgue outer measure is indeed an outer measure. The disadvantage of this procedure in
the context of 222A is that we must continually check that it doesn't matter whether we have a finite or
infinite sum at any particular moment. I believe that it is worth taking the trouble to learn the technique
sketched here, because it very frequently happens that we wish to consider unions of sets indexed by sets
other than $\mathbb{N}$ and $\{0, \ldots, n\}$.

(c) Of course the argument above can be shortened if you know a tiny bit more about countable sets than
I have explicitly stated so far. But note that the value assigned to $\sum_{k\in K} a_k$ must not depend on which
enumeration $\langle k_n\rangle_{n\in\mathbb{N}}$ we pick on.

222C Lemma Suppose that $a \le b$ in $\mathbb{R}$, and that $F : [a,b] \to \mathbb{R}$ is a non-decreasing function. Then
$\int_a^b F'$ exists and is at most $F(b) - F(a)$.
Remark I discussed integration over subsets at length in §131 and §214. For measurable subsets, which
are sufficient for our needs in this chapter, we have a simple description: if $(X, \Sigma, \mu)$ is a measure space,
$E \in \Sigma$ and $f$ is a real-valued function, then $\int_E f = \int \tilde f$ if the latter integral exists, where $\operatorname{dom} \tilde f =
(E \cap \operatorname{dom} f) \cup (X \setminus E)$ and $\tilde f(x) = f(x)$ if $x \in E \cap \operatorname{dom} f$, $0$ if $x \in X \setminus E$ (apply 131Fa to $\tilde f$). It follows at
once that if now $F \in \Sigma$ and $F \subseteq E$, $\int_F f = \int_E f \times \chi F$.
I write $\int_a^x f$ to mean $\int_{[a,x[} f$, which (because $[a,x[$ is measurable) can be dealt with as described above.
Note that, as long as we are dealing with Lebesgue measure, so that $[a,x] \setminus \,]a,x[\ = \{a,x\}$ is negligible, there
is no need to distinguish between $\int_{[a,x]}$, $\int_{]a,x[}$, $\int_{[a,x[}$, $\int_{]a,x]}$; for other measures on $\mathbb{R}$ we may need to take
more care. I use half-open intervals to make it obvious that $\int_a^x f + \int_x^y f = \int_a^y f$ if $a \le x \le y$, because
$f \times \chi[a,y[\ = f \times \chi[a,x[\ + f \times \chi[x,y[$.

proof (a) The result is trivial if $a = b$; let us suppose that $a < b$. By 222A, $F'$ is defined almost everywhere
on $[a,b]$.
(b) For each $n \in \mathbb{N}$, define a simple function $g_n : [a,b[\ \to \mathbb{R}$ as follows. For $0 \le k < 2^n$, set $a_{nk} =
a + 2^{-n}k(b-a)$, $b_{nk} = a + 2^{-n}(k+1)(b-a)$, $I_{nk} = [a_{nk}, b_{nk}[$. For each $x \in [a,b[$, take that $k < 2^n$ such
that $x \in I_{nk}$, and set
$$g_n(x) = \frac{2^n}{b-a}(F(b_{nk}) - F(a_{nk}))$$
for $x \in I_{nk}$, so that $g_n$ gives the slope of the chord of the graph of $F$ defined by the endpoints of $I_{nk}$. Then
$$\int_a^b g_n = \sum_{k=0}^{2^n-1} F(b_{nk}) - F(a_{nk}) = F(b) - F(a).$$

(c) On the other hand, if we set
$$C = \{x : x \in \,]a,b[,\ F'(x) \text{ exists}\},$$
then $[a,b] \setminus C$ is negligible, by 222A, and $F'(x) = \lim_{n\to\infty} g_n(x)$ for every $x \in C$. P Let $\epsilon > 0$. Then there
is a $\delta > 0$ such that $x + h \in [a,b]$ and $|F(x+h) - F(x) - hF'(x)| \le \epsilon|h|$ whenever $|h| \le \delta$. Let $n \in \mathbb{N}$ be
such that $2^{-n}(b-a) \le \delta$. Let $k < 2^n$ be such that $x \in I_{nk}$. Then
$$x - \delta \le a_{nk} \le x < b_{nk} \le x + \delta, \quad g_n(x) = \frac{2^n}{b-a}(F(b_{nk}) - F(a_{nk})).$$
Now we have
$$\begin{aligned}
|g_n(x) - F'(x)| &= \Big|\frac{2^n}{b-a}(F(b_{nk}) - F(a_{nk})) - F'(x)\Big| \\
&= \frac{2^n}{b-a}\,|F(b_{nk}) - F(a_{nk}) - (b_{nk} - a_{nk})F'(x)| \\
&\le \frac{2^n}{b-a}\big(|F(b_{nk}) - F(x) - (b_{nk} - x)F'(x)| + |F(x) - F(a_{nk}) - (x - a_{nk})F'(x)|\big) \\
&\le \frac{2^n}{b-a}\,\epsilon\,(|b_{nk} - x| + |x - a_{nk}|) = \epsilon.
\end{aligned}$$
And this is true whenever $2^{-n}(b-a) \le \delta$, that is, for all $n$ large enough. As $\epsilon$ is arbitrary, $F'(x) = \lim_{n\to\infty} g_n(x)$. Q
(d) Thus $g_n \to F'$ almost everywhere on $[a,b]$. By Fatou's Lemma,
$$\int_a^b F' = \int_a^b \liminf_{n\to\infty} g_n \le \liminf_{n\to\infty} \int_a^b g_n = \lim_{n\to\infty} \int_a^b g_n = F(b) - F(a),$$
as required.
Remark There is a generalization of this result in 224I.
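
The chord-slope functions $g_n$ of the proof of 222C are easy to compute. The following Python sketch is a numerical illustration only, for the particular smooth function $F(x) = x^3$ on $[0,2]$ (my choice, not the text's); the helper names `chord_slope` and `integral_of_g` are hypothetical, and floating-point arithmetic means the identities hold only up to rounding.

```python
def chord_slope(F, a, b, n, x):
    """Value g_n(x): slope of the chord of F over the interval I_nk containing x."""
    width = (b - a) / 2**n                     # common length of the intervals I_nk
    k = min(int((x - a) / width), 2**n - 1)    # index k with x in I_nk
    a_nk, b_nk = a + k * width, a + (k + 1) * width
    return (F(b_nk) - F(a_nk)) / width

def integral_of_g(F, a, b, n):
    """Integral of the simple function g_n over [a, b[ (a finite sum)."""
    width = (b - a) / 2**n
    midpoints = (a + (k + 0.5) * width for k in range(2**n))
    return sum(chord_slope(F, a, b, n, m) * width for m in midpoints)

F = lambda x: x**3          # non-decreasing on [0, 2], with F'(x) = 3x^2
a, b, x = 0.0, 2.0, 1.3
total = integral_of_g(F, a, b, 10)             # telescopes to F(b) - F(a) = 8
errors = [abs(chord_slope(F, a, b, n, x) - 3 * x**2) for n in (4, 8, 12, 16)]
print(total, errors)
```

For this $F$ the inequality of 222C is an equality; the lemma allows strict inequality for less regular non-decreasing functions.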

222D Lemma Suppose that $a < b$ in $\mathbb{R}$, and that $f$, $g$ are real-valued functions, both integrable over
$[a,b]$, such that $\int_a^x f = \int_a^x g$ for every $x \in [a,b]$. Then $f = g$ almost everywhere on $[a,b]$.
proof The point is that
$$\int_E f = \int_a^b f \times \chi E = \int_a^b g \times \chi E = \int_E g$$
for any measurable set $E \subseteq [a,b[$.
P (i) If $E = [c,d[$ where $a \le c \le d \le b$, then
$$\int_E f = \int_a^d f - \int_a^c f = \int_a^d g - \int_a^c g = \int_E g.$$

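(ii) If $E = [a,b[\ \cap\, G$ for some open set $G \subseteq \mathbb{R}$, then for each $n \in \mathbb{N}$ set
$$K_n = \{k : k \in \mathbb{Z},\ |k| \le 4^n,\ [2^{-n}k, 2^{-n}(k+1)[\ \subseteq G\},$$
$$H_n = \bigcup_{k\in K_n} [2^{-n}k, 2^{-n}(k+1)[\ \cap\, [a,b[;$$
then $\langle H_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of measurable sets with union $E$, so $f \times \chi E = \lim_{n\to\infty} f \times \chi H_n$,
and (by Lebesgue's Dominated Convergence Theorem, because $|f \times \chi H_n| \le |f|$ almost everywhere for every
$n$, and $|f|$ is integrable)
$$\int_E f = \lim_{n\to\infty} \int_{H_n} f.$$
At the same time, each $H_n$ is a finite disjoint union of half-open intervals in $[a,b[$, so
$$\int_{H_n} f = \sum_{k\in K_n} \int_{[2^{-n}k,\,2^{-n}(k+1)[\,\cap\,[a,b[} f = \sum_{k\in K_n} \int_{[2^{-n}k,\,2^{-n}(k+1)[\,\cap\,[a,b[} g = \int_{H_n} g,$$
and
$$\int_E g = \lim_{n\to\infty} \int_{H_n} g = \lim_{n\to\infty} \int_{H_n} f = \int_E f.$$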

(iii) For general measurable $E \subseteq [a,b[$, we can choose for each $n \in \mathbb{N}$ an open set $G_n \supseteq E$ such that
$\mu G_n \le \mu E + 2^{-n}$ (134Fa). Set $G'_n = \bigcap_{m\le n} G_m$, $E_n = [a,b[\ \cap\, G'_n$ for each $n$,
$$F = [a,b[\ \cap \bigcap_{n\in\mathbb{N}} G_n = \bigcap_{n\in\mathbb{N}} [a,b[\ \cap\, G'_n = \bigcap_{n\in\mathbb{N}} E_n.$$
Then $E \subseteq F$ and
$$\mu F \le \inf_{n\in\mathbb{N}} \mu G_n = \mu E,$$
so $F \setminus E$ is negligible and $f \times \chi(F \setminus E)$ is zero almost everywhere; consequently $\int_{F\setminus E} f = 0$ and $\int_F f = \int_E f$.
On the other hand,
$$f \times \chi F = \lim_{n\to\infty} f \times \chi E_n,$$
so by Lebesgue's Dominated Convergence Theorem again
$$\int_E f = \int_F f = \lim_{n\to\infty} \int_{E_n} f.$$
Similarly
$$\int_E g = \lim_{n\to\infty} \int_{E_n} g.$$
But by part (ii) we have $\int_{E_n} g = \int_{E_n} f$ for every $n$, so $\int_E g = \int_E f$, as required. Q
By 131Hb, $f = g$ almost everywhere on $[a,b[$, and therefore almost everywhere on $[a,b]$.

222E Theorem Suppose that $a \le b$ in $\mathbb{R}$ and that $f$ is a real-valued function which is integrable over
$[a,b]$. Then $F(x) = \int_a^x f$ exists in $\mathbb{R}$ for every $x \in [a,b]$, and the derivative $F'(x)$ exists and is equal to $f(x)$
for almost every $x \in [a,b]$.
proof (a) For most of this proof (down to the end of (c) below) I suppose that $f$ is non-negative. In this
case,
$$F(y) = F(x) + \int_x^y f \ge F(x)$$
whenever $a \le x \le y \le b$; thus $F$ is non-decreasing and therefore differentiable almost everywhere in $[a,b]$,
by 222A.
By 222C we know also that $\int_a^x F'$ exists and is less than or equal to $F(x) - F(a) = F(x)$ for every
$x \in [a,b]$.
(b) Now suppose, in addition, that $f$ is bounded; say $0 \le f(t) \le M$ for every $t \in \operatorname{dom} f$. Then $M - f$ is
integrable over $[a,b]$; let $G$ be its indefinite integral, so that $G(x) = M(x-a) - F(x)$ for every $x \in [a,b]$.
Applying (a) to $M - f$ and $G$, we have $\int_a^x G' \le G(x)$ for every $x \in [a,b]$; but of course $G' = M - F'$, so
$M(x-a) - \int_a^x F' \le M(x-a) - F(x)$, that is, $\int_a^x F' \ge F(x)$ for every $x \in [a,b]$. Thus $\int_a^x F' = \int_a^x f$ for
every $x \in [a,b]$. Now 222D tells us that $F' = f$ almost everywhere on $[a,b]$.
(c) Thus for bounded, non-negative $f$ we are done. For unbounded $f$, let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing
sequence of non-negative simple functions converging to $f$ almost everywhere on $[a,b]$, and let $\langle F_n\rangle_{n\in\mathbb{N}}$ be
the corresponding indefinite integrals. Then for any $n$ and any $x$, $y$ with $a \le x \le y \le b$, we have
$$F(y) - F(x) = \int_x^y f \ge \int_x^y f_n = F_n(y) - F_n(x),$$
so that $F'(x) \ge F_n'(x)$ for any $x \in \,]a,b[$ where both are defined, and $F'(x) \ge f_n(x)$ for almost every $x \in [a,b]$.
This is true for every $n$, so $F' \ge f$ almost everywhere, and $F' - f \ge 0$ almost everywhere. On the other
hand, as noted in (a),
$$\int_a^b F' \le F(b) - F(a) = \int_a^b f,$$
so $\int_a^b F' - f \le 0$. It follows that $F' =_{\text{a.e.}} f$ (that is, that $F' = f$ almost everywhere) (122Rc).
(d) This completes the proof for non-negative $f$. For general $f$, we can express $f$ as $f_1 - f_2$ where $f_1$,
$f_2$ are non-negative integrable functions; now $F = F_1 - F_2$ where $F_1$, $F_2$ are the corresponding indefinite
integrals, so $F' =_{\text{a.e.}} F_1' - F_2' =_{\text{a.e.}} f_1 - f_2$, and $F' =_{\text{a.e.}} f$.

222F Corollary Suppose that $f$ is any real-valued function which is integrable over $\mathbb{R}$, and set $F(x) =
\int_{-\infty}^x f$ for every $x \in \mathbb{R}$. Then $F'(x)$ exists and is equal to $f(x)$ for almost every $x \in \mathbb{R}$.
proof For each $n \in \mathbb{N}$, set
$$F_n(x) = \int_{-n}^x f$$
for $x \in [-n,n]$. Then $F_n'(x) = f(x)$ for almost every $x \in [-n,n]$. But $F(x) = F(-n) + F_n(x)$ for every
$x \in [-n,n]$, so $F'(x)$ exists and is equal to $F_n'(x)$ for every $x \in \,]-n,n[$ for which $F_n'(x)$ is defined; and
$F'(x) = f(x)$ for almost every $x \in [-n,n]$. As $n$ is arbitrary, $F'(x) = f(x)$ for almost every $x \in \mathbb{R}$.
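222G Corollary Suppose that $E \subseteq \mathbb{R}$ is a measurable set and that $f$ is a real-valued function which
is integrable over $E$. Set $F(x) = \int_{E\,\cap\,]-\infty,x[} f$ for $x \in \mathbb{R}$. Then $F'(x) = f(x)$ for almost every $x \in E$, and
$F'(x) = 0$ for almost every $x \in \mathbb{R} \setminus E$.
proof Apply 222F to $\tilde f$, where $\tilde f(x) = f(x)$ for $x \in E \cap \operatorname{dom} f$ and $\tilde f(x) = 0$ for $x \in \mathbb{R} \setminus E$, so that
$F(x) = \int_{-\infty}^x \tilde f$ for every $x \in \mathbb{R}$.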
222H The result that $\frac{d}{dx}\int_a^x f = f(x)$ for almost every $x$ is satisfying, but is no substitute for the more
elementary result that this equality is valid at any point at which $f$ is continuous.
Proposition Suppose that $a \le b$ in $\mathbb{R}$ and that $f$ is a real-valued function which is integrable over $[a,b]$.
Set $F(x) = \int_a^x f$ for $x \in [a,b]$. Then $F'(x)$ exists and is equal to $f(x)$ at any point $x \in \operatorname{dom}(f) \cap \,]a,b[$ at
which $f$ is continuous.
proof Set $c = f(x)$. Let $\epsilon > 0$. Let $\delta > 0$ be such that $\delta \le \min(b-x, x-a)$ and $|f(t) - c| \le \epsilon$ whenever
$t \in \operatorname{dom} f$ and $|t - x| \le \delta$. If $x < y \le x + \delta$, then
$$\Big|\frac{F(y) - F(x)}{y - x} - c\Big| = \frac{1}{y-x}\Big|\int_x^y f - c\Big| \le \frac{1}{y-x}\int_x^y |f - c| \le \epsilon.$$
Similarly, if $x - \delta \le y < x$,
$$\Big|\frac{F(y) - F(x)}{y - x} - c\Big| = \frac{1}{x-y}\Big|\int_y^x f - c\Big| \le \frac{1}{x-y}\int_y^x |f - c| \le \epsilon.$$
As $\epsilon$ is arbitrary,
$$F'(x) = \lim_{y\to x} \frac{F(y) - F(x)}{y - x} = c,$$
as required.
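
The contrast between 222E and 222H can be seen on a step function. The following Python sketch is an illustration under my own assumptions: $f$ is $0$ on $[0,1[$ and $1$ on $[1,2]$, so that its indefinite integral is $F(x) = \max(0, x-1)$ in closed form, and the helper name `quotient` is hypothetical. Exact rational arithmetic is used so that the printed quotients are not affected by rounding.

```python
from fractions import Fraction

def f(t):
    """Step function on [0, 2]: continuous everywhere except at t = 1."""
    return 0 if t < 1 else 1

def F(x):
    """Indefinite integral of f from 0 to x, computed in closed form."""
    return max(Fraction(0), Fraction(x) - 1)

def quotient(x, h):
    """Difference quotient (F(x + h) - F(x)) / h for non-zero h."""
    return (F(x + h) - F(x)) / h

steps = [Fraction(1, 10**k) for k in range(1, 6)]
x = Fraction(3, 2)                                   # f is continuous here
at_continuity = [(quotient(x, h), quotient(x, -h)) for h in steps]
at_jump = [(quotient(Fraction(1), h), quotient(Fraction(1), -h)) for h in steps]
print(at_continuity[-1], f(x), at_jump[-1])
```

At the point of continuity both one-sided quotients agree with $f(x)$; at the jump they disagree, so $F'(1)$ does not exist, which is consistent with 222E because a single point is negligible.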

222I Complex-valued functions In the work above, I have taken f to be real-valued throughout.
The extension to complex-valued f is just a matter of applying the above results to the real and imaginary
parts of f . Specifically, we have the following.
(a) If $a \le b$ in $\mathbb{R}$ and $f$ is a complex-valued function which is integrable over $[a,b]$, then $F(x) = \int_a^x f$
is defined in $\mathbb{C}$ for every $x \in [a,b]$, and its derivative $F'(x)$ exists and is equal to $f(x)$ for almost every
$x \in [a,b]$; moreover, $F'(x) = f(x)$ whenever $x \in \operatorname{dom}(f) \cap \,]a,b[$ and $f$ is continuous at $x$.
(b) If $f$ is a complex-valued function which is integrable over $\mathbb{R}$, and $F(x) = \int_{-\infty}^x f$ for each $x \in \mathbb{R}$, then
$F'$ exists and is equal to $f$ almost everywhere in $\mathbb{R}$.
(c) If $E \subseteq \mathbb{R}$ is a measurable set and $f$ is a complex-valued function which is integrable over $E$, and
$F(x) = \int_{E\,\cap\,]-\infty,x[} f$ for each $x \in \mathbb{R}$, then $F'(x) = f(x)$ for almost every $x \in E$ and $F'(x) = 0$ for almost
every $x \in \mathbb{R} \setminus E$.

222X Basic exercises > (a) Suppose that $a < b$ in $\mathbb{R}$ and that $f$ is a real-valued function which is
integrable over $[a,b]$. Show that the indefinite integral $x \mapsto \int_a^x f$ is continuous.

> (b) Suppose that $a < b$ in $\mathbb{R}$ and that $h$ is a real-valued function such that $\int_x^y h$ exists and is non-negative
whenever $a \le x \le y \le b$. Show that $h \ge 0$ almost everywhere on $[a,b]$.

> (c) Suppose that $a < b$ in $\mathbb{R}$ and that $f$, $g$ are integrable complex-valued functions on $[a,b]$ such that
$\int_a^x f = \int_a^x g$ for every $x \in [a,b]$. Show that $f = g$ almost everywhere on $[a,b]$.

> (d) Let $F : [0,1] \to [0,1]$ be the Cantor function (134H). Show that $\int_0^1 F' < F(1) - F(0)$.

222Y Further exercises (a) Let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a sequence of non-negative, non-decreasing functions on
$[0,1]$ such that $F(x) = \sum_{n=0}^{\infty} F_n(x)$ is finite for every $x \in [0,1]$. Show that $\sum_{n=0}^{\infty} F_n'(x) = F'(x)$ for almost
every $x \in [0,1]$. (Hint: take $\langle n_k\rangle_{k\in\mathbb{N}}$ such that $\sum_{k=0}^{\infty} F(1) - G_k(1) < \infty$, where $G_k = \sum_{j=0}^{n_k} F_j$, and set
$H(x) = \sum_{k=0}^{\infty} F(x) - G_k(x)$. Observe that $\sum_{k=0}^{\infty} F'(x) - G_k'(x) \le H'(x)$ whenever all the derivatives are
defined, so that $F' = \lim_{k\to\infty} G_k'$ almost everywhere.)
(b) Let $F : [0,1] \to \mathbb{R}$ be a continuous non-decreasing function. (i) Show that if $c \in \mathbb{R}$ then $C = \{(x,y) :
x, y \in [0,1],\ F(y) - F(x) = c\}$ is connected. (Hint: A set $A \subseteq \mathbb{R}^r$ is connected if there is no continuous
surjection $h : A \to \{0,1\}$. Show that if $h : C \to \{0,1\}$ is continuous then it is of the form $(x,y) \mapsto h_1(x)$
for some continuous function $h_1$.) (ii) Now suppose that $F(0) = 0$, $F(1) = 1$ and that $G : [0,1] \to [0,1]$ is a
second continuous non-decreasing function with $G(0) = 0$, $G(1) = 1$. Show that for any $n \ge 1$ there are $x$,
$y \in [0,1]$ such that $F(y) - F(x) = G(y) - G(x) = \frac1n$.
(c) Let $f$, $g$ be non-negative integrable functions on $\mathbb{R}$, and $n \ge 1$. Show that there are $u < v$ in $[-\infty, \infty]$
such that $\int_u^v f = \frac1n \int f$, $\int_u^v g = \frac1n \int g$.
(d) Let $f : \mathbb{R} \to \mathbb{R}$ be measurable. Show that $H = \operatorname{dom} f'$ is a measurable set and that $f'$ is a measurable
function.

222 Notes and comments I have relegated to an exercise (222Xa) the fundamental fact that an indefinite
integral $x \mapsto \int_a^x f$ is always continuous; this is not strictly speaking needed in this section, and a much
stronger result is given in 225E. There is also much more to be said about monotonic functions, to which I
will return in §224. What we need here is the fact that they are differentiable almost everywhere (222A),
which I prove by applying Vitali's theorem three times, once in part (b) of the proof and twice in part (c).
Following this, the arguments of 222C-222E form a fine series of exercises in the central ideas of Volume
1, using the concept of integration over a (measurable) subset, Fatou's Lemma (part (d) of the proof of
222C), Lebesgue's Dominated Convergence Theorem (parts (a-ii) and (a-iii) of the proof of 222D) and
the approximation of Lebesgue measurable sets by open sets (part (a-iii) of the proof of 222D). Of course
knowing that $\frac{d}{dx}\int_a^x f = f(x)$ almost everywhere is not at all the same thing as knowing that this holds for
any particular $x$, and when we come to differentiate any particular indefinite integral we generally turn to
222H first; the point of 222E is that it applies to wildly discontinuous functions, for which more primitive
methods give no information at all.
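
The strict inequality of exercise 222Xd can be made plausible numerically. The following Python sketch is an illustration, not a proof, and rests on assumptions of mine: the Cantor function is evaluated from a truncated ternary expansion (hypothetical helper `cantor`), and the chord slopes of 222C are sampled at the single point $x = \frac12$, which lies in a removed middle third where the function is constant.

```python
from fractions import Fraction

def cantor(x, depth=60):
    """Cantor function on [0, 1], read off from the ternary expansion of x."""
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit of x
        x -= digit
        if digit == 1:          # x lies in a removed middle third: F is constant there
            return value + weight
        value += weight * (digit // 2)
        weight /= 2
    return value                # truncated after `depth` digits (an approximation)

def chord_slope(n, x):
    """Slope of the chord of F over the dyadic interval of length 2^-n containing x."""
    k = min(int(x * 2**n), 2**n - 1)
    a_nk, b_nk = Fraction(k, 2**n), Fraction(k + 1, 2**n)
    return (cantor(b_nk) - cantor(a_nk)) * 2**n

x = Fraction(1, 2)                                   # a point of ]1/3, 2/3[
slopes = [chord_slope(n, x) for n in range(3, 12)]   # all zero: F is flat near x
# Integral of g_8: the sum telescopes to F(1) - F(0).
total = sum(chord_slope(8, Fraction(2 * k + 1, 2**9)) for k in range(2**8)) / 2**8
print(slopes, total)
```

The integrals of the chord-slope functions stay equal to $F(1) - F(0)$ while the slopes themselves vanish near points outside the Cantor set, which is the situation in which Fatou's Lemma gives a strict inequality.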

223 Lebesgue's density theorems


I now turn to a group of results which may be thought of as corollaries of Theorem 222E, but which also
have a vigorous life of their own, including the possibility of significant generalizations which will be treated
in Chapter 26. The idea is that any measurable function $f$ on $\mathbb{R}^r$ is almost everywhere continuous in a
variety of very weak senses; for almost every $x$, the value $f(x)$ is determined by the behaviour of $f$ near $x$,
in the sense that $f(y) \approx f(x)$ for most $y$ near $x$. I should perhaps say that while I recommend this work as
a preparation for Chapter 26, and I also rely on it in Chapter 28, I shall not refer to it again in the present
chapter, so that readers in a hurry to characterize indefinite integrals may proceed directly to §224.

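223A Lebesgue's Density Theorem: integral form Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a
real-valued function which is integrable over $I$. Then
$$f(x) = \lim_{h\downarrow 0} \frac1h \int_x^{x+h} f = \lim_{h\downarrow 0} \frac1h \int_{x-h}^x f = \lim_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} f$$
for almost every $x \in I$.
proof Setting $F(x) = \int_{I\,\cap\,]-\infty,x[} f$, we know from 222G that
$$\begin{aligned}
f(x) = F'(x) &= \lim_{h\downarrow 0} \frac1h (F(x+h) - F(x)) = \lim_{h\downarrow 0} \frac1h \int_x^{x+h} f \\
&= \lim_{h\downarrow 0} \frac1h (F(x) - F(x-h)) = \lim_{h\downarrow 0} \frac1h \int_{x-h}^x f \\
&= \lim_{h\downarrow 0} \frac{1}{2h} (F(x+h) - F(x-h)) = \lim_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} f
\end{aligned}$$
for almost every $x \in I$.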
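223B Corollary Let $E \subseteq \mathbb{R}$ be a measurable set. Then
$$\lim_{h\downarrow 0} \frac{1}{2h}\,\mu(E \cap [x-h, x+h]) = 1 \text{ for almost every } x \in E,$$
$$\lim_{h\downarrow 0} \frac{1}{2h}\,\mu(E \cap [x-h, x+h]) = 0 \text{ for almost every } x \in \mathbb{R} \setminus E.$$
proof Take $n \in \mathbb{N}$. Applying 223A to $f = \chi(E \cap [-n,n])$, we see that
$$\lim_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} f = \lim_{h\downarrow 0} \frac{1}{2h}\,\mu(E \cap [x-h, x+h])$$
whenever $x \in \,]-n,n[$ and either limit exists, so that
$$\lim_{h\downarrow 0} \frac{1}{2h}\,\mu(E \cap [x-h, x+h]) = 1 \text{ for almost every } x \in E \cap [-n,n],$$
$$\lim_{h\downarrow 0} \frac{1}{2h}\,\mu(E \cap [x-h, x+h]) = 0 \text{ for almost every } x \in [-n,n] \setminus E.$$
As $n$ is arbitrary, we have the result.
Remark For a measurable set $E \subseteq \mathbb{R}$, a point $x$ such that $\lim_{h\downarrow 0} \frac{1}{2h}\,\mu(E \cap [x-h, x+h]) = 1$ is sometimes
called a density point of $E$.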
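
For a set as simple as an interval the density ratios can be computed exactly. The following Python sketch is an illustration with $E = [0,1]$ (my choice), using hypothetical helpers `measure_in_window` and `density_ratio`; it assumes that $E$ is given as a finite list of pairwise disjoint intervals.

```python
from fractions import Fraction

def measure_in_window(intervals, x, h):
    """Lebesgue measure of E intersected with [x - h, x + h], for E a finite
    disjoint union of intervals given by their endpoints."""
    return sum(max(Fraction(0), min(b, x + h) - max(a, x - h)) for a, b in intervals)

def density_ratio(intervals, x, h):
    """The ratio mu(E intersected with [x - h, x + h]) / 2h appearing in 223B."""
    return measure_in_window(intervals, x, h) / (2 * h)

E = [(Fraction(0), Fraction(1))]                   # E = [0, 1]
hs = [Fraction(1, 2**k) for k in range(2, 10)]
inside = [density_ratio(E, Fraction(1, 2), h) for h in hs]     # interior point
endpoint = [density_ratio(E, Fraction(0), h) for h in hs]      # boundary point
outside = [density_ratio(E, Fraction(2), h) for h in hs]       # point of R \ E
print(inside[-1], endpoint[-1], outside[-1])
```

The interior point is a density point, the point outside has density $0$, and the endpoint has density $\frac12$; the endpoints form the negligible exceptional set which 223B permits.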

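223C Corollary Let $f$ be a measurable real-valued function defined almost everywhere on $\mathbb{R}$. Then for
almost every $x \in \mathbb{R}$,
$$\lim_{h\downarrow 0} \frac{1}{2h}\,\mu\{y : y \in \operatorname{dom} f,\ |y - x| \le h,\ |f(y) - f(x)| \le \epsilon\} = 1,$$
$$\lim_{h\downarrow 0} \frac{1}{2h}\,\mu\{y : y \in \operatorname{dom} f,\ |y - x| \le h,\ |f(y) - f(x)| \ge \epsilon\} = 0$$
for every $\epsilon > 0$.
proof For $q, q' \in \mathbb{Q}$, set
$$D_{qq'} = \{x : x \in \operatorname{dom} f,\ q \le f(x) < q'\},$$
so that $D_{qq'}$ is measurable,
$$C_{qq'} = \{x : x \in D_{qq'},\ \lim_{h\downarrow 0} \frac{1}{2h}\,\mu(D_{qq'} \cap [x-h, x+h]) = 1\},$$
so that $D_{qq'} \setminus C_{qq'}$ is negligible, by 223B; now set
$$C = \operatorname{dom} f \setminus \bigcup_{q,q'\in\mathbb{Q}} (D_{qq'} \setminus C_{qq'}),$$
so that $\mathbb{R} \setminus C$ is negligible. If $x \in C$ and $\epsilon > 0$, then there are $q, q' \in \mathbb{Q}$ such that $f(x) - \epsilon \le q \le f(x) <
q' \le f(x) + \epsilon$, so that $x$ belongs to $D_{qq'}$ and therefore to $C_{qq'}$, and now
$$\begin{aligned}
\liminf_{h\downarrow 0} \frac{1}{2h}\,\mu\{y : y \in \operatorname{dom} f \cap [x-h, x+h],\ |f(y) - f(x)| \le \epsilon\}
&\ge \liminf_{h\downarrow 0} \frac{1}{2h}\,\mu(D_{qq'} \cap [x-h, x+h]) \\
&= 1,
\end{aligned}$$
so
$$\lim_{h\downarrow 0} \frac{1}{2h}\,\mu\{y : y \in \operatorname{dom} f \cap [x-h, x+h],\ |f(y) - f(x)| \le \epsilon\} = 1.$$
It follows at once that
$$\lim_{h\downarrow 0} \frac{1}{2h}\,\mu\{y : y \in \operatorname{dom} f \cap [x-h, x+h],\ |f(y) - f(x)| > \epsilon\} = 0$$
for almost every $x$; but since $\epsilon$ is arbitrary, this is also true of $\frac12\epsilon$, so in fact
$$\lim_{h\downarrow 0} \frac{1}{2h}\,\mu\{y : y \in \operatorname{dom} f \cap [x-h, x+h],\ |f(y) - f(x)| \ge \epsilon\} = 0$$
for almost every $x$.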

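223D Theorem Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a real-valued function which is integrable over
$I$. Then
$$\lim_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0$$
for almost every $x \in I$.
proof (a) Suppose first that $I$ is a bounded open interval $]a,b[$. For each $q \in \mathbb{Q}$, set $g_q(x) = |f(x) - q|$ for
$x \in I \cap \operatorname{dom} f$; then $g_q$ is integrable over $I$, and
$$\lim_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} g_q = g_q(x)$$
for almost every $x \in I$, by 223A. Setting
$$E_q = \{x : x \in I \cap \operatorname{dom} f,\ \lim_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} g_q = g_q(x)\},$$
we have $I \setminus E_q$ negligible, so $I \setminus E$ is negligible, where $E = \bigcap_{q\in\mathbb{Q}} E_q$. Now
$$\lim_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0$$
for every $x \in E$. P Take $x \in E$ and $\epsilon > 0$. Then there is a $q \in \mathbb{Q}$ such that $|f(x) - q| \le \epsilon$, so that
$$|f(y) - f(x)| \le |f(y) - q| + \epsilon = g_q(y) + \epsilon$$
for every $y \in I \cap \operatorname{dom} f$, and
$$\begin{aligned}
\limsup_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} |f(y) - f(x)|\,dy
&\le \limsup_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} g_q(y) + \epsilon\,dy \\
&= \epsilon + g_q(x) \le 2\epsilon.
\end{aligned}$$
As $\epsilon$ is arbitrary,
$$\lim_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0,$$
as required. Q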
(b) If $I$ is an unbounded open interval, apply (a) to the intervals $I_n = I \cap \,]-n,n[$ to see that the limit
is zero almost everywhere on every $I_n$, and therefore on $I$. If $I$ is an arbitrary interval, note that it differs
by at most two points from an open interval, and that since we are looking only for something to happen
almost everywhere we can ignore these points.
Remark The set
$$\{x : x \in \operatorname{dom} f,\ \lim_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0\}$$
is sometimes called the Lebesgue set of $f$.

223E Complex-valued functions I have expressed the results above in terms of real-valued functions,
this being the most natural vehicle for the ideas. However there are applications of great importance in
which the functions involved are complex-valued, so I spell out the relevant statements here. In all cases
the proof is elementary, being nothing more than applying the corresponding result (223A, 223C or 223D)
to the real and imaginary parts of the function f .

(a) Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a complex-valued function which is integrable over $I$. Then
$$f(x) = \lim_{h\downarrow 0} \frac1h \int_x^{x+h} f = \lim_{h\downarrow 0} \frac1h \int_{x-h}^x f = \lim_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} f$$
for almost every $x \in I$.

(b) Let $f$ be a measurable complex-valued function defined almost everywhere on $\mathbb{R}$. Then for almost
every $x \in \mathbb{R}$,
$$\lim_{h\downarrow 0} \frac{1}{2h}\,\mu\{y : y \in \operatorname{dom} f,\ |y - x| \le h,\ |f(y) - f(x)| \ge \epsilon\} = 0$$
for every $\epsilon > 0$.

(c) Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a complex-valued function which is integrable over $I$. Then
$$\lim_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0$$
for almost every $x \in I$.

223X Basic exercises > (a) Let $E \subseteq [0,1]$ be a measurable set for which there is an $\epsilon > 0$ such that
$\mu(E \cap [a,b]) \ge \epsilon(b-a)$ whenever $0 \le a \le b \le 1$. Show that $\mu E = 1$.

(b) Let $A \subseteq \mathbb{R}$ be any set. Show that $\lim_{h\downarrow 0} \frac{1}{2h}\,\mu^*(A \cap [x-h, x+h]) = 1$ for almost every $x \in A$. (Hint:
apply 223B to a measurable envelope $E$ of $A$.)

(c) Let $f$ be any real-valued function defined almost everywhere in $\mathbb{R}$. Show that
$\lim_{h\downarrow 0} \frac{1}{2h}\,\mu^*\{y : y \in \operatorname{dom} f,\ |y - x| \le h,\ |f(y) - f(x)| \le \epsilon\} = 1$ for almost every $x \in \mathbb{R}$. (Hint: use the argument of 223C, but
with 223Xb in place of 223B.)

> (d) Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a real-valued function which is integrable over $I$. Show that
$\lim_{h\downarrow 0} \frac1h \int_x^{x+h} |f(y) - f(x)|\,dy = 0$ for almost every $x \in I$.

(e) Let $E, F \subseteq \mathbb{R}$ be measurable sets, and suppose that $F$ is bounded and of non-zero measure. Let
$x \in \mathbb{R}$ be such that $\lim_{h\downarrow 0} \frac{1}{2h}\,\mu(E \cap [x-h, x+h]) = 1$. Show that $\lim_{h\downarrow 0} \frac{\mu(E \cap (x+hF))}{h\mu F} = 1$. (Hint: it helps
to know that $\mu(hF) = h\mu F$ (134Ya, 263A). Show that if $F \subseteq [-M, M]$, then
$$\frac{1}{2hM}\,\mu(E \cap [x-hM, x+hM]) \le 1 - \frac{\mu F}{2M}\Big(1 - \frac{\mu(E \cap (x+hF))}{h\mu F}\Big).)$$
(Compare 223Ya.)

(f) Let $f$ be a real-valued function which is integrable over $\mathbb{R}$, and let $E$ be the Lebesgue set of $f$. Show
that $\lim_{h\downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} |f(t) - c|\,dt = |f(x) - c|$ for every $x \in E$ and $c \in \mathbb{R}$.

(g) Let $f$ be an integrable real-valued function defined almost everywhere in $\mathbb{R}$. Let $x \in \operatorname{dom} f$ be such
that $\lim_{n\to\infty} \frac{n}{2} \int_{x-1/n}^{x+1/n} |f(y) - f(x)|\,dy = 0$. Show that $x$ belongs to the Lebesgue set of $f$.

(h) Let $f$ be an integrable real-valued function defined almost everywhere in $\mathbb{R}$, and $x$ any point of the
Lebesgue set of $f$. Show that for every $\epsilon > 0$ there is a $\delta > 0$ such that whenever $I$ is a non-trivial interval
and $x \in I \subseteq [x-\delta, x+\delta]$, then $|f(x) - \frac{1}{\mu I} \int_I f| \le \epsilon$.

(i) Let $E, F \subseteq \mathbb{R}$ be measurable sets, and $x \in \mathbb{R}$ a point which is a density point of both. Show that $x$ is
a density point of $E \cap F$.
(j) Let $E \subseteq \mathbb{R}$ be a non-negligible measurable set. Show that for any $n \in \mathbb{N}$ there is a $\delta > 0$ such that
$\bigcap_{i\le n} E + x_i$ is non-empty whenever $x_0, \ldots, x_n \in \mathbb{R}$ are such that $|x_i - x_j| \le \delta$ for all $i, j \le n$. (Hint: find
a non-trivial interval $I$ such that $\mu(E \cap I) > \frac{n}{n+1}\,\mu I$.)

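223Y Further exercises (a) Let $E, F \subseteq \mathbb{R}$ be measurable sets, and suppose that $0 < \mu F < \infty$. Let
$x \in \mathbb{R}$ be such that $\lim_{h\downarrow 0} \frac{1}{2h}\,\mu(E \cap [x-h, x+h]) = 1$. Show that
$$\lim_{h\downarrow 0} \frac{\mu(E \cap (x+hF))}{h\mu F} = 1.$$
(Hint: apply 223Xe to sets of the form $F \cap [-M, M]$.)

(b) Let $\mathfrak{T}$ be the family of measurable sets $G \subseteq \mathbb{R}$ such that every point of $G$ is a density point of $G$. (i)
Show that $\mathfrak{T}$ is a topology on $\mathbb{R}$. (Hint: take $\mathcal{G} \subseteq \mathfrak{T}$. By 215B(iv) there is a countable $\mathcal{G}_0 \subseteq \mathcal{G}$ such that
$\mu(G \setminus \bigcup\mathcal{G}_0) = 0$ for every $G \in \mathcal{G}$. Show that
$$\bigcup\mathcal{G} \subseteq \{x : \limsup_{h\downarrow 0} \frac{1}{2h}\,\mu(\bigcup\mathcal{G}_0 \cap [x-h, x+h]) > 0\},$$
so that $\mu(\bigcup\mathcal{G} \setminus \bigcup\mathcal{G}_0) = 0$.) (ii) Show that a function $f : \mathbb{R} \to \mathbb{R}$ is measurable iff it is $\mathfrak{T}$-continuous at
almost every $x \in \mathbb{R}$. ($\mathfrak{T}$ is the density topology on $\mathbb{R}$. See 414P in Volume 4.)

(c) Show that if $f : [a,b] \to \mathbb{R}$ is bounded and continuous for the density topology on $\mathbb{R}$, then $f(x) =
\frac{d}{dx}\int_a^x f$ for every $x \in \,]a,b[$.

(d) Show that a function $f : \mathbb{R} \to \mathbb{R}$ is continuous for the density topology at $x \in \mathbb{R}$ iff
$\lim_{h\downarrow 0} \frac{1}{2h}\,\mu^*\{y : y \in [x-h, x+h],\ |f(y) - f(x)| \ge \epsilon\} = 0$ for every $\epsilon > 0$.

(e) A set $A \subseteq \mathbb{R}$ is porous at a point $x \in \mathbb{R}$ if $\limsup_{y\to x} \frac{\rho(y,A)}{\|y - x\|} > 0$, where $\rho(y,A) = \inf_{a\in A} \|y - a\|$.
Show that if $A$ is porous at every $x \in A$ then $A$ is negligible.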

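(f) For a measurable set $E \subseteq \mathbb{R}$ write $\phi E$ for the set of its density points. Show that (i) $\phi(E \cap F) = \phi E \cap \phi F$
for all measurable sets $E$, $F$ (ii) for measurable sets $E$ and $F$, $\phi E \subseteq \phi F$ iff $\mu(E \setminus F) = 0$ (iii) $\mu(E \triangle \phi E) = 0$
for every measurable set $E$ (iv) $\phi(\phi E) = \phi E$ for every measurable set $E$ (v) for every compact set $K \subseteq \phi E$
there is a compact set $L \subseteq K \cup E$ such that $K \subseteq \phi L$.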

(g) Let $f$ be an integrable real-valued function defined almost everywhere in $\mathbb{R}$, and $x$ any point of the
Lebesgue set of $f$. Show that for every $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x)\int g - \int f \times g| \le \epsilon\int g$ whenever
$g : \mathbb{R} \to [0,\infty[$ is such that $g$ is non-decreasing on $]-\infty, x]$, non-increasing on $[x, \infty[$ and zero outside
$[x-\delta, x+\delta]$. (Hint: express $g$ as a limit almost everywhere of functions of the form $\frac{g(x)}{n+1}\sum_{i=0}^{n} \chi\,]a_i, b_i[$,
where $x - \delta \le a_0 \le \ldots \le a_n \le x \le b_n \le \ldots \le b_0 \le x + \delta$.)

223 Notes and comments The results of this section can be thought of as saying that a measurable
function is in some sense almost continuous; indeed, 223Yb is an attempt to make this notion precise. For
an integrable function we have stronger results, of which the furthest-reaching seems to be 223D/223Ec.
There are $r$-dimensional versions of all these theorems, using balls centred on $x$ in place of intervals
$[x-h, x+h]$; I give these in 261C-261E. A new idea is needed for the $r$-dimensional version of Lebesgue's
density theorem (261C), but the rest of the generalization is straightforward. A less natural, and less
important, extension, also in §261, involves functions defined on non-measurable sets (compare 223Xa-223Xc).
In 223D, and again in 223Xf, the essential idea is just that the intersection of countably many conegligible
sets is conegligible. Put like this, it should by now be almost second nature to you. But applications of this
kind tend to hinge on selecting the right family of conegligible sets to look at the intersection of. And the
guiding principle is, that you need not be economical. If you have any countable family of conegligible sets in
hand, you are entitled to work within its intersection. So, for instance, once we have established that every
measurable function has a conegligible Lebesgue set, then henceforth we can work within the intersections of
the Lebesgue sets of all the functions we have names for provided that in this context we restrict ourselves
to names based on some countable language. Thus in 223Xf I suggest looking at functions of the form
$x \mapsto f(x) - q$ where $q \in \mathbb{Q}$. The countable language used here has a name for the given function $f$ and for
every rational number, but not for all real numbers. Of course we could certainly have larger countable sets
in place of Q if they helped. For once, it is worth trying to develop a restrictable imagination: you want to
take the intersection of all the conegligible sets you can think of, but in so doing you must shift temporarily
into a frame of mind which encompasses only a countable universe. (The objects of that countable universe
can of course be uncountable sets; it is the universe which should be countable, when seen from outside,
not its members.) The countable universe you use can in principle contain all the individual objects which
appear in the statement of the problem; thus in 223Xf, for instance, the function f itself surely belongs to
the relevant universe, like the set R (but not most of its members) and the operation of subtraction.

224 Functions of bounded variation


I turn now to the second of the two problems to which this chapter is devoted: the identification of those
real functions which are indefinite integrals. I take the opportunity to offer a brief introduction to the theory
of functions of bounded variation, which are interesting in themselves and will be important in Chapter 28.
I give the basic characterization of these functions as differences of monotonic functions (224D), with a
representative sample of their elementary properties.

224A Definition Let $f$ be a real-valued function and $D$ a subset of $\mathbb{R}$. I define $\mathrm{Var}_D(f)$, the (total) variation of $f$ on $D$, as follows. If $D \cap \mathrm{dom} f = \emptyset$, $\mathrm{Var}_D(f) = 0$. Otherwise, $\mathrm{Var}_D(f)$ is
$$\sup\{\sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| : a_0, a_1, \ldots, a_n \in D \cap \mathrm{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\},$$
allowing $\mathrm{Var}_D(f) = \infty$. If $\mathrm{Var}_D(f)$ is finite, we say that $f$ is of bounded variation on $D$. If the context seems clear, I may write $\mathrm{Var} f$ for $\mathrm{Var}_{\mathrm{dom} f}(f)$, and say that $f$ is simply 'of bounded variation' if this is finite.
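
As an informal illustration of 224A (not part of the original text), here is a minimal Python sketch for the special case in which $D$ is a finite set included in dom $f$. The helper name `variation_on_finite_set` and the sample data are assumptions made for this example. For finite $D$ the supremum is attained by the chain listing every point of $D$ in increasing order, because inserting an extra point never decreases the sum (triangle inequality).

```python
def variation_on_finite_set(f, points):
    """Total variation of f on a finite set D of reals (assumed inside dom f).

    The supremum of 224A is attained by the chain of all points of D in
    increasing order, so a single sum over consecutive points suffices.
    """
    ordered = sorted(set(points))
    return sum(abs(f(b) - f(a)) for a, b in zip(ordered, ordered[1:]))


# Hypothetical sample: f(x) = x^2 on D = {-1, -1/2, 0, 1/2, 1}.
sample_points = [-1.0, -0.5, 0.0, 0.5, 1.0]
sample_variation = variation_on_finite_set(lambda x: x * x, sample_points)
```

With an empty list of points the sum is empty and the function returns 0, matching the convention $\mathrm{Var}_D(f) = 0$ when $D \cap \mathrm{dom} f = \emptyset$.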

224B Remarks (a) In the present chapter, we shall virtually exclusively be concerned with the case in
which D is a bounded closed interval included in dom f . The general formulation will be useful for some
technical questions arising in Chapter 28; but if it makes you more comfortable, you will lose nothing by
supposing for the moment that D is an interval.

(b) Clearly
$$\mathrm{Var}_D(f) = \mathrm{Var}_{D \cap \mathrm{dom} f}(f) = \mathrm{Var}(f \restriction D)$$
for all $D$, $f$.

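224C Proposition (a) If $f$, $g$ are two real-valued functions and $D \subseteq \mathbb{R}$, then
$$\mathrm{Var}_D(f + g) \le \mathrm{Var}_D(f) + \mathrm{Var}_D(g).$$
(b) If $f$ is a real-valued function, $D \subseteq \mathbb{R}$ and $c \in \mathbb{R}$ then $\mathrm{Var}_D(cf) = |c| \mathrm{Var}_D(f)$.
(c) If $f$ is a real-valued function, $D \subseteq \mathbb{R}$ and $x \in \mathbb{R}$ then
$$\mathrm{Var}_D(f) \ge \mathrm{Var}_{D \cap ]-\infty, x]}(f) + \mathrm{Var}_{D \cap [x, \infty[}(f),$$
with equality if $x \in D \cap \mathrm{dom} f$.
(d) If $f$ is a real-valued function and $D \subseteq D' \subseteq \mathbb{R}$ then $\mathrm{Var}_D(f) \le \mathrm{Var}_{D'}(f)$.
(e) If $f$ is a real-valued function and $D \subseteq \mathbb{R}$, then $|f(x) - f(y)| \le \mathrm{Var}_D(f)$ for all $x, y \in D \cap \mathrm{dom} f$; so if $f$ is of bounded variation on $D$ then $f$ is bounded on $D \cap \mathrm{dom} f$ and (if $D \cap \mathrm{dom} f \ne \emptyset$)
$$\sup_{y \in D \cap \mathrm{dom} f} |f(y)| \le |f(x)| + \mathrm{Var}_D(f)$$
for every $x \in D \cap \mathrm{dom} f$.
(f) If $f$ is a monotonic real-valued function and $D \subseteq \mathbb{R}$ meets $\mathrm{dom} f$, then
$$\mathrm{Var}_D(f) = \sup_{x \in D \cap \mathrm{dom} f} f(x) - \inf_{x \in D \cap \mathrm{dom} f} f(x).$$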

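proof (a) If $D \cap \mathrm{dom}(f + g) = \emptyset$ this is trivial, because $\mathrm{Var}_D(f)$ and $\mathrm{Var}_D(g)$ are surely non-negative. Otherwise, if $a_0 \le \ldots \le a_n$ in $D \cap \mathrm{dom}(f + g)$, then
$$\sum_{i=1}^{n} |(f+g)(a_i) - (f+g)(a_{i-1})| \le \sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| + \sum_{i=1}^{n} |g(a_i) - g(a_{i-1})| \le \mathrm{Var}_D(f) + \mathrm{Var}_D(g);$$
as $a_0, \ldots, a_n$ are arbitrary, $\mathrm{Var}_D(f + g) \le \mathrm{Var}_D(f) + \mathrm{Var}_D(g)$.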


(b)
$$\sum_{i=1}^{n} |(cf)(a_i) - (cf)(a_{i-1})| = |c| \sum_{i=1}^{n} |f(a_i) - f(a_{i-1})|$$
whenever $a_0 \le \ldots \le a_n$ in $D \cap \mathrm{dom} f$.
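(c)(i) If either $D \cap ]-\infty, x] \cap \mathrm{dom} f$ or $D \cap [x, \infty[ \cap \mathrm{dom} f$ is empty, this is trivial. If $a_0 \le \ldots \le a_m$ in $D \cap ]-\infty, x] \cap \mathrm{dom} f$, $b_0 \le \ldots \le b_n$ in $D \cap [x, \infty[ \cap \mathrm{dom} f$, then
$$\sum_{i=1}^{m} |f(a_i) - f(a_{i-1})| + \sum_{j=1}^{n} |f(b_j) - f(b_{j-1})| \le \sum_{i=1}^{m+n+1} |f(a_i) - f(a_{i-1})| \le \mathrm{Var}_D(f),$$
if we write $a_i = b_{i-m-1}$ for $m + 1 \le i \le m + n + 1$. So
$$\mathrm{Var}_{D \cap ]-\infty, x]}(f) + \mathrm{Var}_{D \cap [x, \infty[}(f) \le \mathrm{Var}_D(f).$$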

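(ii) Now suppose that $x \in D \cap \mathrm{dom} f$. If $a_0 \le \ldots \le a_n$ in $D \cap \mathrm{dom} f$, and $a_0 \le x \le a_n$, let $k$ be such that $x \in [a_{k-1}, a_k]$; then
$$\sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| \le \sum_{i=1}^{k-1} |f(a_i) - f(a_{i-1})| + |f(x) - f(a_{k-1})| + |f(a_k) - f(x)| + \sum_{i=k+1}^{n} |f(a_i) - f(a_{i-1})|$$
$$\le \mathrm{Var}_{D \cap ]-\infty, x]}(f) + \mathrm{Var}_{D \cap [x, \infty[}(f)$$
(counting empty sums $\sum_{i=1}^{0}$, $\sum_{i=n+1}^{n}$ as 0). If $x \le a_0$ then $\sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| \le \mathrm{Var}_{D \cap [x, \infty[}(f)$; if $x \ge a_n$ then $\sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| \le \mathrm{Var}_{D \cap ]-\infty, x]}(f)$. Thus
$$\sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| \le \mathrm{Var}_{D \cap ]-\infty, x]}(f) + \mathrm{Var}_{D \cap [x, \infty[}(f)$$
in all cases; as $a_0, \ldots, a_n$ are arbitrary,
$$\mathrm{Var}_D(f) \le \mathrm{Var}_{D \cap ]-\infty, x]}(f) + \mathrm{Var}_{D \cap [x, \infty[}(f).$$
So the two sides are equal.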
(d) is trivial.
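(e) If $x, y \in D \cap \mathrm{dom} f$ and $x \le y$ then
$$|f(x) - f(y)| = |f(y) - f(x)| \le \mathrm{Var}_D(f)$$
by the definition of $\mathrm{Var}_D$; and the same is true if $y \le x$. So of course $|f(y)| \le |f(x)| + \mathrm{Var}_D(f)$.

(f) If $f$ is non-decreasing, then
$$\mathrm{Var}_D(f) = \sup\{\sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| : a_0, a_1, \ldots, a_n \in D \cap \mathrm{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\}$$
$$= \sup\{\sum_{i=1}^{n} (f(a_i) - f(a_{i-1})) : a_0, a_1, \ldots, a_n \in D \cap \mathrm{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\}$$
$$= \sup\{f(b) - f(a) : a, b \in D \cap \mathrm{dom} f,\ a \le b\}$$
$$= \sup_{b \in D \cap \mathrm{dom} f} f(b) - \inf_{a \in D \cap \mathrm{dom} f} f(a).$$
If $f$ is non-increasing then
$$\mathrm{Var}_D(f) = \sup\{\sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| : a_0, a_1, \ldots, a_n \in D \cap \mathrm{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\}$$
$$= \sup\{\sum_{i=1}^{n} (f(a_{i-1}) - f(a_i)) : a_0, a_1, \ldots, a_n \in D \cap \mathrm{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\}$$
$$= \sup\{f(a) - f(b) : a, b \in D \cap \mathrm{dom} f,\ a \le b\}$$
$$= \sup_{a \in D \cap \mathrm{dom} f} f(a) - \inf_{b \in D \cap \mathrm{dom} f} f(b).$$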
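
A quick numerical sanity check of parts (a), (c) and (f) of 224C can be sketched as follows; it is not part of the original text. The finite set `D`, the test functions and the helper `variation` are all hypothetical choices for this example, and the check works only because on a finite set the variation is the sum over consecutive points.

```python
def variation(f, points):
    """Var_D(f) for a finite set D: sum over consecutive points in order."""
    ordered = sorted(set(points))
    return sum(abs(f(q) - f(p)) for p, q in zip(ordered, ordered[1:]))


# Hypothetical finite set D = {-2.0, -1.9, ..., 2.0}; it contains x = 0.5.
D = [k / 10 for k in range(-20, 21)]
f = lambda t: abs(t) - t * t      # neither monotonic nor smooth at 0
g = lambda t: t ** 3              # non-decreasing

# (a) Var_D(f + g) <= Var_D(f) + Var_D(g)
lhs_a = variation(lambda t: f(t) + g(t), D)
rhs_a = variation(f, D) + variation(g, D)

# (c) equality when the splitting point x belongs to D
x = 0.5
split_sum = (variation(f, [t for t in D if t <= x])
             + variation(f, [t for t in D if t >= x]))

# (f) for monotonic g the variation is sup g - inf g over D
var_g = variation(g, D)
range_g = max(g(t) for t in D) - min(g(t) for t in D)
```

Here `split_sum` should agree with `variation(f, D)` and `var_g` with `range_g`, up to rounding error.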

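224D Theorem For any real-valued function $f$ and any set $D \subseteq \mathbb{R}$, the following are equiveridical:
(i) there are two bounded non-decreasing functions $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ such that $f = f_1 - f_2$ on $D \cap \mathrm{dom} f$;
(ii) $f$ is of bounded variation on $D$;
(iii) there are bounded non-decreasing functions $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ such that $f = f_1 - f_2$ on $D \cap \mathrm{dom} f$ and $\mathrm{Var}_D(f) = \mathrm{Var} f_1 + \mathrm{Var} f_2$.
proof (i)$\Rightarrow$(ii) If $f : \mathbb{R} \to \mathbb{R}$ is bounded and non-decreasing, then $\mathrm{Var} f = \sup_{x \in \mathbb{R}} f(x) - \inf_{x \in \mathbb{R}} f(x)$ is finite. So if $f$ agrees on $D \cap \mathrm{dom} f$ with $f_1 - f_2$ where $f_1$ and $f_2$ are bounded and non-decreasing, then
$$\mathrm{Var}_D(f) = \mathrm{Var}_{D \cap \mathrm{dom} f}(f) \le \mathrm{Var}_{D \cap \mathrm{dom} f}(f_1) + \mathrm{Var}_{D \cap \mathrm{dom} f}(f_2) \le \mathrm{Var} f_1 + \mathrm{Var} f_2 < \infty,$$
using (a), (b) and (d) of 224C.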
(ii)$\Rightarrow$(iii) Suppose that $f$ is of bounded variation on $D$. Set $D' = D \cap \mathrm{dom} f$. If $D' = \emptyset$ we can take both $f_j$ to be the zero function, so henceforth suppose that $D' \ne \emptyset$. Write
$$g(x) = \mathrm{Var}_{D \cap ]-\infty, x]}(f)$$
for $x \in D'$. Then $g_1 = g + f$ and $g_2 = g - f$ are both non-decreasing. P If $a, b \in D'$ and $a \le b$, then
$$g(b) \ge g(a) + \mathrm{Var}_{D \cap [a, b]}(f) \ge g(a) + |f(b) - f(a)|.$$
So
$$g_1(b) - g_1(a) = g(b) - g(a) + f(b) - f(a), \quad g_2(b) - g_2(a) = g(b) - g(a) - f(b) + f(a)$$
are both non-negative. Q
Now there are non-decreasing functions $h_1, h_2 : \mathbb{R} \to \mathbb{R}$, extending $g_1$, $g_2$ respectively, such that $\mathrm{Var}\, h_j = \mathrm{Var}\, g_j$ for both $j$. P $f$ is bounded on $D'$, by 224Ce, and $g$ is bounded just because $\mathrm{Var}_D(f) < \infty$, so that $g_j$ is bounded. Set $c_j = \inf_{x \in D'} g_j(x)$ and
$$h_j(x) = \sup(\{c_j\} \cup \{g_j(y) : y \in D',\ y \le x\})$$
for every $x \in \mathbb{R}$; this works. Q Observe that for $x \in D'$,
$$h_1(x) + h_2(x) = g_1(x) + g_2(x) = g(x) + f(x) + g(x) - f(x) = 2g(x),$$
$$h_1(x) - h_2(x) = 2f(x).$$
Now, because $g_1$ and $g_2$ are non-decreasing,
$$\sup_{x \in D'} g_1(x) + \sup_{x \in D'} g_2(x) = \sup_{x \in D'} (g_1(x) + g_2(x)) = 2 \sup_{x \in D'} g(x),$$
$$\inf_{x \in D'} g_1(x) + \inf_{x \in D'} g_2(x) = \inf_{x \in D'} (g_1(x) + g_2(x)) = 2 \inf_{x \in D'} g(x) \ge 0.$$
But this means that
$$\mathrm{Var}\, h_1 + \mathrm{Var}\, h_2 = \mathrm{Var}\, g_1 + \mathrm{Var}\, g_2 = 2 \mathrm{Var}\, g \le 2 \mathrm{Var}_D(f),$$
using 224Cf three times. So if we set $f_j(x) = \frac{1}{2} h_j(x)$ for $j \in \{1, 2\}$, $x \in \mathbb{R}$, we shall have non-decreasing functions such that
$$f_1(x) - f_2(x) = f(x) \text{ for } x \in D', \quad \mathrm{Var} f_1 + \mathrm{Var} f_2 = \tfrac{1}{2} \mathrm{Var}\, h_1 + \tfrac{1}{2} \mathrm{Var}\, h_2 \le \mathrm{Var}_D(f).$$
Since we surely also have
$$\mathrm{Var}_D(f) \le \mathrm{Var}_D(f_1) + \mathrm{Var}_D(f_2) \le \mathrm{Var} f_1 + \mathrm{Var} f_2,$$
we see that $\mathrm{Var}_D(f) = \mathrm{Var} f_1 + \mathrm{Var} f_2$, and (iii) is true.
(iii)$\Rightarrow$(i) is trivial.
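
The construction in (ii)$\Rightarrow$(iii) can be tried out on a finite set. The following Python sketch is an illustration only, under the assumption that $D = \{x_0 < \ldots < x_n\}$ is finite and $f$ is given by its list of values; the name `jordan_decomposition` and the sample values are hypothetical. It forms the running variation $g$ and then $f_1 = \frac{1}{2}(g + f)$, $f_2 = \frac{1}{2}(g - f)$ on $D$, which is what the proof's $f_j = \frac{1}{2} h_j$ reduce to at points of $D$.

```python
from itertools import accumulate


def jordan_decomposition(values):
    """Given f(x_0), ..., f(x_n) at increasing points x_0 < ... < x_n,
    return lists (f1, f2) with f1 - f2 = f and both non-decreasing."""
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    # g[k] is the variation of f on {x_0, ..., x_k}
    g = [0.0] + list(accumulate(steps))
    f1 = [(gk + fk) / 2 for gk, fk in zip(g, values)]
    f2 = [(gk - fk) / 2 for gk, fk in zip(g, values)]
    return f1, f2


# Hypothetical sample values of f on a five-point set.
values = [0.0, 2.0, 1.0, 3.0, -1.0]
f1, f2 = jordan_decomposition(values)
total_variation = sum(abs(b - a) for a, b in zip(values, values[1:]))
```

For this sample the increments of `f1` and `f2` are the positive and negative parts of the increments of `f`, so their variations add up to `total_variation`.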

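224E Corollary Let $f$ be a real-valued function and $D$ any subset of $\mathbb{R}$. If $f$ is of bounded variation on $D$, then
$$\lim_{x \downarrow a} \mathrm{Var}_{D \cap ]a, x]}(f) = \lim_{x \uparrow a} \mathrm{Var}_{D \cap [x, a[}(f) = 0$$
for every $a \in \mathbb{R}$, and
$$\lim_{a \to -\infty} \mathrm{Var}_{D \cap ]-\infty, a]}(f) = \lim_{a \to \infty} \mathrm{Var}_{D \cap [a, \infty[}(f) = 0.$$

proof (a) Consider first the case in which $D = \mathrm{dom} f = \mathbb{R}$ and $f$ is a bounded non-decreasing function. Then
$$\mathrm{Var}_{D \cap ]a, x]}(f) = \sup_{y \in ]a, x]} (f(x) - f(y)) = f(x) - \inf_{y > a} f(y) = f(x) - \lim_{y \downarrow a} f(y),$$
so of course
$$\lim_{x \downarrow a} \mathrm{Var}_{D \cap ]a, x]}(f) = \lim_{x \downarrow a} f(x) - \lim_{y \downarrow a} f(y) = 0.$$
In the same way
$$\lim_{x \uparrow a} \mathrm{Var}_{D \cap [x, a[}(f) = \lim_{y \uparrow a} f(y) - \lim_{x \uparrow a} f(x) = 0,$$
$$\lim_{a \to -\infty} \mathrm{Var}_{D \cap ]-\infty, a]}(f) = \lim_{a \to -\infty} f(a) - \lim_{y \to -\infty} f(y) = 0,$$
$$\lim_{a \to \infty} \mathrm{Var}_{D \cap [a, \infty[}(f) = \lim_{y \to \infty} f(y) - \lim_{a \to \infty} f(a) = 0.$$

(b) For the general case, define $f_1$, $f_2$ from $f$ and $D$ as in 224D. Then for every interval $I$ we have
$$\mathrm{Var}_{D \cap I}(f) \le \mathrm{Var}_I(f_1) + \mathrm{Var}_I(f_2),$$
so the results for $f$ follow from those for $f_1$ and $f_2$ as established in part (a) of the proof.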

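224F Corollary Let $f$ be a real-valued function of bounded variation on $[a, b]$, where $a < b$. If $\mathrm{dom} f$ meets every interval $]a, a + \delta]$ with $\delta > 0$, then
$$\lim_{t \in \mathrm{dom} f,\, t \downarrow a} f(t)$$
is defined in $\mathbb{R}$. If $\mathrm{dom} f$ meets $[b - \delta, b[$ for every $\delta > 0$, then
$$\lim_{t \in \mathrm{dom} f,\, t \uparrow b} f(t)$$
is defined in $\mathbb{R}$.
proof Let $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ be non-decreasing functions such that $f = f_1 - f_2$ on $[a, b] \cap \mathrm{dom} f$. Then
$$\lim_{t \in \mathrm{dom} f,\, t \downarrow a} f(t) = \lim_{t \downarrow a} f_1(t) - \lim_{t \downarrow a} f_2(t) = \inf_{t > a} f_1(t) - \inf_{t > a} f_2(t),$$
$$\lim_{t \in \mathrm{dom} f,\, t \uparrow b} f(t) = \lim_{t \uparrow b} f_1(t) - \lim_{t \uparrow b} f_2(t) = \sup_{t < b} f_1(t) - \sup_{t < b} f_2(t).$$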

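224G Corollary Let $f$, $g$ be real functions and $D$ a subset of $\mathbb{R}$. If $f$ and $g$ are of bounded variation on $D$, so is $f \times g$.
proof (a) The point is that there are non-negative bounded non-decreasing functions $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ such that $f = f_1 - f_2$ on $D \cap \mathrm{dom} f$. P We know that there are bounded non-decreasing $h_1$, $h_2$ such that $f = h_1 - h_2$ on $D \cap \mathrm{dom} f$. Set $\gamma_i = \inf_{x \in \mathbb{R}} h_i(x)$ for $i = 1, 2$,
$$\alpha_1 = \max(\gamma_1 - \gamma_2, 0), \quad \alpha_2 = \max(\gamma_2 - \gamma_1, 0),$$
$$f_1 = h_1 - \gamma_1 + \alpha_1, \quad f_2 = h_2 - \gamma_2 + \alpha_2;$$
this works, because $\alpha_1 - \alpha_2 = \gamma_1 - \gamma_2$. Q
(b) Now taking similar functions $g_1$, $g_2$ such that $g = g_1 - g_2$ on $D \cap \mathrm{dom}\, g$, we have
$$f \times g = f_1 \times g_1 - f_2 \times g_1 - f_1 \times g_2 + f_2 \times g_2$$
everywhere on $D \cap \mathrm{dom}(f \times g) = D \cap \mathrm{dom} f \cap \mathrm{dom}\, g$; but all the $f_i \times g_j$ are bounded non-decreasing functions, so of bounded variation, and $f \times g$ must be of bounded variation on $D$.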

224H Proposition Let $f : D \to \mathbb{R}$ be a function of bounded variation, where $D \subseteq \mathbb{R}$. Then $f$ is continuous at all except countably many points of $D$.
proof For $n \ge 1$ set
$$A_n = \{x : x \in D, \text{ for every } \delta > 0 \text{ there is a } y \in D \cap [x - \delta, x + \delta] \text{ such that } |f(y) - f(x)| \ge \tfrac{1}{n}\}.$$
Then $\#(A_n) \le n \mathrm{Var} f$. P? Otherwise, we can find distinct $x_0, \ldots, x_k \in A_n$ with $k + 1 > n \mathrm{Var} f$. Order these so that $x_0 < x_1 < \ldots < x_k$. Set $\delta = \frac{1}{2} \min_{1 \le i \le k} (x_i - x_{i-1}) > 0$. For each $i$, there is a $y_i \in D \cap [x_i - \delta, x_i + \delta]$ such that $|f(y_i) - f(x_i)| \ge \frac{1}{n}$. Take $x'_i$, $y'_i$ to be $x_i$, $y_i$ in order, so that $x'_i < y'_i$. Now
$$x'_0 \le y'_0 \le x'_1 \le y'_1 \le \ldots \le x'_k \le y'_k,$$
and
$$\mathrm{Var} f \ge \sum_{i=0}^{k} |f(y'_i) - f(x'_i)| = \sum_{i=0}^{k} |f(y_i) - f(x_i)| \ge \tfrac{1}{n}(k + 1) > \mathrm{Var} f,$$
which is impossible. X Q
It follows that $A = \bigcup_{n \ge 1} A_n$ is countable, being a countable union of finite sets. But $A$ is exactly the set of points of $D$ at which $f$ is not continuous.

224I Theorem Let $I \subseteq \mathbb{R}$ be an interval, and $f : I \to \mathbb{R}$ a function of bounded variation. Then $f$ is differentiable almost everywhere in $I$, and $f'$ is integrable over $I$, with
$$\int_I |f'| \le \mathrm{Var}_I(f).$$

proof (a) Let $f_1$ and $f_2$ be non-decreasing functions such that $f = f_1 - f_2$ everywhere on $I$ (224D). Then $f_1$ and $f_2$ are differentiable almost everywhere (222A). At any point of $I$ except possibly its endpoints, if any, $f$ will be differentiable if $f_1$ and $f_2$ are, so $f'(x)$ is defined for almost every $x \in I$.
(b) Set $F(x) = \mathrm{Var}_{I \cap ]-\infty, x]} f$ for $x \in \mathbb{R}$. If $x, y \in I$ and $x \le y$, then
$$F(y) - F(x) = \mathrm{Var}_{[x, y]} f \ge |f(y) - f(x)|,$$
by 224Cc; so $F'(x) \ge |f'(x)|$ whenever $x$ is an interior point of $I$ and both derivatives exist, which is almost everywhere. So $\int_I |f'| \le \int_I F'$. But if $a, b \in I$ and $a \le b$,
$$\int_a^b F' \le F(b) - F(a) \le F(b) \le \mathrm{Var} f.$$
Now $I$ is expressible as $\bigcup_{n \in \mathbb{N}} [a_n, b_n]$ where $a_{n+1} \le a_n \le b_n \le b_{n+1}$ for every $n$. So
$$\int_I |f'| \le \int_I F' = \int F' \times \chi I = \int \sup_{n \in \mathbb{N}} F' \times \chi[a_n, b_n] = \sup_{n \in \mathbb{N}} \int F' \times \chi[a_n, b_n]$$
(by B.Levi's theorem)
$$= \sup_{n \in \mathbb{N}} \int_{a_n}^{b_n} F' \le \mathrm{Var}_I(f).$$

224J The next result is not needed in this chapter, but is one of the most useful properties of functions
of bounded variation, and will be used repeatedly in Chapter 28.
Proposition Let $f$, $g$ be real-valued functions defined on subsets of $\mathbb{R}$, and suppose that $g$ is integrable over an interval $[a, b]$, where $a < b$, and $f$ is of bounded variation on $]a, b[$ and defined almost everywhere in $]a, b[$. Then $f \times g$ is integrable over $[a, b]$, and
$$\Bigl|\int_a^b f \times g\Bigr| \le \Bigl(\lim_{x \in \mathrm{dom} f,\, x \uparrow b} |f(x)| + \mathrm{Var}_{]a, b[}(f)\Bigr) \sup_{c \in [a, b]} \Bigl|\int_a^c g\Bigr|.$$

proof (a) By 224F, $l = \lim_{x \in \mathrm{dom} f,\, x \uparrow b} f(x)$ is defined. Write $M = |l| + \mathrm{Var}_{]a, b[}(f)$. Note that if $y$ is any point of $\mathrm{dom} f \cap ]a, b[$,
$$|f(y)| \le |f(x)| + |f(x) - f(y)| \le |f(x)| + \mathrm{Var}_{]a, b[}(f) \to M$$
as $x \uparrow b$ in $\mathrm{dom} f$, so $|f(y)| \le M$. Moreover, $f$ is measurable on $]a, b[$, because there are bounded monotonic functions $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ such that $f = f_1 - f_2$ everywhere on $]a, b[ \cap \mathrm{dom} f$. So $f \times g$ is measurable and dominated by $M|g|$, and is integrable over $]a, b[$ or $[a, b]$.
(b) For $n \in \mathbb{N}$, $k \le 2^n$ set $a_{nk} = a + 2^{-n} k (b - a)$, and for $1 \le k \le 2^n$ choose $x_{nk} \in \mathrm{dom} f \cap ]a_{n,k-1}, a_{nk}]$. Define $f_n : ]a, b] \to \mathbb{R}$ by setting $f_n(x) = f(x_{nk})$ if $1 \le k \le 2^n$, $x \in ]a_{n,k-1}, a_{nk}]$. Then $f(x) = \lim_{n \to \infty} f_n(x)$ whenever $x \in ]a, b[ \cap \mathrm{dom} f$ and $f$ is continuous at $x$, which must be almost everywhere (224H). Note next that all the $f_n$ are measurable, and that they are uniformly bounded, in modulus, by $M$. So $\{f_n \times g : n \in \mathbb{N}\}$ is dominated by the integrable function $M|g|$, and Lebesgue's Dominated Convergence Theorem tells us that
$$\int_a^b f \times g = \lim_{n \to \infty} \int_a^b f_n \times g.$$
Rc Rc
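(c) Fix $n \in \mathbb{N}$ for the moment. Set $K = \sup_{c \in [a, b]} |\int_a^c g|$. (Note that $K$ is finite because $c \mapsto \int_a^c g$ is continuous.) Then
$$\Bigl|\int_a^b f_n \times g\Bigr| = \Bigl|\sum_{k=1}^{2^n} \int_{a_{n,k-1}}^{a_{nk}} f_n \times g\Bigr|$$
$$= \Bigl|\sum_{k=1}^{2^n} f(x_{nk}) \Bigl(\int_a^{a_{nk}} g - \int_a^{a_{n,k-1}} g\Bigr)\Bigr|$$
$$= \Bigl|\sum_{k=1}^{2^n - 1} (f(x_{nk}) - f(x_{n,k+1})) \int_a^{a_{nk}} g + f(x_{n,2^n}) \int_a^b g\Bigr|$$
$$\le |f(x_{n,2^n})| \Bigl|\int_a^b g\Bigr| + \sum_{k=1}^{2^n - 1} |f(x_{n,k+1}) - f(x_{nk})| \Bigl|\int_a^{a_{nk}} g\Bigr|$$
$$\le (|f(x_{n,2^n})| + \mathrm{Var}_{]a, b[}(f)) K \to MK$$
as $n \to \infty$.
(d) Now
$$\Bigl|\int_a^b f \times g\Bigr| = \lim_{n \to \infty} \Bigl|\int_a^b f_n \times g\Bigr| \le MK,$$
as required.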
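
Step (c) is a summation-by-parts argument, and its discrete analogue is easy to test. The sketch below is illustrative only; the function name `abel_bound` and the sample sequences are assumptions. It checks that $|\sum_{k=1}^{N} f_k g_k| \le (|f_N| + \sum_{k=1}^{N-1} |f_{k+1} - f_k|) \max_k |G_k|$, where $G_k = g_1 + \ldots + g_k$, which follows from the identity $\sum_{k=1}^{N} f_k g_k = \sum_{k=1}^{N-1} (f_k - f_{k+1}) G_k + f_N G_N$.

```python
from itertools import accumulate


def abel_bound(f_vals, g_vals):
    """Return (lhs, rhs) for the discrete analogue of the estimate in 224J.

    lhs = |sum f_k g_k|
    rhs = (|f_N| + sum |f_{k+1} - f_k|) * max_k |g_1 + ... + g_k|
    """
    partial_sums = list(accumulate(g_vals))
    lhs = abs(sum(fk * gk for fk, gk in zip(f_vals, g_vals)))
    variation = sum(abs(b - a) for a, b in zip(f_vals, f_vals[1:]))
    K = max(abs(G) for G in partial_sums)
    rhs = (abs(f_vals[-1]) + variation) * K
    return lhs, rhs


# Hypothetical sample: f_k = 1/k (bounded variation), g_k = (-1)^k
# (partial sums bounded by 1), for k = 1, ..., 50.
f_vals = [1 / k for k in range(1, 51)]
g_vals = [(-1) ** k for k in range(1, 51)]
lhs, rhs = abel_bound(f_vals, g_vals)
```

In this sample the right-hand side is $(\frac{1}{50} + (1 - \frac{1}{50})) \cdot 1 = 1$, while the left-hand side is a partial sum of the alternating harmonic series.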

224K Complex-valued functions So far I have taken all functions to be real-valued. This is adequate
for the needs of the present chapter, but in Chapter 28 we shall need to look at complex-valued functions of
bounded variation, and I should perhaps spell out the (elementary) adaptations involved in the extension
to the complex case.

(a) Let $D$ be a subset of $\mathbb{R}$ and $f$ a complex-valued function. The variation of $f$ on $D$, $\mathrm{Var}_D(f)$, is zero if $D \cap \mathrm{dom} f = \emptyset$, and otherwise is
$$\sup\{\sum_{j=1}^{n} |f(a_j) - f(a_{j-1})| : a_0 \le a_1 \le \ldots \le a_n \text{ in } D \cap \mathrm{dom} f\},$$
allowing $\infty$. If $\mathrm{Var}_D(f)$ is finite, we say that $f$ is of bounded variation on $D$.

(b) Just as in the real case, a complex-valued function of bounded variation must be bounded, and
$$\mathrm{Var}_D(f + g) \le \mathrm{Var}_D(f) + \mathrm{Var}_D(g),$$
$$\mathrm{Var}_D(cf) = |c| \mathrm{Var}_D(f),$$
$$\mathrm{Var}_D(f) \ge \mathrm{Var}_{D \cap ]-\infty, x]}(f) + \mathrm{Var}_{D \cap [x, \infty[}(f)$$
for every $x \in \mathbb{R}$, with equality if $x \in D \cap \mathrm{dom} f$,
$$\mathrm{Var}_D(f) \le \mathrm{Var}_{D'}(f) \text{ whenever } D \subseteq D';$$
the arguments of 224C go through unchanged.

(c) A complex-valued function is of bounded variation iff its real and imaginary parts are both of bounded variation (because
$$\max(\mathrm{Var}_D(\mathrm{Re} f), \mathrm{Var}_D(\mathrm{Im} f)) \le \mathrm{Var}_D(f) \le \mathrm{Var}_D(\mathrm{Re} f) + \mathrm{Var}_D(\mathrm{Im} f).)$$
So a complex-valued function $f$ is of bounded variation on $D$ iff there are bounded non-decreasing functions $f_1, \ldots, f_4 : \mathbb{R} \to \mathbb{R}$ such that $f = f_1 - f_2 + i f_3 - i f_4$ on $D$ (224D).

(d) Let $f$ be a complex-valued function and $D$ any subset of $\mathbb{R}$. If $f$ is of bounded variation on $D$, then
$$\lim_{x \downarrow a} \mathrm{Var}_{D \cap ]a, x]}(f) = \lim_{x \uparrow a} \mathrm{Var}_{D \cap [x, a[}(f) = 0$$
for every $a \in \mathbb{R}$, and
$$\lim_{a \to -\infty} \mathrm{Var}_{D \cap ]-\infty, a]}(f) = \lim_{a \to \infty} \mathrm{Var}_{D \cap [a, \infty[}(f) = 0.$$
(Apply 224E to the real and imaginary parts of $f$.)

(e) Let $f$ be a complex-valued function of bounded variation on $[a, b]$, where $a < b$. If $\mathrm{dom} f$ meets every interval $]a, a + \delta]$ with $\delta > 0$, then
$$\lim_{t \in \mathrm{dom} f,\, t \downarrow a} f(t)$$
is defined in $\mathbb{C}$. If $\mathrm{dom} f$ meets $[b - \delta, b[$ for every $\delta > 0$, then
$$\lim_{t \in \mathrm{dom} f,\, t \uparrow b} f(t)$$
is defined in $\mathbb{C}$. (Apply 224F to the real and imaginary parts of $f$.)

(f) Let $f$, $g$ be complex functions and $D$ a subset of $\mathbb{R}$. If $f$ and $g$ are of bounded variation on $D$, so is $f \times g$. (For $f \times g$ is expressible as a linear combination of the four products $\mathrm{Re} f \times \mathrm{Re}\, g, \ldots, \mathrm{Im} f \times \mathrm{Im}\, g$, to each of which we can apply 224G.)

(g) Let $I \subseteq \mathbb{R}$ be an interval, and $f : I \to \mathbb{C}$ a function of bounded variation. Then $f$ is differentiable almost everywhere on $I$, and
$$\int_I |f'| \le \mathrm{Var}_I(f).$$
(As 224I.)

(h) Let $f$, $g$ be complex-valued functions defined on subsets of $\mathbb{R}$, and suppose that $g$ is integrable over an interval $[a, b]$, where $a < b$, and $f$ is of bounded variation on $]a, b[$ and defined almost everywhere in $]a, b[$. Then $f \times g$ is integrable over $[a, b]$, and
$$\Bigl|\int_a^b f \times g\Bigr| \le \Bigl(\lim_{x \in \mathrm{dom} f,\, x \uparrow b} |f(x)| + \mathrm{Var}_{]a, b[}(f)\Bigr) \sup_{c \in [a, b]} \Bigl|\int_a^c g\Bigr|.$$
(The argument of 224J applies virtually unchanged.)

224X Basic exercises >(a) Set $f(x) = x^2 \sin \frac{1}{x^2}$ for $x \ne 0$, $f(0) = 0$. Show that $f : \mathbb{R} \to \mathbb{R}$ is differentiable everywhere and uniformly continuous, but is not of bounded variation on any non-trivial interval containing 0.

(b) Give an example of a non-negative function $g : [0, 1] \to [0, 1]$, of bounded variation, such that $\sqrt{g}$ is not of bounded variation.

(c) Show that if $f$ is any real-valued function defined on a subset of $\mathbb{R}$, there is a function $\tilde f : \mathbb{R} \to \mathbb{R}$, extending $f$, such that $\mathrm{Var} \tilde f = \mathrm{Var} f$. Under what circumstances is $\tilde f$ unique?

(d) Let $f : D \to \mathbb{R}$ be a function of bounded variation, where $D \subseteq \mathbb{R}$ is a non-empty set. Show that if $\inf_{x \in D} |f(x)| > 0$ then $1/f$ is of bounded variation.

(e) Let $f : [a, b] \to \mathbb{R}$ be a continuous function, where $a \le b$ in $\mathbb{R}$. Show that if $c < \mathrm{Var} f$ then there is a $\delta > 0$ such that $\sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| \ge c$ whenever $a = a_0 \le a_1 \le \ldots \le a_n = b$ and $\max_{1 \le i \le n} (a_i - a_{i-1}) \le \delta$.

(f) Let $\langle f_n \rangle_{n \in \mathbb{N}}$ be a sequence of real functions, and set $f(x) = \lim_{n \to \infty} f_n(x)$ whenever the limit is defined. Show that $\mathrm{Var} f \le \liminf_{n \to \infty} \mathrm{Var} f_n$.

(g) Let $f$ be a real-valued function which is integrable over an interval $[a, b] \subseteq \mathbb{R}$. Set $F(x) = \int_a^x f$ for $x \in [a, b]$. Show that $\mathrm{Var}_{[a, b]}(F) = \int_a^b |f|$. (Hint: start by checking that $\mathrm{Var} F \le \int |f|$; for the reverse inequality, consider the case $f \ge 0$ first.)

(h) Show that if $f$ is a real-valued function defined on a set $D \subseteq \mathbb{R}$, then
$$\mathrm{Var}_D(f) = \sup\{|\sum_{i=1}^{n} (-1)^i (f(a_i) - f(a_{i-1}))| : a_0 \le a_1 \le \ldots \le a_n \text{ in } D\}.$$

(i) Let $f$ be a real-valued function which is integrable over a bounded interval $[a, b] \subseteq \mathbb{R}$. Show that
$$\int_a^b |f| = \sup\{|\sum_{i=1}^{n} (-1)^i \int_{a_{i-1}}^{a_i} f| : a = a_0 \le a_1 \le a_2 \le \ldots \le a_n = b\}.$$
(Hint: put 224Xg and 224Xh together.)

(j) Let $f$ and $g$ be real-valued functions defined on subsets of $\mathbb{R}$, and suppose that $g$ is integrable over an interval $[a, b]$, where $a < b$, and $f$ is of bounded variation on $]a, b[$ and defined almost everywhere on $]a, b[$. Show that
$$\Bigl|\int_a^b f \times g\Bigr| \le \Bigl(\lim_{x \in \mathrm{dom} f,\, x \downarrow a} |f(x)| + \mathrm{Var}_{]a, b[}(f)\Bigr) \sup_{c \in [a, b]} \Bigl|\int_c^b g\Bigr|.$$
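
For 224Xa, a numerical experiment may help to see why the variation is infinite; the following Python sketch is only a heuristic illustration, not a proof, and the names `f` and `variation_lower_bound` are chosen for this example. At the points $x_k = ((k + \frac{1}{2})\pi)^{-1/2}$ the sine factor is $\pm 1$ alternately, so consecutive values of $f$ differ by about $\frac{1}{\pi}(\frac{1}{k + 1/2} + \frac{1}{k + 3/2})$, and the sums over such chains grow like a harmonic series.

```python
import math


def f(x):
    """f(x) = x^2 sin(1/x^2) for x != 0, f(0) = 0, as in 224Xa."""
    return x * x * math.sin(1.0 / (x * x)) if x != 0 else 0.0


def variation_lower_bound(n):
    """Sum of |f(x_{k+1}) - f(x_k)| over x_k = ((k + 1/2) pi)^(-1/2), 0 <= k <= n.

    All the x_k lie in ]0, 1], so this is a lower bound for Var_[0,1](f).
    """
    xs = [((k + 0.5) * math.pi) ** -0.5 for k in range(n + 1)]
    return sum(abs(f(b) - f(a)) for a, b in zip(xs, xs[1:]))
```

Calling `variation_lower_bound` with larger and larger `n` gives values that increase without any apparent bound, roughly like $\frac{2}{\pi} \ln n$.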

224Y Further exercises (a) Show that if $f$ is any complex-valued function defined on a subset of $\mathbb{R}$, there is a function $\tilde f : \mathbb{R} \to \mathbb{C}$, extending $f$, such that $\mathrm{Var} \tilde f = \mathrm{Var} f$. Under what circumstances is $\tilde f$ unique?

(b) Let $D$ be any non-empty subset of $\mathbb{R}$, and let $V$ be the space of functions $f : D \to \mathbb{R}$ of bounded variation. For $f \in V$ set
$$\|f\| = \sup\{|f(t_0)| + \sum_{i=1}^{n} |f(t_i) - f(t_{i-1})| : t_0 \le \ldots \le t_n \in D\}.$$
Show that (i) $\|\ \|$ is a norm on $V$ (ii) $V$ is complete under $\|\ \|$ (iii) $\|f \times g\| \le \|f\| \|g\|$ for all $f$, $g \in V$, so that $V$ is a Banach algebra.

(c) Let $f : \mathbb{R} \to \mathbb{R}$ be a function of bounded variation. Show that there is a sequence $\langle f_n \rangle_{n \in \mathbb{N}}$ of differentiable functions such that $\lim_{n \to \infty} f_n(x) = f(x)$ for every $x \in \mathbb{R}$, $\lim_{n \to \infty} \int |f_n - f| = 0$, and $\mathrm{Var}(f_n) \le \mathrm{Var}(f)$ for every $n \in \mathbb{N}$. (Hint: start with non-decreasing $f$.)

(d) For any partially ordered set $X$ and any function $f : X \to \mathbb{R}$, say that $\mathrm{Var}_X(f) = 0$ if $X = \emptyset$ and otherwise
$$\mathrm{Var}_X(f) = \sup\{\sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| : a_0, a_1, \ldots, a_n \in X,\ a_0 \le a_1 \le \ldots \le a_n\}.$$
State and prove results in this framework generalizing 224D and 224Yb. (Hints: $f$ will be 'non-decreasing' if $f(x) \le f(y)$ whenever $x \le y$; interpret $]-\infty, x]$ as $\{y : y \le x\}$.)

(e) Let $(X, \rho)$ be a metric space and $f : [a, b] \to X$ a function, where $a \le b$ in $\mathbb{R}$. Set $\mathrm{Var}_{[a, b]}(f) = \sup\{\sum_{i=1}^{n} \rho(f(a_i), f(a_{i-1})) : a \le a_0 \le \ldots \le a_n \le b\}$. (i) Show that $\mathrm{Var}_{[a, b]}(f) = \mathrm{Var}_{[a, c]}(f) + \mathrm{Var}_{[c, b]}(f)$ for every $c \in [a, b]$. (ii) Show that if $\mathrm{Var}_{[a, b]}(f)$ is finite then $f$ is continuous at all but countably many points of $[a, b]$. (iii) Show that if $X$ is complete and $\mathrm{Var}_{[a, b]}(f) < \infty$ then $\lim_{t \uparrow x} f(t)$ is defined for every $x \in ]a, b]$.

(f) Let $U$ be a normed space and $a \le b$ in $\mathbb{R}$. For functions $f : [a, b] \to U$ define $\mathrm{Var}_{[a, b]}(f)$ as in 224Ye, using the standard metric $\rho(x, y) = \|x - y\|$ for $x, y \in U$. (i) Show that $\mathrm{Var}_{[a, b]}(f + g) \le \mathrm{Var}_{[a, b]}(f) + \mathrm{Var}_{[a, b]}(g)$, $\mathrm{Var}_{[a, b]}(cf) = |c| \mathrm{Var}_{[a, b]}(f)$ for all $f$, $g : [a, b] \to U$ and all $c \in \mathbb{R}$. (ii) Show that if $V$ is another normed space and $T : U \to V$ is a bounded linear operator then $\mathrm{Var}_{[a, b]}(Tf) \le \|T\| \mathrm{Var}_{[a, b]}(f)$ for every $f : [a, b] \to U$.

(g) Let $f : [0, 1] \to \mathbb{R}$ be a continuous function. For $y \in \mathbb{R}$ set $h(y) = \#(f^{-1}[\{y\}])$ if this is finite, $\infty$ otherwise. Show that (if we allow $\infty$ as a value of the integral) $\mathrm{Var}_{[0, 1]}(f) = \int h$. (Hint: for $n \in \mathbb{N}$, $i < 2^n$ set $c_{ni} = \sup\{f(x) - f(y) : x, y \in [2^{-n} i, 2^{-n}(i + 1)]\}$, $h_{ni}(y) = 1$ if $y \in f[\,[2^{-n} i, 2^{-n}(i + 1)[\,]$, 0 otherwise. Show that $c_{ni} = \int h_{ni}$, $\lim_{n \to \infty} \sum_{i=0}^{2^n - 1} c_{ni} = \mathrm{Var} f$, $\lim_{n \to \infty} \sum_{i=0}^{2^n - 1} h_{ni} = h$.) (See also 226Yb.)

(h) Let $\mu$ be any Lebesgue-Stieltjes measure on $\mathbb{R}$, $I \subseteq \mathbb{R}$ an interval (which may be either open or closed, bounded or unbounded), and $D \subseteq I$ a non-empty set. Let $V$ be the space of functions of bounded variation from $D$ to $\mathbb{R}$, and $\|\ \|$ the norm of 224Yb on $V$. Let $g : D \to \mathbb{R}$ be a function such that $\int_{[a, b] \cap D} g \, d\mu$ exists whenever $a \le b$ in $I$, and $K = \sup_{a, b \in I,\, a \le b} |\int_{[a, b] \cap D} g \, d\mu| < \infty$. Show that $|\int_D f \times g \, d\mu| \le K \|f\|$ for every $f \in V$.

(i) Explain how to apply 224Yh with $D = \mathbb{N}$ to obtain Abel's theorem that the product of a monotonic sequence converging to 0 with a series which has bounded partial sums is summable.

224 Notes and comments I have taken the ideas above rather farther than we need immediately; for the present chapter, it is enough to consider the case in which $D = \mathrm{dom} f = [a, b]$ for some interval $[a, b] \subseteq \mathbb{R}$. The extension to functions with irregular domains will be useful in Chapter 28, and the extension to irregular sets $D$, while not important to us here, is of some interest; for instance, taking $D = \mathbb{N}$, we obtain the notion of 'sequence of bounded variation', which is surely relevant to problems of convergence and summability.
The central result of the section is of course the fact that a function of bounded variation can be expressed
as the difference of monotonic functions (224D); indeed, one of the objects of the concept is to characterize
the linear span of the monotonic functions. Nearly everything else here can be derived as an easy consequence
of this, as in 224E-224G. In 224I and 224Xg we go a little deeper, and indeed some measure theory appears;
this is where the ideas here begin to connect with the real business of this chapter, to be continued in the
next section. Another result which is easy enough in itself, but contains the germs of important ideas, is
224Yg.
In 224Yb I mention a natural development in functional analysis, and in 224Yd and 224Ye-224Yf I suggest
further wide-ranging generalizations.

225 Absolutely continuous functions


We are now ready for a full characterization of the functions that can appear as indefinite integrals
(225E, 225Xh). The essential idea is that of absolute continuity (225B). In the second half of the section
(225G-225N) I describe some of the relationships between this concept and those we have already seen.

225A Absolute continuity of the indefinite integral I begin with an easy fundamental result from
general measure theory.
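Theorem Let $(X, \Sigma, \mu)$ be any measure space and $f$ an integrable real-valued function defined on a conegligible subset of $X$. Then for any $\epsilon > 0$ there are a measurable set $E$ of finite measure and a real number $\delta > 0$ such that $\int_F |f| \le \epsilon$ whenever $F \in \Sigma$ and $\mu(F \cap E) \le \delta$.
proof There is a non-decreasing sequence $\langle g_n \rangle_{n \in \mathbb{N}}$ of non-negative simple functions such that $|f| =_{\text{a.e.}} \lim_{n \to \infty} g_n$ and $\int |f| = \lim_{n \to \infty} \int g_n$. Take $n \in \mathbb{N}$ such that $\int g_n \ge \int |f| - \frac{1}{2}\epsilon$. Let $M > 0$, $E \in \Sigma$ be such that $\mu E < \infty$ and $g_n \le M \chi E$; set $\delta = \epsilon / 2M$. If $F \in \Sigma$ and $\mu(F \cap E) \le \delta$, then
$$\int_F g_n = \int g_n \times \chi F \le M \mu(F \cap E) \le \tfrac{1}{2}\epsilon;$$
consequently
$$\int_F |f| = \int_F g_n + \int_F (|f| - g_n) \le \tfrac{1}{2}\epsilon + \int (|f| - g_n) \le \epsilon.$$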

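225B Absolutely continuous functions on $\mathbb{R}$: Definition If $[a, b]$ is a non-empty closed interval in $\mathbb{R}$ and $f : [a, b] \to \mathbb{R}$ is a function, we say that $f$ is absolutely continuous if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\sum_{i=1}^{n} |f(b_i) - f(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta$.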

Remark The phrase absolutely continuous is used in various senses in measure theory, closely related (if
you look at them in the right way) but not identical; you will need to keep the context of each definition in
clear focus.
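
To make the quantifiers of 225B concrete, here is a small Python sketch (an illustration only, with hypothetical names and data). It uses the fact that a Lipschitz function with constant $L$ satisfies the definition with $\delta = \epsilon / L$, and checks one particular family of disjoint intervals for $f(x) = x^2$ on $[0, 1]$, where $L = 2$. A single family of course proves nothing; it only shows what has to be verified.

```python
def increment_sum(f, intervals):
    """Sum of |f(b_i) - f(a_i)| over a family of intervals [a_i, b_i]."""
    return sum(abs(f(b) - f(a)) for a, b in intervals)


def total_length(intervals):
    """Sum of b_i - a_i over the family."""
    return sum(b - a for a, b in intervals)


f = lambda x: x * x          # on [0, 1], |f(y) - f(x)| <= 2 |y - x|
epsilon = 0.01
delta = epsilon / 2          # assumption: valid because f is 2-Lipschitz on [0, 1]

# Hypothetical family: 100 disjoint intervals in [0, 1], each of length delta/100.
n = 100
intervals = [(k / n, k / n + delta / n) for k in range(n)]
```

Here `total_length(intervals)` is `delta` up to rounding, and `increment_sum(f, intervals)` stays below `epsilon`.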

225C Proposition Let $[a, b]$ be a non-empty closed interval in $\mathbb{R}$.
(a) If $f : [a, b] \to \mathbb{R}$ is absolutely continuous, it is uniformly continuous.
(b) If $f : [a, b] \to \mathbb{R}$ is absolutely continuous it is of bounded variation on $[a, b]$, so is differentiable almost everywhere on $[a, b]$, and its derivative is integrable over $[a, b]$.
(c) If $f$, $g : [a, b] \to \mathbb{R}$ are absolutely continuous, so are $f + g$ and $cf$, for every $c \in \mathbb{R}$.
(d) If $f$, $g : [a, b] \to \mathbb{R}$ are absolutely continuous so is $f \times g$.
(e) If $g : [a, b] \to [c, d]$ and $f : [c, d] \to \mathbb{R}$ are absolutely continuous, and $g$ is non-decreasing, then the composition $fg : [a, b] \to \mathbb{R}$ is absolutely continuous.
proof (a) Let $\epsilon > 0$. Then there is a $\delta > 0$ such that $\sum_{i=1}^{n} |f(b_i) - f(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta$; but of course now $|f(y) - f(x)| \le \epsilon$ whenever $x, y \in [a, b]$ and $|x - y| \le \delta$. As $\epsilon$ is arbitrary, $f$ is uniformly continuous.
(b) Let $\delta > 0$ be such that $\sum_{i=1}^{n} |f(b_i) - f(a_i)| \le 1$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta$. If $a \le c = c_0 \le c_1 \le \ldots \le c_n \le d \le \min(b, c + \delta)$, then $\sum_{i=1}^{n} |f(c_i) - f(c_{i-1})| \le 1$, so $\mathrm{Var}_{[c, d]}(f) \le 1$; accordingly (inducing on $k$, using 224Cc for the inductive step) $\mathrm{Var}_{[a, \min(a + k\delta, b)]}(f) \le k$ for every $k$, and
$$\mathrm{Var}_{[a, b]}(f) \le \lceil (b - a)/\delta \rceil < \infty.$$
It follows that $f'$ is integrable, by 224I.
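(c)(i) Let $\epsilon > 0$. Then there are $\delta_1, \delta_2 > 0$ such that
$$\sum_{i=1}^{n} |f(b_i) - f(a_i)| \le \tfrac{1}{2}\epsilon$$
whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta_1$,
$$\sum_{i=1}^{n} |g(b_i) - g(a_i)| \le \tfrac{1}{2}\epsilon$$
whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta_2$. Set $\delta = \min(\delta_1, \delta_2) > 0$, and suppose that $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta$. Then
$$\sum_{i=1}^{n} |(f + g)(b_i) - (f + g)(a_i)| \le \sum_{i=1}^{n} |f(b_i) - f(a_i)| + \sum_{i=1}^{n} |g(b_i) - g(a_i)| \le \epsilon.$$
As $\epsilon$ is arbitrary, $f + g$ is absolutely continuous.
(ii) Let $\epsilon > 0$. Then there is a $\delta > 0$ such that
$$\sum_{i=1}^{n} |f(b_i) - f(a_i)| \le \frac{\epsilon}{1 + |c|}$$
whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta$. Now
$$\sum_{i=1}^{n} |(cf)(b_i) - (cf)(a_i)| \le \epsilon$$
whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta$. As $\epsilon$ is arbitrary, $cf$ is absolutely continuous.
(d) By either (a) or (b), $f$ and $g$ are bounded; set $M = \sup_{x \in [a, b]} |f(x)|$, $M' = \sup_{x \in [a, b]} |g(x)|$. Let $\epsilon > 0$. Then there are $\delta_1, \delta_2 > 0$ such that
$\sum_{i=1}^{n} |f(b_i) - f(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta_1$,
$\sum_{i=1}^{n} |g(b_i) - g(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta_2$.
Set $\delta = \min(\delta_1, \delta_2) > 0$ and suppose that $a \le a_1 \le b_1 \le \ldots \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta$. Then
$$\sum_{i=1}^{n} |f(b_i) g(b_i) - f(a_i) g(a_i)| = \sum_{i=1}^{n} |(f(b_i) - f(a_i)) g(b_i) + f(a_i)(g(b_i) - g(a_i))|$$
$$\le \sum_{i=1}^{n} \bigl(|f(b_i) - f(a_i)| |g(b_i)| + |f(a_i)| |g(b_i) - g(a_i)|\bigr)$$
$$\le \sum_{i=1}^{n} \bigl(|f(b_i) - f(a_i)| M' + M |g(b_i) - g(a_i)|\bigr)$$
$$\le \epsilon M' + M \epsilon = \epsilon (M + M').$$
As $\epsilon$ is arbitrary, $f \times g$ is absolutely continuous.
(e) Let $\epsilon > 0$. Then there is a $\delta > 0$ such that $\sum_{i=1}^{n} |f(d_i) - f(c_i)| \le \epsilon$ whenever $c \le c_1 \le d_1 \le \ldots \le c_n \le d_n \le d$ and $\sum_{i=1}^{n} (d_i - c_i) \le \delta$; and there is an $\eta > 0$ such that $\sum_{i=1}^{n} |g(b_i) - g(a_i)| \le \delta$ whenever $a \le a_1 \le b_1 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \eta$. Now suppose that $a \le a_1 \le b_1 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \eta$. Because $g$ is non-decreasing, we have $c \le g(a_1) \le \ldots \le g(b_n) \le d$ and $\sum_{i=1}^{n} (g(b_i) - g(a_i)) \le \delta$, so $\sum_{i=1}^{n} |f(g(b_i)) - f(g(a_i))| \le \epsilon$; as $\epsilon$ is arbitrary, $fg$ is absolutely continuous.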

225D Lemma Let $[a, b]$ be a non-empty closed interval in $\mathbb{R}$ and $f : [a, b] \to \mathbb{R}$ an absolutely continuous function which has zero derivative almost everywhere on $[a, b]$. Then $f$ is constant on $[a, b]$.
proof Let $x \in [a, b]$, $\epsilon > 0$. Let $\delta > 0$ be such that $\sum_{i=1}^{n} |f(b_i) - f(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta$. Set $A = \{t : a < t < x,\ f'(t) \text{ exists} = 0\}$; then $\mu A = x - a$, writing $\mu$ for Lebesgue measure as usual. Let $\mathcal{I}$ be the set of non-empty non-singleton closed intervals $[c, d] \subseteq [a, x]$ such that $|f(d) - f(c)| \le \epsilon (d - c)$; then every member of $A$ belongs to arbitrarily short members of $\mathcal{I}$. By Vitali's theorem (221A), there is a countable disjoint family $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $\mu(A \setminus \bigcup \mathcal{I}_0) = 0$, that is,
$$x - a = \mu(\bigcup \mathcal{I}_0) = \sum_{I \in \mathcal{I}_0} \mu I.$$
Now there is a finite $\mathcal{I}_1 \subseteq \mathcal{I}_0$ such that
$$\mu(\bigcup \mathcal{I}_1) = \sum_{I \in \mathcal{I}_1} \mu I \ge x - a - \delta.$$
If $\mathcal{I}_1 = \emptyset$, then $x \le a + \delta$ and $|f(x) - f(a)| \le \epsilon$. Otherwise, express $\mathcal{I}_1$ as $\{[c_0, d_0], \ldots, [c_n, d_n]\}$, where $a \le c_0 < d_0 < c_1 < d_1 < \ldots < c_n < d_n \le x$. Then
$$(c_0 - a) + \sum_{i=1}^{n} (c_i - d_{i-1}) + (x - d_n) = \mu([a, x] \setminus \bigcup \mathcal{I}_1) \le \delta,$$
so
$$|f(c_0) - f(a)| + \sum_{i=1}^{n} |f(c_i) - f(d_{i-1})| + |f(x) - f(d_n)| \le \epsilon.$$
On the other hand, $|f(d_i) - f(c_i)| \le \epsilon (d_i - c_i)$ for each $i$, so
$$\sum_{i=0}^{n} |f(d_i) - f(c_i)| \le \epsilon \sum_{i=0}^{n} (d_i - c_i) \le \epsilon (x - a).$$
Putting these together,
$$|f(x) - f(a)| \le |f(c_0) - f(a)| + |f(d_0) - f(c_0)| + |f(c_1) - f(d_0)| + \ldots + |f(d_n) - f(c_n)| + |f(x) - f(d_n)|$$
$$= |f(c_0) - f(a)| + \sum_{i=1}^{n} |f(c_i) - f(d_{i-1})| + |f(x) - f(d_n)| + \sum_{i=0}^{n} |f(d_i) - f(c_i)|$$
$$\le \epsilon + \epsilon (x - a) = \epsilon (1 + x - a).$$
As $\epsilon$ is arbitrary, $f(x) = f(a)$. As $x$ is arbitrary, $f$ is constant.

225E Theorem Let $[a, b]$ be a non-empty closed interval in $\mathbb{R}$ and $F : [a, b] \to \mathbb{R}$ a function. Then the following are equiveridical:
(i) there is an integrable real-valued function $f$ such that $F(x) = F(a) + \int_a^x f$ for every $x \in [a, b]$;
(ii) $\int_a^x F'$ exists and is equal to $F(x) - F(a)$ for every $x \in [a, b]$;
(iii) $F$ is absolutely continuous.
proof (i)$\Rightarrow$(iii) Assume (i). Let $\epsilon > 0$. By 225A, there is a $\delta > 0$ such that $\int_H |f| \le \epsilon$ whenever $H \subseteq [a, b]$ and $\mu H \le \delta$, writing $\mu$ for Lebesgue measure as usual. Now suppose that $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta$. Consider $H = \bigcup_{1 \le i \le n} [a_i, b_i[$. Then $\mu H \le \delta$ and
$$\sum_{i=1}^{n} |F(b_i) - F(a_i)| = \sum_{i=1}^{n} \Bigl|\int_{[a_i, b_i[} f\Bigr| \le \sum_{i=1}^{n} \int_{[a_i, b_i[} |f| = \int_H |f| \le \epsilon.$$
As $\epsilon$ is arbitrary, $F$ is absolutely continuous.

(iii)$\Rightarrow$(ii) If $F$ is absolutely continuous, then it is of bounded variation (by 225Cb), so $\int_a^b F'$ exists (224I). Set $G(x) = \int_a^x F'$ for $x \in [a, b]$; then $G' =_{\text{a.e.}} F'$ (222E) and $G$ is absolutely continuous (by (i)$\Rightarrow$(iii) just proved). Accordingly $G - F$ is absolutely continuous (225Cc) and is differentiable, with zero derivative, almost everywhere. It follows that $G - F$ must be constant (225D). But as $G(a) = 0$, $G = F - F(a)$; just as required by (ii).
(ii)$\Rightarrow$(i) is trivial.

225F Integration by parts As an application of this result, I give a justification of a familiar formula.
Theorem Let $f$ be a real-valued function which is integrable over an interval $[a, b] \subseteq \mathbb{R}$, and $g : [a, b] \to \mathbb{R}$ an absolutely continuous function. Suppose that $F$ is an indefinite integral of $f$, so that $F(x) - F(a) = \int_a^x f$ for $x \in [a, b]$. Then
$$\int_a^b f \times g = F(b) g(b) - F(a) g(a) - \int_a^b F \times g'.$$

proof Set $h = F \times g$. Because $F$ is absolutely continuous (225E), so is $h$ (225Cd). Consequently $h(b) - h(a) = \int_a^b h'$, by (iii)$\Rightarrow$(ii) of 225E. But $h' = F' \times g + F \times g'$ wherever $F'$ and $g'$ are defined, which is almost everywhere, and $F' =_{\text{a.e.}} f$, by 222E; so $h' =_{\text{a.e.}} f \times g + F \times g'$. Finally, $g$ and $F$ are continuous, therefore measurable, and bounded, while $f$ and $g'$ are integrable (using 225E yet again), so $f \times g$ and $F \times g'$ are integrable, and
$$F(b) g(b) - F(a) g(a) = h(b) - h(a) = \int_a^b h' = \int_a^b f \times g + \int_a^b F \times g',$$
as required.
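
The formula of 225F can be checked numerically on a smooth example. The sketch below is only an illustration; the choice $f = \cos$, $F = \sin$, $g(x) = x^2$ on $[0, 1]$ and the helper `midpoint_integral` (a plain midpoint rule, standing in for the Lebesgue integral) are assumptions made for this example.

```python
import math


def midpoint_integral(h, a, b, n=20000):
    """Midpoint-rule approximation to the integral of h over [a, b]."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))


a, b = 0.0, 1.0
f = math.cos                  # integrable function f
F = math.sin                  # an indefinite integral of f
g = lambda x: x * x           # absolutely continuous g
g_prime = lambda x: 2 * x     # its derivative

# Left side: integral of f x g.  Right side: F(b)g(b) - F(a)g(a) - integral of F x g'.
lhs = midpoint_integral(lambda x: f(x) * g(x), a, b)
rhs = F(b) * g(b) - F(a) * g(a) - midpoint_integral(lambda x: F(x) * g_prime(x), a, b)
```

Both sides should agree with the closed form $2\cos 1 - \sin 1$ to within the accuracy of the quadrature.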

225G I come now to a group of results at a rather deeper level than most of the work of this chapter,
being closer to the ideas of Chapter 26.
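Proposition Let $[a, b]$ be a non-empty closed interval in $\mathbb{R}$ and $f : [a, b] \to \mathbb{R}$ an absolutely continuous function. Then $f[A]$ is negligible for every negligible set $A \subseteq \mathbb{R}$.
proof Let $\epsilon > 0$. Then there is a $\delta > 0$ such that $\sum_{i=1}^{n} |f(b_i) - f(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^{n} (b_i - a_i) \le \delta$. Now there is a sequence $\langle I_k \rangle_{k \in \mathbb{N}}$ of closed intervals, covering $A$, with $\sum_{k=0}^{\infty} \mu I_k < \delta$. For each $m \in \mathbb{N}$, let $F_m$ be $[a, b] \cap \bigcup_{k \le m} I_k$. Then $\mu f[F_m] \le \epsilon$. P $F_m$ must be expressible as $\bigcup_{i \le n} [c_i, d_i]$ where $n \le m$ and $a \le c_0 \le d_0 \le \ldots \le c_n \le d_n \le b$. For each $i \le n$ choose $x_i$, $y_i$ such that $c_i \le x_i$, $y_i \le d_i$ and
$$f(x_i) = \min_{x \in [c_i, d_i]} f(x), \quad f(y_i) = \max_{x \in [c_i, d_i]} f(x);$$
such exist because $f$ is continuous, so is bounded and attains its bounds on $[c_i, d_i]$. Set $a_i = \min(x_i, y_i)$, $b_i = \max(x_i, y_i)$, so that $c_i \le a_i \le b_i \le d_i$. Then
$$\sum_{i=0}^{n} (b_i - a_i) \le \sum_{i=0}^{n} (d_i - c_i) = \mu F_m \le \mu(\bigcup_{k \in \mathbb{N}} I_k) \le \delta,$$
so
$$\mu f[F_m] = \mu(\bigcup_{i \le n} f[\,[c_i, d_i]\,]) \le \sum_{i=0}^{n} \mu(f[\,[c_i, d_i]\,]) = \sum_{i=0}^{n} \mu[f(x_i), f(y_i)] = \sum_{i=0}^{n} |f(b_i) - f(a_i)| \le \epsilon. \text{ Q}$$
But $\langle f[F_m] \rangle_{m \in \mathbb{N}}$ is a non-decreasing sequence covering $f[A]$, so
$$\mu^* f[A] \le \mu(\bigcup_{m \in \mathbb{N}} f[F_m]) = \sup_{m \in \mathbb{N}} \mu f[F_m] \le \epsilon.$$
As $\epsilon$ is arbitrary, $f[A]$ is negligible, as claimed.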

225H Semi-continuous functions In preparation for the last main result of this section, I give a general result concerning measurable real-valued functions on subsets of $\mathbb{R}$. It will be convenient here, for once, to consider functions taking values in $[-\infty, \infty]$. If $D \subseteq \mathbb{R}^r$, a function $g : D \to [-\infty, \infty]$ is lower semi-continuous if $\{x : g(x) > u\}$ is an open subset of $D$ (for the subspace topology, see 2A3C) for every $u \in [-\infty, \infty]$. Any lower semi-continuous function is Borel measurable, therefore Lebesgue measurable (121B-121D). Now we have the following result.

225I Proposition Suppose that $r \ge 1$ and that $f$ is a real-valued function, defined on a subset $D$ of $\mathbb{R}^r$, which is integrable over $D$. Then for any $\epsilon > 0$ there is a lower semi-continuous function $g : \mathbb{R}^r \to [-\infty, \infty]$ such that $g(x) \ge f(x)$ for every $x \in D$ and $\int_D g$ is defined and not greater than $\epsilon + \int_D f$.

Remarks This is a result of great general importance, so I give it in a fairly general form; but for the present chapter all we need is the case $r = 1$, $D = [a, b]$ where $a \le b$.
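proof (a) We can enumerate $\mathbb{Q}$ as $\langle q_n\rangle_{n\in\mathbb{N}}$. By 225A, there is a $\delta > 0$ such that $\int_F |f| \le \frac12\epsilon$ whenever $\mu_D F \le \delta$, where $\mu_D$ is the subspace measure on $D$, so that $\mu_D F = \mu^* F$, the outer Lebesgue measure of $F$, for every $F \in \Sigma_D$, the domain of $\mu_D$ (214A-214B). For each $n \in \mathbb{N}$, set
$$\delta_n = 2^{-n-1} \min\Bigl(\frac{\epsilon}{1+2|q_n|}, \delta\Bigr),$$
so that $\sum_{n=0}^\infty \delta_n |q_n| \le \frac12\epsilon$ and $\sum_{n=0}^\infty \delta_n \le \delta$. For each $n \in \mathbb{N}$, let $E_n \subseteq \mathbb{R}^r$ be a Lebesgue measurable set such that $\{x : f(x) \ge q_n\} = D \cap E_n$, and choose an open set $G_n \supseteq E_n \cap B(0, n)$ such that $\mu G_n \le \mu(E_n \cap B(0, n)) + \delta_n$ (134Fa), writing $B(0, n)$ for the ball $\{x : \|x\| \le n\}$. For $x \in \mathbb{R}^r$, set
$$g(x) = \sup\{q_n : x \in G_n\},$$
allowing $-\infty$ as $\sup\emptyset$ and $\infty$ as the supremum of a set with no upper bound in $\mathbb{R}$.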
(b) Now check the properties of g.
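(i) $g$ is lower semi-continuous. PP If $u \in [-\infty, \infty]$, then
$$\{x : g(x) > u\} = \bigcup\{G_n : q_n > u\}$$
is a union of open sets, therefore open. QQ
(ii) $g(x) \ge f(x)$ for every $x \in D$. PP If $x \in D$ and $\eta > 0$, there is an $n \in \mathbb{N}$ such that $\|x\| \le n$ and $f(x) - \eta \le q_n \le f(x)$; now $x \in E_n \cap B(0, n) \subseteq G_n$ so $g(x) \ge q_n \ge f(x) - \eta$. As $\eta$ is arbitrary, $g(x) \ge f(x)$. QQ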
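(iii) Consider the functions $h_1, h_2 : D \to \,]-\infty, \infty]$ defined by setting
$$h_1(x) = |f(x)| \text{ if } x \in D \cap \bigcup_{n\in\mathbb{N}} (G_n \setminus E_n), \qquad h_1(x) = 0 \text{ for other } x \in D,$$
$$h_2(x) = \sum_{n=0}^\infty |q_n| \chi(G_n \setminus E_n)(x) \text{ for every } x \in D.$$
Setting $F = \bigcup_{n\in\mathbb{N}} G_n \setminus E_n$,
$$\mu F \le \sum_{n=0}^\infty \mu(G_n \setminus E_n) \le \delta,$$
so
$$\int_D h_1 \le \int_{D\cap F} |f| \le \tfrac12\epsilon$$
by the choice of $\delta$. As for $h_2$, we have (by B.Levi's theorem)
$$\int_D h_2 = \sum_{n=0}^\infty |q_n| \mu_D(D \cap G_n \setminus E_n) \le \sum_{n=0}^\infty |q_n| \mu(G_n \setminus E_n) \le \tfrac12\epsilon;$$
because this is finite, $h_2(x) < \infty$ for almost every $x \in D$. Thus $\int_D h_1 + h_2 \le \epsilon$.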
(iv) The point is that $g \le f + h_1 + h_2$ everywhere in $D$. PP Take any $x \in D$. If $n \in \mathbb{N}$ and $x \in G_n$, then either $x \in E_n$, in which case
$$f(x) + h_1(x) + h_2(x) \ge f(x) \ge q_n,$$
or $x \in G_n \setminus E_n$, in which case
$$f(x) + h_1(x) + h_2(x) \ge f(x) + |f(x)| + |q_n| \ge q_n.$$
Thus
$$f(x) + h_1(x) + h_2(x) \ge \sup\{q_n : x \in G_n\} \ge g(x).$$
QQ So $g \le f + h_1 + h_2$ everywhere in $D$.
(v) Putting (iii) and (iv) together,
$$\int_D g \le \int_D f + h_1 + h_2 \le \epsilon + \int_D f,$$
as required.

225J We need some results on Borel measurable sets and functions which are of independent interest.
Theorem Let $D$ be a subset of $\mathbb{R}$ and $f : D \to \mathbb{R}$ any function. Then
$$E = \{x : x \in D, f \text{ is continuous at } x\}$$
is relatively Borel measurable in $D$, and
$$F = \{x : x \in D, f \text{ is differentiable at } x\}$$
is actually Borel measurable; moreover, $f' : F \to \mathbb{R}$ is Borel measurable.
proof (a) For $k \in \mathbb{N}$ set
$$\mathcal{G}_k = \{\,]a, b[\, : a, b \in \mathbb{R}, |f(x) - f(y)| \le 2^{-k} \text{ for all } x, y \in D \cap \,]a, b[\,\}.$$
Then $G_k = \bigcup \mathcal{G}_k$ is an open set, so $E_0 = \bigcap_{k\in\mathbb{N}} G_k$ is a Borel set. But $E = D \cap E_0$, so $E$ is a relatively Borel subset of $D$.
(b)(i) I should perhaps say at once that when interpreting the formula $f'(x) = \lim_{h\to 0} (f(x+h) - f(x))/h$, I insist on the restrictive definition
$$a = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h}$$
if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\frac{f(x+h)-f(x)}{h}$ is defined and
$$\Bigl|\frac{f(x+h) - f(x)}{h} - a\Bigr| \le \epsilon \text{ whenever } 0 < |h| \le \delta.$$
So $f'(x)$ can be defined only if there is some $\delta > 0$ such that the whole interval $[x - \delta, x + \delta]$ lies within the domain $D$ of $f$.
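(ii) For $p, q, q' \in \mathbb{Q}$ and $k \in \mathbb{N}$ set
$$H(k, p, q, q') = \emptyset \text{ if } ]q, q'[\, \not\subseteq D,$$
$$H(k, p, q, q') = \{x : x \in E \cap \,]q, q'[\,, |f(y) - f(x) - p(y - x)| \le 2^{-k}|y - x| \text{ for every } y \in \,]q, q'[\,\} \text{ if } ]q, q'[\, \subseteq D.$$
Then $H(k, p, q, q') = E \cap \,]q, q'[\, \cap \overline{H(k, p, q, q')}$. PP If $x \in E \cap \,]q, q'[\, \cap \overline{H(k, p, q, q')}$ there is a sequence $\langle x_n\rangle_{n\in\mathbb{N}}$ in $H(k, p, q, q')$ converging to $x$. Because $f$ is continuous at $x$,
$$|f(y) - f(x) - p(y - x)| = \lim_{n\to\infty} |f(y) - f(x_n) - p(y - x_n)| \le \lim_{n\to\infty} 2^{-k}|y - x_n| = 2^{-k}|y - x|$$
for every $y \in \,]q, q'[$, so that $x \in H(k, p, q, q')$. QQ Since $E$ is a Borel set, by (a), so is $H(k, p, q, q')$.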
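(iii) Now
$$F = \bigcap_{k\in\mathbb{N}} \bigcup_{p,q,q'\in\mathbb{Q}} H(k, p, q, q').$$
PP ($\alpha$) Suppose $x \in F$, that is, $f'(x)$ is defined; say $f'(x) = a$. Take any $k \in \mathbb{N}$. Then there are $p \in \mathbb{Q}$, $\delta \in \,]0, 1]$ such that $|p - a| \le 2^{-k-1}$ and $[x - \delta, x + \delta] \subseteq D$ and $\bigl|\frac{f(x+h)-f(x)}{h} - a\bigr| \le 2^{-k-1}$ whenever $0 < |h| \le \delta$; now take $q \in \mathbb{Q} \cap [x - \delta, x[$, $q' \in \mathbb{Q} \cap \,]x, x + \delta]$ and see that $x \in H(k, p, q, q')$. As $x$ is arbitrary, $F \subseteq \bigcap_{k\in\mathbb{N}} \bigcup_{p,q,q'\in\mathbb{Q}} H(k, p, q, q')$. ($\beta$) If $x \in \bigcap_{k\in\mathbb{N}} \bigcup_{p,q,q'\in\mathbb{Q}} H(k, p, q, q')$, then for each $k \in \mathbb{N}$ choose $p_k, q_k, q'_k \in \mathbb{Q}$ such that $x \in H(k, p_k, q_k, q'_k)$. If $h \ne 0$, $x + h \in \,]q_k, q'_k[$ then $\bigl|\frac{f(x+h)-f(x)}{h} - p_k\bigr| \le 2^{-k}$. But this means, first, that $|p_k - p_l| \le 2^{-k} + 2^{-l}$ for every $k$, $l$ (since surely there is some $h \ne 0$ such that $x + h \in \,]q_k, q'_k[\, \cap \,]q_l, q'_l[$), so that $\langle p_k\rangle_{k\in\mathbb{N}}$ is a Cauchy sequence, with limit $a$ say; and, second, that $\bigl|\frac{f(x+h)-f(x)}{h} - a\bigr| \le 2^{-k} + |a - p_k|$ whenever $h \ne 0$ and $x + h \in \,]q_k, q'_k[$, so that $f'(x)$ is defined and equal to $a$. QQ
(iv) Because $\mathbb{Q}$ is countable, all the unions $\bigcup_{p,q,q'\in\mathbb{Q}} H(k, p, q, q')$ are Borel sets, so $F$ also is.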

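(v) Now enumerate $\mathbb{Q}^3$ as $\langle (p_i, q_i, q'_i)\rangle_{i\in\mathbb{N}}$, and set $H'_{ki} = H(k, p_i, q_i, q'_i) \setminus \bigcup_{j<i} H(k, p_j, q_j, q'_j)$ for each $k, i \in \mathbb{N}$. Every $H'_{ki}$ is Borel measurable, $\langle H'_{ki}\rangle_{i\in\mathbb{N}}$ is disjoint, and
$$\bigcup_{i\in\mathbb{N}} H'_{ki} = \bigcup_{i\in\mathbb{N}} H(k, p_i, q_i, q'_i) \supseteq F$$
for each $k$. Note that $|f'(x) - p| \le 2^{-k}$ whenever $x \in F \cap H(k, p, q, q')$, so if we set $f_k(x) = p_i$ for every $x \in H'_{ki}$ we shall have a Borel function $f_k$ such that $|f'(x) - f_k(x)| \le 2^{-k}$ for every $x \in F$. Accordingly $f' = \lim_{k\to\infty} f_k \restriction F$ is Borel measurable.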

225K Proposition Let $[a, b]$ be a non-empty closed interval in $\mathbb{R}$, and $f : [a, b] \to \mathbb{R}$ a function. Set $F = \{x : x \in \,]a, b[\,, f'(x) \text{ is defined}\}$. Then $f$ is absolutely continuous iff (i) it is continuous (ii) $f'$ is integrable over $F$ (iii) $f[[a, b] \setminus F]$ is negligible.
proof (a) Suppose first that $f$ is absolutely continuous. Then $f$ is surely continuous (225Ca) and $f'$ is integrable over $[a, b]$, therefore over $F$ (225E); also $[a, b] \setminus F$ is negligible, so $f[[a, b] \setminus F]$ is negligible, by 225G.
(b) So now suppose that $f$ satisfies the conditions. Set $f^*(x) = |f'(x)|$ for $x \in F$, $0$ for $x \in [a, b] \setminus F$. Then $f(b) \le f(a) + \int_a^b f^*$.
PP (i) Because $F$ is a Borel set and $f'$ is a Borel measurable function (225J), $f^*$ is measurable. Let $\epsilon > 0$. Let $G$ be an open subset of $\mathbb{R}$ such that $f[[a, b] \setminus F] \subseteq G$ and $\mu G \le \epsilon$ (134Fa). Let $g : \mathbb{R} \to [0, \infty]$ be a lower semi-continuous function such that $f^*(x) \le g(x)$ for every $x \in [a, b]$ and $\int_a^b g \le \int_a^b f^* + \epsilon$ (225I). Consider
$$A = \Bigl\{x : a \le x \le b, \ \mu([f(a), f(x)] \setminus G) \le 2\epsilon(x - a) + \int_a^x g\Bigr\},$$
interpreting $[f(a), f(x)]$ as $\emptyset$ if $f(x) < f(a)$. Then $a \in A \subseteq [a, b]$, so $c = \sup A$ is defined and belongs to $[a, b]$.
Because $f$ is continuous, the function $x \mapsto \mu([f(a), f(x)] \setminus G)$ is continuous; also $x \mapsto 2\epsilon(x - a) + \int_a^x g$ is certainly continuous, so $c \in A$.
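(ii) ?? If $c \in F$, so that $f^*(c) = |f'(c)|$, then there is a $\delta > 0$ such that
$$a \le c - \delta \le c + \delta \le b,$$
$$g(x) \ge g(c) - \epsilon \ge |f'(c)| - \epsilon \text{ whenever } |x - c| \le \delta,$$
$$\Bigl|\frac{f(x) - f(c)}{x - c} - f'(c)\Bigr| \le \epsilon \text{ whenever } |x - c| \le \delta.$$
Consider $x = c + \delta$. Then $c < x \le b$ and
$$\begin{aligned}
\mu([f(a), f(x)] \setminus G) &\le \mu([f(a), f(c)] \setminus G) + |f(x) - f(c)| \\
&\le 2\epsilon(c - a) + \int_a^c g + \epsilon(x - c) + |f'(c)|(x - c) \\
&\le 2\epsilon(c - a) + \int_a^c g + \epsilon(x - c) + \int_c^x (g + \epsilon) \\
&= 2\epsilon(x - a) + \int_a^x g
\end{aligned}$$
(the third inequality holds because $g(t) \ge |f'(c)| - \epsilon$ whenever $c \le t \le x$), so $x \in A$; but $c$ is supposed to be an upper bound of $A$. XX
Thus $c \in [a, b] \setminus F$.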
(iii) ?? Now suppose, if possible, that $c < b$. We know that $f(c) \in G$, so there is an $\eta > 0$ such that $[f(c) - \eta, f(c) + \eta] \subseteq G$; now there is a $\delta > 0$ such that $|f(x) - f(c)| \le \eta$ whenever $x \in [a, b]$ and $|x - c| \le \delta$. Set $x = \min(c + \delta, b)$; then $c < x \le b$ and $[f(c), f(x)] \subseteq G$, so
$$\mu([f(a), f(x)] \setminus G) = \mu([f(a), f(c)] \setminus G) \le 2\epsilon(c - a) + \int_a^c g \le 2\epsilon(x - a) + \int_a^x g$$
and once again $x \in A$, even though $x > \sup A$. XX
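(iv) We conclude that $c = b$, so that $b \in A$. But this means that
$$\begin{aligned}
f(b) - f(a) &\le \mu([f(a), f(b)]) \le \mu([f(a), f(b)] \setminus G) + \mu G \\
&\le 2\epsilon(b - a) + \int_a^b g + \epsilon \le 2\epsilon(b - a) + \int_a^b f^* + \epsilon + \epsilon \\
&= 2\epsilon(1 + b - a) + \int_a^b f^*.
\end{aligned}$$
As $\epsilon$ is arbitrary, $f(b) - f(a) \le \int_a^b f^*$, as claimed. QQ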
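(c) Similarly, or applying (b) to $-f$, $f(a) - f(b) \le \int_a^b f^*$, so that $|f(b) - f(a)| \le \int_a^b f^*$. Of course the argument applies equally to any subinterval of $[a, b]$, so $|f(d) - f(c)| \le \int_c^d f^*$ whenever $a \le c \le d \le b$. Now let $\epsilon > 0$. By 225A once more, there is a $\delta > 0$ such that $\int_E f^* \le \epsilon$ whenever $E \subseteq [a, b]$ and $\mu E \le \delta$. Suppose that $a \le a_1 \le b_1 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n b_i - a_i \le \delta$. Then
$$\sum_{i=1}^n |f(b_i) - f(a_i)| \le \sum_{i=1}^n \int_{a_i}^{b_i} f^* = \int_{\bigcup_{i\le n} [a_i, b_i]} f^* \le \epsilon.$$
So $f$ is absolutely continuous, as claimed.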

225L Corollary Let $[a, b]$ be a non-empty closed interval in $\mathbb{R}$. Let $f : [a, b] \to \mathbb{R}$ be a continuous function which is differentiable on the open interval $]a, b[$. If its derivative $f'$ is integrable over $[a, b]$, then $f$ is absolutely continuous, and $f(b) - f(a) = \int_a^b f'$.
proof $f[[a, b] \setminus F] = \{f(a), f(b)\}$ is surely negligible, so $f$ is absolutely continuous, by 225K; consequently $f(b) - f(a) = \int_a^b f'$, by 225E.

225M Corollary Let $[a, b]$ be a non-empty closed interval in $\mathbb{R}$, and $f : [a, b] \to \mathbb{R}$ a continuous function. Then $f$ is absolutely continuous iff it is continuous and of bounded variation and $f[A]$ is negligible for every negligible $A \subseteq [a, b]$.
proof (a) Suppose that $f$ is absolutely continuous. By 225C(a-b) it is continuous and of bounded variation, and by 225G we have $f[A]$ negligible for every negligible $A \subseteq [a, b]$.
(b) So now suppose that $f$ satisfies the conditions. Set $F = \{x : x \in \,]a, b[\,, f'(x) \text{ is defined}\}$. By 224I, $[a, b] \setminus F$ is negligible, so $f[[a, b] \setminus F]$ is negligible. Moreover, also by 224I, $f'$ is integrable over $[a, b]$ or $F$. So the conditions of 225K are satisfied and $f$ is absolutely continuous.

225N The Cantor function I should mention the standard example of a continuous function of bounded variation which is not absolutely continuous. Let $C \subseteq [0, 1]$ be the Cantor set (134G). Recall that the Cantor function is a non-decreasing continuous function $f : [0, 1] \to [0, 1]$ such that $f'(x)$ is defined and equal to zero for every $x \in [0, 1] \setminus C$, but $f(0) = 0 < 1 = f(1)$ (134H). Of course $f$ is of bounded variation and not absolutely continuous. $C$ is negligible and $f[C] = [0, 1]$ is not. If $x \in C$, then for every $n \in \mathbb{N}$ there is an interval of length $3^{-n}$, containing $x$, on which $f$ increases by $2^{-n}$; so $f$ cannot be differentiable at $x$, and the set $F = \operatorname{dom} f'$ of 225K is precisely $[0, 1] \setminus C$, so that $f[[0, 1] \setminus F] = [0, 1]$.
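The failure of differentiability at points of $C$ can be seen numerically at the point $0 \in C$. The following Python sketch is an illustration only: the function name `cantor`, the digit-by-digit evaluation from the ternary expansion, and the truncation depth are assumptions of this example, not the construction of 134H, and floating-point arithmetic makes the values approximate.

```python
def cantor(x, depth=40):
    """Approximate Cantor function on [0, 1], read off the ternary digits of x."""
    if x >= 1.0:
        return 1.0
    if x <= 0.0:
        return 0.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)              # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:              # x lies in a removed middle third; f is constant there
            return value + scale
        value += scale * (digit // 2)   # ternary digit 2 becomes binary digit 1
        scale *= 0.5
    return value

# On [0, 3**-n], which contains the point 0 of C, f increases by about 2**-n,
# so the difference quotients at 0 behave like (3/2)**n and are unbounded.
quotients = [cantor(3.0 ** -n) / 3.0 ** -n for n in range(1, 8)]
```

The unbounded quotients are consistent with the statement that $f$ cannot be differentiable at any point of $C$.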

225O Complex-valued functions As usual, I spell out the results above in the forms applicable to
complex-valued functions.
(a) Let $(X, \Sigma, \mu)$ be any measure space and $f$ an integrable complex-valued function defined on a conegligible subset of $X$. Then for any $\epsilon > 0$ there are a measurable set $E$ of finite measure and a real number $\delta > 0$ such that $\int_F |f| \le \epsilon$ whenever $F \in \Sigma$ and $\mu(F \cap E) \le \delta$. (Apply 225A to $|f|$.)
(b) If $[a, b]$ is a non-empty closed interval in $\mathbb{R}$ and $f : [a, b] \to \mathbb{C}$ is a function, we say that $f$ is absolutely continuous if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\sum_{i=1}^n |f(b_i) - f(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n b_i - a_i \le \delta$. Observe that $f$ is absolutely continuous iff its real and imaginary parts are both absolutely continuous.

(c) Let $[a, b]$ be a non-empty closed interval in $\mathbb{R}$.
(i) If $f : [a, b] \to \mathbb{C}$ is absolutely continuous it is of bounded variation on $[a, b]$, so is differentiable almost everywhere on $[a, b]$, and its derivative is integrable over $[a, b]$.
(ii) If $f, g : [a, b] \to \mathbb{C}$ are absolutely continuous, so are $f + g$ and $\zeta f$, for any $\zeta \in \mathbb{C}$, and $f \times g$.
(iii) If $g : [a, b] \to [c, d]$ is monotonic and absolutely continuous, and $f : [c, d] \to \mathbb{C}$ is absolutely continuous, then the composition $fg : [a, b] \to \mathbb{C}$ is absolutely continuous.

(d) Let $[a, b]$ be a non-empty closed interval in $\mathbb{R}$ and $F : [a, b] \to \mathbb{C}$ a function. Then the following are equiveridical:
(i) there is an integrable complex-valued function $f$ such that $F(x) = F(a) + \int_a^x f$ for every $x \in [a, b]$;
(ii) $\int_a^x F'$ exists and is equal to $F(x) - F(a)$ for every $x \in [a, b]$;
(iii) $F$ is absolutely continuous.
(Apply 225E to the real and imaginary parts of $F$.)

(e) Let $f$ be an integrable complex-valued function on an interval $[a, b] \subseteq \mathbb{R}$, and $g : [a, b] \to \mathbb{C}$ an absolutely continuous function. Set $F(x) = \int_a^x f$ for $x \in [a, b]$. Then
$$\int_a^b f \times g = F(b)g(b) - F(a)g(a) - \int_a^b F \times g'.$$
(Apply 225F to the real and imaginary parts of $f$ and $g$.)

(f) Let $f$ be a continuous complex-valued function on a closed interval $[a, b] \subseteq \mathbb{R}$, and suppose that $f$ is differentiable at every point of the open interval $]a, b[$, with $f'$ integrable over $[a, b]$. Then $f$ is absolutely continuous. (Apply 225L to the real and imaginary parts of $f$.)

(g) For a result corresponding to 225M, see 264Yp.

225X Basic exercises (a) Show directly from the definition in 225B (without appealing to 225E) that
any absolutely continuous real-valued function on a closed interval [a, b] is expressible as the difference of
non-decreasing absolutely continuous functions.

(b) Let $f : [a, b] \to \mathbb{R}$ be an absolutely continuous function, where $a \le b$. (i) Show that $|f| : [a, b] \to \mathbb{R}$ is absolutely continuous. (ii) Show that $gf$ is absolutely continuous whenever $g : \mathbb{R} \to \mathbb{R}$ is a differentiable function with bounded derivative.

(c) Show directly from the definition in 225B and the Mean Value Theorem (without appealing to 225K) that if a function $f$ is continuous on a closed interval $[a, b]$, differentiable on the open interval $]a, b[$, and has bounded derivative in $]a, b[$, then $f$ is absolutely continuous, so that $f(x) = f(a) + \int_a^x f'$ for every $x \in [a, b]$.

(d) Show that if $f : [a, b] \to \mathbb{R}$ is absolutely continuous, then $\operatorname{Var}_{[a,b]}(f) = \int_a^b |f'|$. (Hint: put 224I and 225E together.)

(e) Let $f : [0, \infty[ \,\to \mathbb{C}$ be a function which is absolutely continuous on $[0, a]$ for every $a \in [0, \infty[$ and has Laplace transform $F(s) = \int_0^\infty e^{-sx} f(x)dx$ defined on $\{s : \operatorname{Re} s > S\}$. Suppose also that $\lim_{x\to\infty} e^{-Sx} f(x) = 0$. Show that $f'$ has Laplace transform $sF(s) - f(0)$ defined whenever $\operatorname{Re} s > S$. (Hint: show that
$$f(x)e^{-sx} - f(0) = \int_0^x \frac{d}{dt}\bigl(f(t)e^{-st}\bigr)dt$$
for every $x \ge 0$.)

(f) Let $g : \mathbb{R} \to \mathbb{R}$ be a non-decreasing function which is absolutely continuous on every bounded interval; let $\mu_g$ be the associated Lebesgue-Stieltjes measure (114Xa), and $\Sigma_g$ its domain. Show that $\int_E g' = \mu_g E$ for any $E \in \Sigma_g$, if we allow $\infty$ as a value of the integral. (Hint: start with intervals $E$.)

(g) Let $g : [a, b] \to \mathbb{R}$ be a non-decreasing absolutely continuous function, and $f : [g(a), g(b)] \to \mathbb{R}$ a continuous function. Show that $\int_{g(a)}^{g(b)} f(t)dt = \int_a^b f(g(t))g'(t)dt$. (Hint: set $F(x) = \int_{g(a)}^x f$, $G = Fg$ and consider $\int_a^b G'(t)dt$. See also 263I.)

(h) Suppose that $I \subseteq \mathbb{R}$ is any non-trivial interval (bounded or unbounded, open, closed or half-open, but not empty or a singleton), and $f : I \to \mathbb{R}$ a function. Show that $f$ is absolutely continuous on every closed bounded subinterval of $I$ iff there is a function $g$ such that $\int_a^b g = f(b) - f(a)$ whenever $a \le b$ in $I$, and in this case $g$ is integrable iff $f$ is of bounded variation on $I$.

(i) Show that $\int_0^1 \frac{\ln x}{x-1}dx = \sum_{n=1}^\infty \frac{1}{n^2}$. (Hint: use 225F to find $\int_0^1 x^n \ln x\,dx$, and recall that $\frac{1}{1-x} = \sum_{n=0}^\infty x^n$ for $0 \le x < 1$.)

(j) (i) Show that $\int_0^1 t^a dt$ is finite for every $a > -1$. (ii) Show that $\int_1^\infty t^a e^{-t} dt$ is finite for every $a \in \mathbb{R}$. (Hint: show that there is an $M$ such that $t^a \le M e^{t/2}$ for $t \ge M$.) (iii) Show that $\Gamma(a) = \int_0^\infty t^{a-1} e^{-t} dt$ is defined for every $a > 0$. (iv) Show that $\Gamma(a + 1) = a\Gamma(a)$ for every $a > 0$. (v) Show that $\Gamma(n + 1) = n!$ for every $n \in \mathbb{N}$.
($\Gamma$ is of course the $\Gamma$-function.)

(k) Show that if $b > 0$ then $\int_0^\infty u^{b-1} e^{-u^2/2} du = 2^{(b-2)/2}\Gamma(\frac{b}{2})$. (Hint: consider $f(t) = t^{(b-2)/2} e^{-t}$, $g(u) = u^2/2$ in 225Xg.)

(l) Suppose that $f$, $g$ are lower semi-continuous functions, defined on subsets of $\mathbb{R}^r$, and taking values in $]-\infty, \infty]$. (i) Show that $f + g$, $f \wedge g$ and $f \vee g$ are lower semi-continuous, and that $\alpha f$ is lower semi-continuous for every $\alpha \ge 0$. (ii) Show that if $f$ and $g$ are non-negative, then $f \times g$ is lower semi-continuous. (iii) Show that if $f$ is non-negative and $g$ is continuous, then $fg$ is lower semi-continuous. (iv) Show that if $f$ is non-decreasing then the composition $fg$ is lower semi-continuous.

(m) Let $A$ be a non-empty family of lower semi-continuous functions defined on subsets of $\mathbb{R}^r$ and taking values in $[-\infty, \infty]$. Set $g(x) = \sup\{f(x) : f \in A, x \in \operatorname{dom} f\}$ for $x \in D = \bigcup_{f\in A} \operatorname{dom} f$. Show that $g$ is lower semi-continuous.

(n) Suppose that $f : [a, b] \to \mathbb{R}$ is continuous, and differentiable at all but countably many points of $[a, b]$. Show that $f$ is absolutely continuous iff it is of bounded variation.

(o) Show that if $f : [a, b] \to \mathbb{R}$ is absolutely continuous, then $f[E]$ is Lebesgue measurable for every Lebesgue measurable set $E \subseteq [a, b]$.

225Y Further exercises (a) Show that the composition of two absolutely continuous functions need
not be absolutely continuous. (Hint: 224Xb.)

(b) Let $f : [a, b] \to \mathbb{R}$ be a continuous function, where $a < b$. Set $G = \{x : x \in \,]a, b[\,, \exists\, y \in \,]x, b] \text{ such that } f(x) < f(y)\}$. Show that $G$ is open and is expressible as a disjoint union of intervals $]c, d[$ where $f(c) \le f(d)$. Use this to prove 225D without calling on Vitali's theorem.

(c) Let $f : [a, b] \to \mathbb{R}$ be a function of bounded variation and $\gamma > 0$. Show that there is an absolutely continuous function $g : [a, b] \to \mathbb{R}$ such that $|g'(x)| \le \gamma$ wherever the derivative is defined and $\{x : x \in [a, b], f(x) \ne g(x)\}$ has measure at most $\gamma^{-1}\operatorname{Var}_{[a,b]}(f)$. (Hint: reduce to the case of non-decreasing $f$. Apply 225Yb to the function $x \mapsto f(x) - \gamma x$ and show that $\gamma\mu G \le \operatorname{Var}_{[a,b]}(f)$. Set $g(x) = f(x)$ for $x \in \,]a, b[\, \setminus G$.)

(d) Let $f$ be a non-negative measurable real-valued function defined on a subset $D$ of $\mathbb{R}^r$, where $r \ge 1$. Show that for any $\epsilon > 0$ there is a lower semi-continuous function $g : \mathbb{R}^r \to [-\infty, \infty]$ such that $g(x) \ge f(x)$ for every $x \in D$ and $\int_D g - f \le \epsilon$.

(e) Let $f$ be a measurable real-valued function defined on a subset $D$ of $\mathbb{R}^r$, where $r \ge 1$. Show that for any $\epsilon > 0$ there is a lower semi-continuous function $g : \mathbb{R}^r \to [-\infty, \infty]$ such that $g(x) \ge f(x)$ for every $x \in D$ and $\mu_D\{x : x \in D, g(x) > f(x)\} \le \epsilon$. (Hint: 134Yd, 134Fb.)

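(f) Let $f$ be a real-valued function defined on a set $D \subseteq \mathbb{R}$. For $x \in D$, set
$$(\overline{D}^+ f)(x) = \inf\{u : u \in [-\infty, \infty], \ \exists\, \delta > 0, \ f(y) \le f(x) + u(y - x) \text{ whenever } y \in D \text{ and } x \le y \le x + \delta\}.$$
Show that $\overline{D}^+ f : D \to [-\infty, \infty]$ is Lebesgue measurable, in the sense that $\{x : (\overline{D}^+ f)(x) \le u\}$ is relatively measurable in $D$ for every $u \in [-\infty, \infty]$, if $f$ is Lebesgue measurable, and is Borel measurable if $f$ is Borel measurable.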

225 Notes and comments There is a good deal more to say about absolutely continuous functions; I
will return to the topic in the next section and in Chapter 26. I shall not make direct use of any of the
results from 225H on, but it seems to me that this kind of investigation is necessary for any clear picture of
the relationships between such concepts as absolute continuity and bounded variation. Of course, in order
to apply these results, we do need a store of simple kinds of absolutely continuous function, differentiable
functions with bounded derivative forming the most important class (225Xc). A larger family of the same
kind is the class of Lipschitz functions (262Bc).
The definition of absolutely continuous function is ordinarily set out for closed bounded intervals, as in
225B. The point is that for other intervals the simplest generalizations of this formulation do not seem quite
appropriate. In 225Xh I try to suggest the kind of demands one might make on functions defined on other
types of interval.
I should remark that the real prize is still not quite within our grasp. I have been able to give a reasonably satisfactory formulation of simple integration by parts (225F), at least for bounded intervals; a further limiting process is necessary to deal with unbounded intervals. But a companion method from advanced calculus, integration by substitution, remains elusive. The best I think we can do at this point is
225Xg, which insists on a continuous integrand f . It is the case that the result is valid for general integrable
f , but there are some further subtleties to be mastered on the way; the necessary ideas are given in the
much more general results 235A and 263D below, and applied to the one-dimensional case in 263I.
On the way to the characterization of absolutely continuous functions in 225K, I find myself calling on one of the fundamental relationships between Lebesgue measure and the topology of $\mathbb{R}^r$ (225I). The technique here can be adapted to give many variations of the result; see 225Yd-225Ye. If you have not seen semi-
continuous functions before, 225Xl-225Xm give a partial idea of their properties. In 225J I give a fragment
of descriptive set theory, the study of the kinds of set which can arise from the formulae of analysis. These
ideas too will re-surface elsewhere; compare 225Yf and also the proof of 262M below.

226 The Lebesgue decomposition of a function of bounded variation


I end this chapter with some notes on a method of analysing a general function of bounded variation
which may help to give a picture of what such functions can be, though it is not directly necessary for
anything of great importance dealt with in this volume.

226A Sums over arbitrary index sets To get a full picture of this fragment of real analysis, a bit
of preparation will be helpful. This concerns the notion of a sum over an arbitrary index set, which I have
rather been skirting around so far.

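(a) If $I$ is any set and $\langle a_i\rangle_{i\in I}$ any family in $[0, \infty]$, we set
$$\sum_{i\in I} a_i = \sup\Bigl\{\sum_{i\in K} a_i : K \text{ is a finite subset of } I\Bigr\},$$
with the convention that $\sum_{i\in\emptyset} a_i = 0$. (See 112Bd, 222Ba.) For general $a_i \in [-\infty, \infty]$, we can set
$$\sum_{i\in I} a_i = \sum_{i\in I} a_i^+ - \sum_{i\in I} a_i^-$$
if this is defined in $[-\infty, \infty]$, that is, at least one of $\sum_{i\in I} a_i^+$, $\sum_{i\in I} a_i^-$ is finite, where $a^+ = \max(a, 0)$ and $a^- = \max(-a, 0)$ for each $a$. If $\sum_{i\in I} a_i$ is defined and finite, we say that $\langle a_i\rangle_{i\in I}$ is summable.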

(b) Since this is a book on measure theory, I will immediately describe the relationship between this kind of summability and an appropriate notion of integration. For any set $I$, we have the corresponding counting measure $\mu$ on $I$ (112Bd). Every subset of $I$ is measurable, so every family $\langle a_i\rangle_{i\in I}$ of real numbers is a measurable real-valued function on $I$. A subset of $I$ has finite measure iff it is finite; so a real-valued function $f$ on $I$ is simple if $K = \{i : f(i) \ne 0\}$ is finite. In this case,
$$\int f d\mu = \sum_{i\in K} f(i) = \sum_{i\in I} f(i)$$
as defined in part (a). The measure $\mu$ is semi-finite (211Nc) so a non-negative function $f$ is integrable iff $\int f = \sup_{\mu K<\infty} \int_K f$ is finite (213B); but of course this supremum is precisely
$$\sup\Bigl\{\sum_{i\in K} f(i) : K \subseteq I \text{ is finite}\Bigr\} = \sum_{i\in I} f(i).$$
Now a general function $f : I \to \mathbb{R}$ is integrable iff it is measurable and $\int |f| d\mu < \infty$, that is, iff $\sum_{i\in I} |f(i)| < \infty$, and in this case
$$\int f d\mu = \int f^+ d\mu - \int f^- d\mu = \sum_{i\in I} f(i)^+ - \sum_{i\in I} f(i)^- = \sum_{i\in I} f(i),$$
writing $f^\pm(i) = f(i)^\pm$ for each $i$. Thus we have
$$\sum_{i\in I} a_i = \int_I a_i\, \mu(di),$$
and the standard rules under which we allow $\infty$ as the value of an integral (133A, 135F) match well with the interpretations in (a) above.

(c) Accordingly, and unsurprisingly, the operation of summation is a linear operation on the linear space of summable families of real numbers.
I observe here that this notion of summability is absolute; a family $\langle a_i\rangle_{i\in I}$ is summable iff it is absolutely summable. This is necessary because it must also be unconditional; we have no structure on an arbitrary set $I$ to guide us to take the sum in any particular order. See 226Xf. In particular, I distinguish between $\sum_{n\in\mathbb{N}} a_n$, which in this book will always be interpreted as in 226A above, and $\sum_{n=0}^\infty a_n$ which (if it makes a difference) should be interpreted as $\lim_{m\to\infty} \sum_{n=0}^m a_n$. So, for instance, $\sum_{n=0}^\infty \frac{(-1)^n}{n+1} = \ln 2$, while $\sum_{n\in\mathbb{N}} \frac{(-1)^n}{n+1}$ is undefined. Of course $\sum_{n=0}^\infty a_n = \sum_{n\in\mathbb{N}} a_n$ whenever the latter is defined in $[-\infty, \infty]$.
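The distinction between the ordered sum and the unordered sum can be seen numerically for the family $a_n = (-1)^n/(n+1)$. The Python sketch below is an illustration only; the cutoff `N` and the variable names are assumptions of this example.

```python
import math

def term(n):
    """a_n = (-1)^n / (n + 1)."""
    return (-1) ** n / (n + 1)

N = 200000  # hypothetical cutoff for the partial sums

# Ordered partial sum: approaches ln 2 as N grows.
ordered_partial = sum(term(n) for n in range(N))

# Partial sums of the positive parts a_n^+ and negative parts a_n^-:
# both grow without bound (roughly like (1/2) ln N), so neither
# sum over N of a_n^+ nor of a_n^- is finite.
positive_part = sum(term(n) for n in range(N) if n % 2 == 0)
negative_part = sum(-term(n) for n in range(N) if n % 2 == 1)
```

Since both $\sum_{n\in\mathbb{N}} a_n^+$ and $\sum_{n\in\mathbb{N}} a_n^-$ are infinite, the definition in (a) leaves $\sum_{n\in\mathbb{N}} a_n$ undefined, even though the ordered partial sums converge.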

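(d) There is another, and very important, approach to the sum described here. If $\langle a_i\rangle_{i\in I}$ is an (absolutely) summable family of real numbers, then for every $\epsilon > 0$ there is a finite $K \subseteq I$ such that $\sum_{i\in I\setminus K} |a_i| \le \epsilon$. PP This is nothing but a special case of 225A; there is a set $K$ with $\mu K < \infty$ such that $\int_{I\setminus K} |a_i| \mu(di) \le \epsilon$, but
$$\int_{I\setminus K} |a_i| \mu(di) = \sum_{i\in I\setminus K} |a_i|.$$
QQ
(Of course there are direct proofs of this result from the definition in (a), not mentioning measures or integrals. But I think you will see that they rely on the same idea as that in the proof of 225A.) Consequently, for any family $\langle a_i\rangle_{i\in I}$ of real numbers and any $s \in \mathbb{R}$, the following are equiveridical:
(i) $\sum_{i\in I} a_i = s$;
(ii) for every $\epsilon > 0$ there is a finite $K \subseteq I$ such that $|s - \sum_{i\in J} a_i| \le \epsilon$ whenever $J$ is finite and $K \subseteq J \subseteq I$.
PP (i)$\Rightarrow$(ii) Take $K$ such that $\sum_{i\in I\setminus K} |a_i| \le \epsilon$. If $K \subseteq J \subseteq I$, then
$$\Bigl|s - \sum_{i\in J} a_i\Bigr| = \Bigl|\sum_{i\in I\setminus J} a_i\Bigr| \le \sum_{i\in I\setminus K} |a_i| \le \epsilon.$$
(ii)$\Rightarrow$(i) Let $\epsilon > 0$, and let $K \subseteq I$ be as described in (ii). If $J \subseteq I \setminus K$ is any finite set, then set $J_1 = \{i : i \in J, a_i \ge 0\}$, $J_2 = J \setminus J_1$. We have
$$\sum_{i\in J} |a_i| = \Bigl|\sum_{i\in J_1\cup K} a_i - \sum_{i\in J_2\cup K} a_i\Bigr| \le \Bigl|s - \sum_{i\in J_1\cup K} a_i\Bigr| + \Bigl|s - \sum_{i\in J_2\cup K} a_i\Bigr| \le 2\epsilon.$$
As $J$ is arbitrary, $\sum_{i\in I\setminus K} |a_i| \le 2\epsilon$ and
$$\sum_{i\in I} |a_i| \le \sum_{i\in K} |a_i| + 2\epsilon < \infty.$$
Accordingly $\sum_{i\in I} a_i$ is well-defined in $\mathbb{R}$. Also
$$\Bigl|s - \sum_{i\in I} a_i\Bigr| \le \Bigl|s - \sum_{i\in K} a_i\Bigr| + \Bigl|\sum_{i\in I\setminus K} a_i\Bigr| \le \epsilon + \sum_{i\in I\setminus K} |a_i| \le 3\epsilon.$$
As $\epsilon$ is arbitrary, $\sum_{i\in I} a_i = s$, as required. QQ
In this way, we express $\sum_{i\in I} a_i$ directly as a limit; we could write it as
$$\sum_{i\in I} a_i = \lim_{K\uparrow I} \sum_{i\in K} a_i,$$
on the understanding that we look at finite sets $K$ in the right-hand formula.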
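(e) Yet another approach is through the following fact. If $\sum_{i\in I} |a_i| < \infty$, then for any $\delta > 0$ the set $\{i : |a_i| \ge \delta\}$ is finite, indeed can have at most $\frac{1}{\delta}\sum_{i\in I} |a_i|$ members; consequently
$$J = \{i : a_i \ne 0\} = \bigcup_{n\in\mathbb{N}} \{i : |a_i| \ge 2^{-n}\}$$
is countable (1A1F). If $J$ is finite, then of course $\sum_{i\in I} a_i = \sum_{i\in J} a_i$ reduces to a finite sum. Otherwise, we can enumerate $J$ as $\langle j_n\rangle_{n\in\mathbb{N}}$, and we shall have
$$\sum_{i\in I} a_i = \sum_{i\in J} a_i = \lim_{n\to\infty} \sum_{k=0}^n a_{j_k} = \sum_{n=0}^\infty a_{j_n}$$
(using (d) to reduce the sum $\sum_{i\in J} a_i$ to a limit of finite sums). Conversely, if $\langle a_i\rangle_{i\in I}$ is such that there is a countably infinite $J \supseteq \{i : a_i \ne 0\}$ enumerated as $\langle j_n\rangle_{n\in\mathbb{N}}$, and if $\sum_{n=0}^\infty |a_{j_n}| < \infty$, then $\sum_{i\in I} a_i$ will be $\sum_{n=0}^\infty a_{j_n}$.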
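The facts in (d) and (e) can be tried out on a small family with exact rational arithmetic. The Python sketch below is an illustration under stated assumptions: the family $a_i = (-1)^i 2^{-i}$ for $i < 60$ (zero elsewhere), the threshold `delta`, the set `J` and all names are hypothetical choices for this example.

```python
from fractions import Fraction
import random

# Hypothetical family: a_i = (-1)^i / 2^i for i = 0, ..., 59, and a_i = 0 otherwise.
family = {i: Fraction((-1) ** i, 2 ** i) for i in range(60)}

def finite_sum(K):
    """Sum of a_i over a finite index set K (indices outside the family contribute 0)."""
    return sum((family.get(i, Fraction(0)) for i in K), Fraction(0))

total_abs = sum(abs(val) for val in family.values())

# (e): the set {i : |a_i| >= delta} has at most total_abs / delta members.
delta = Fraction(1, 100)
large = [i for i, val in family.items() if abs(val) >= delta]

# Unconditional summation: any order of enumeration gives the same total.
order = list(family)
random.Random(0).shuffle(order)
shuffled_total = finite_sum(order)
natural_total = finite_sum(range(60))

# (d): with K = large, every finite J containing K gives a sum within
# tail = sum of |a_i| over indices outside K of the full total.
K = set(large)
tail = total_abs - sum(abs(family[i]) for i in K)
J = K | {10, 23, 41}
gap = abs(natural_total - finite_sum(J))
```

With exact fractions the shuffled and natural totals coincide, and `gap` is bounded by `tail`, as in the proof of (i)$\Rightarrow$(ii).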

226B Saltus functions Now we are ready for a special type of function of bounded variation on $\mathbb{R}$. Suppose that $a < b$ in $\mathbb{R}$.
(a) A (real) saltus function on $[a, b]$ is a function $F : [a, b] \to \mathbb{R}$ expressible in the form
$$F(x) = \sum_{t\in[a,x[} u_t + \sum_{t\in[a,x]} v_t$$
for $x \in [a, b]$, where $\langle u_t\rangle_{t\in[a,b[}$, $\langle v_t\rangle_{t\in[a,b]}$ are real-valued families such that $\sum_{t\in[a,b[} |u_t|$ and $\sum_{t\in[a,b]} |v_t|$ are finite.
(b) For any function $F : [a, b] \to \mathbb{R}$ we can write
$$F(x^+) = \lim_{y\downarrow x} F(y) \text{ if } x \in [a, b[ \text{ and the limit exists},$$
$$F(x^-) = \lim_{y\uparrow x} F(y) \text{ if } x \in \,]a, b] \text{ and the limit exists}.$$
(I hope that this will not lead to confusion with the alternative interpretation of $x^+$ as $\max(x, 0)$.) Observe that if $F$ is a saltus function, as defined in (a), with associated families $\langle u_t\rangle_{t\in[a,b[}$ and $\langle v_t\rangle_{t\in[a,b]}$, then $v_a = F(a)$, $v_x = F(x) - F(x^-)$ for $x \in \,]a, b]$, $u_x = F(x^+) - F(x)$ for $x \in [a, b[$. PP Let $\epsilon > 0$. As remarked in 226Ad, there is a finite $K \subseteq [a, b]$ such that
$$\sum_{t\in[a,b[\setminus K} |u_t| + \sum_{t\in[a,b]\setminus K} |v_t| \le \epsilon.$$
Given $x \in [a, b]$, let $\delta > 0$ be such that $[x - \delta, x + \delta]$ contains no point of $K$ except perhaps $x$. In this case, if $\max(a, x - \delta) \le y < x$, we must have
$$|F(y) - (F(x) - v_x)| = \Bigl|\sum_{t\in[y,x[} u_t + \sum_{t\in]y,x[} v_t\Bigr| \le \sum_{t\in[a,b[\setminus K} |u_t| + \sum_{t\in[a,b]\setminus K} |v_t| \le \epsilon,$$
while if $x < y \le \min(b, x + \delta)$ we shall have
$$|F(y) - (F(x) + u_x)| = \Bigl|\sum_{t\in]x,y[} u_t + \sum_{t\in]x,y]} v_t\Bigr| \le \sum_{t\in[a,b[\setminus K} |u_t| + \sum_{t\in[a,b]\setminus K} |v_t| \le \epsilon.$$
As $\epsilon$ is arbitrary, we get $F(x^-) = F(x) - v_x$ (if $x > a$) and $F(x^+) = F(x) + u_x$ (if $x < b$). QQ
It follows that $F$ is continuous at $x \in \,]a, b[$ iff $u_x = v_x = 0$, while $F$ is continuous at $a$ iff $u_a = 0$ and $F$ is continuous at $b$ iff $v_b = 0$. In particular, $\{x : x \in [a, b], F \text{ is not continuous at } x\}$ is countable (see 226Ae).
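The relations $v_x = F(x) - F(x^-)$ and $u_x = F(x^+) - F(x)$ are easy to see for a saltus function with finitely many jumps. The following Python sketch is an illustration only; the helper `make_saltus`, the jump data and the step `h` are hypothetical, and the one-sided limits are approximated by evaluating $F$ at nearby points.

```python
def make_saltus(u, v):
    """Return F(x) = sum over t < x of u[t] plus sum over t <= x of v[t], for finite dicts u, v."""
    def F(x):
        from_u = sum(val for t, val in u.items() if t < x)
        from_v = sum(val for t, val in v.items() if t <= x)
        return from_u + from_v
    return F

# Hypothetical jump data on [a, b] = [0, 1]; keys are the jump points t.
u = {0.25: 1.0, 0.5: -2.0}   # u_t: jump just to the right of t
v = {0.0: 3.0, 0.5: 0.5}     # v_t: jump at t itself (v_a = F(a))
F = make_saltus(u, v)

h = 1e-9                      # small step, smaller than the gaps between jump points
x = 0.5
left_jump = F(x) - F(x - h)   # approximates F(x) - F(x^-), expected to equal v_x
right_jump = F(x + h) - F(x)  # approximates F(x^+) - F(x), expected to equal u_x
```

Because this $F$ is piecewise constant, the approximations are exact here; for a saltus function with infinitely many jumps they would hold only in the limit.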

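(c) If $F$ is a saltus function defined on $[a, b]$, with associated families $\langle u_t\rangle_{t\in[a,b[}$, $\langle v_t\rangle_{t\in[a,b]}$, then $F$ is of bounded variation on $[a, b]$, and
$$\operatorname{Var}_{[a,b]}(F) \le \sum_{t\in[a,b[} |u_t| + \sum_{t\in]a,b]} |v_t|.$$
PP If $a \le x < y \le b$, then
$$F(y) - F(x) = u_x + \sum_{t\in]x,y[} (u_t + v_t) + v_y,$$
so
$$|F(y) - F(x)| \le \sum_{t\in[x,y[} |u_t| + \sum_{t\in]x,y]} |v_t|.$$
If $a \le a_0 \le a_1 \le \ldots \le a_n \le b$, then
$$\sum_{i=1}^n |F(a_i) - F(a_{i-1})| \le \sum_{i=1}^n \Bigl(\sum_{t\in[a_{i-1},a_i[} |u_t| + \sum_{t\in]a_{i-1},a_i]} |v_t|\Bigr) \le \sum_{t\in[a,b[} |u_t| + \sum_{t\in]a,b]} |v_t|.$$
Consequently
$$\operatorname{Var}_{[a,b]}(F) \le \sum_{t\in[a,b[} |u_t| + \sum_{t\in]a,b]} |v_t| < \infty.$$
QQ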

(d) The inequality in (c) is actually an equality. To see this, note first that if $a \le x < y \le b$, then $\operatorname{Var}_{[x,y]}(F) \ge |u_x| + |v_y|$. PP I noted in (b) that $u_x = \lim_{t\downarrow x} F(t) - F(x)$ and $v_y = F(y) - \lim_{t\uparrow y} F(t)$. So, given $\epsilon > 0$, we can find $t_1$, $t_2$ such that $x < t_1 \le t_2 \le y$ and
$$|F(t_1) - F(x)| \ge |u_x| - \epsilon, \quad |F(y) - F(t_2)| \ge |v_y| - \epsilon.$$
Now
$$\operatorname{Var}_{[x,y]}(F) \ge |F(t_1) - F(x)| + |F(t_2) - F(t_1)| + |F(y) - F(t_2)| \ge |u_x| + |v_y| - 2\epsilon.$$
As $\epsilon$ is arbitrary, we have the result. QQ
Now, given $a \le t_0 < t_1 < \ldots < t_n \le b$, we must have
$$\operatorname{Var}_{[a,b]}(F) \ge \sum_{i=1}^n \operatorname{Var}_{[t_{i-1},t_i]}(F) \ge \sum_{i=1}^n |u_{t_{i-1}}| + |v_{t_i}|,$$
using 224Cc for the first inequality. As $t_0, \ldots, t_n$ are arbitrary,
$$\operatorname{Var}_{[a,b]}(F) \ge \sum_{t\in[a,b[} |u_t| + \sum_{t\in]a,b]} |v_t|,$$
as required.
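For a saltus function with finitely many jumps the equality of (c) and (d) can be verified directly on a partition that brackets every jump point. The Python sketch below is an illustration only; the jump data, the offset `h` and the helper names are hypothetical.

```python
def saltus(u, v, x):
    """F(x) = sum over t < x of u[t] plus sum over t <= x of v[t], for finite dicts u, v."""
    return (sum(c for t, c in u.items() if t < x)
            + sum(c for t, c in v.items() if t <= x))

def variation_on_partition(fn, points):
    """Sum of |fn(p_i) - fn(p_{i-1})| along an increasing list of points."""
    values = [fn(p) for p in points]
    return sum(abs(y1 - y0) for y0, y1 in zip(values, values[1:]))

a, b = 0.0, 1.0
u = {0.25: 1.0, 0.5: -2.0}            # hypothetical right-hand jumps u_t
v = {0.0: 3.0, 0.5: 0.5, 1.0: -1.0}   # hypothetical jumps v_t (v_a = F(a))

# Partition containing each jump point together with points just either side of it.
h = 1e-6
jump_points = sorted(set(u) | set(v))
partition = sorted({a, b} | {min(max(t + s, a), b)
                             for t in jump_points for s in (-h, 0.0, h)})

observed = variation_on_partition(lambda x: saltus(u, v, x), partition)
# Formula of (c)-(d): sum of |u_t| over [a, b[ plus sum of |v_t| over ]a, b].
predicted = (sum(abs(c) for t, c in u.items() if a <= t < b)
             + sum(abs(c) for t, c in v.items() if a < t <= b))
```

Note that $v_a$ does not contribute to the variation, which is why the second sum in the formula runs over $]a, b]$.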

(e) Because a saltus function is of bounded variation ((c) above), it is differentiable almost everywhere (224I). In fact its derivative is zero almost everywhere. PP Let $F : [a, b] \to \mathbb{R}$ be a saltus function, with associated families $\langle u_t\rangle_{t\in[a,b[}$, $\langle v_t\rangle_{t\in[a,b]}$. Let $\epsilon > 0$. Let $K \subseteq [a, b]$ be a finite set such that
$$\sum_{t\in[a,b[\setminus K} |u_t| + \sum_{t\in[a,b]\setminus K} |v_t| \le \epsilon.$$
Set
$$u'_t = u_t \text{ if } t \in [a, b[\, \cap K, \qquad u'_t = 0 \text{ if } t \in [a, b[\, \setminus K,$$
$$v'_t = v_t \text{ if } t \in K, \qquad v'_t = 0 \text{ if } t \in [a, b] \setminus K,$$
$$u''_t = u_t - u'_t \text{ for } t \in [a, b[\,, \qquad v''_t = v_t - v'_t \text{ for } t \in [a, b].$$
Let $G$, $H$ be the saltus functions corresponding to $\langle u'_t\rangle_{t\in[a,b[}$, $\langle v'_t\rangle_{t\in[a,b]}$ and $\langle u''_t\rangle_{t\in[a,b[}$, $\langle v''_t\rangle_{t\in[a,b]}$, so that $F = G + H$. Then $G'(t) = 0$ for every $t \in \,]a, b[\, \setminus K$, since $]a, b[\, \setminus K$ comprises a finite number of open intervals on each of which $G$ is constant. So $G' =_{\text{a.e.}} 0$ and $F' =_{\text{a.e.}} H'$. On the other hand,
$$\int_a^b |H'| \le \operatorname{Var}_{[a,b]}(H) = \sum_{t\in[a,b[\setminus K} |u_t| + \sum_{t\in]a,b]\setminus K} |v_t| \le \epsilon,$$
using 224I and (d) above. So
$$\int_a^b |F'| = \int_a^b |H'| \le \epsilon.$$
As $\epsilon$ is arbitrary, $\int_a^b |F'| = 0$ and $F' =_{\text{a.e.}} 0$, as claimed. QQ

226C The Lebesgue decomposition of a function of bounded variation Take $a, b \in \mathbb{R}$ with $a < b$.

(a) If $F : [a, b] \to \mathbb{R}$ is non-decreasing, set $v_a = 0$, $v_t = F(t) - F(t^-)$ for $t \in \,]a, b]$, $u_t = F(t^+) - F(t)$ for $t \in [a, b[$, defining $F(t^+)$, $F(t^-)$ as in 226Bb. Then all the $v_t$, $u_t$ are non-negative, and if $a < t_0 < t_1 < \ldots < t_n < b$, then
$$\sum_{i=0}^n (u_{t_i} + v_{t_i}) = \sum_{i=0}^n (F(t_i^+) - F(t_i^-)) \le F(b) - F(a).$$
Accordingly $\sum_{t\in[a,b[} u_t$ and $\sum_{t\in[a,b]} v_t$ are both finite. Let $F_p$ be the corresponding saltus function, as defined in 226Ba, so that
$$F_p(x) = F(a^+) - F(a) + \sum_{t\in]a,x[} (F(t^+) - F(t^-)) + F(x) - F(x^-)$$
if $a < x \le b$. If $a \le x < y \le b$ then
$$F_p(y) - F_p(x) = F(x^+) - F(x) + \sum_{t\in]x,y[} (F(t^+) - F(t^-)) + F(y) - F(y^-) \le F(y) - F(x)$$
because if $x = t_0 < t_1 < \ldots < t_n < t_{n+1} = y$ then
$$F(x^+) - F(x) + \sum_{i=1}^n (F(t_i^+) - F(t_i^-)) + F(y) - F(y^-) = F(y) - F(x) - \sum_{i=1}^{n+1} (F(t_i^-) - F(t_{i-1}^+)) \le F(y) - F(x).$$
Accordingly both $F_p$ and $F_c = F - F_p$ are non-decreasing. Also, because
$$F_p(a) = 0 = v_a,$$
$$F_p(t) - F_p(t^-) = v_t = F(t) - F(t^-) \text{ for } t \in \,]a, b],$$
$$F_p(t^+) - F_p(t) = u_t = F(t^+) - F(t) \text{ for } t \in [a, b[,$$
we shall have
$$F_c(a) = F(a),$$
$$F_c(t) = F_c(t^-) \text{ for } t \in \,]a, b],$$
$$F_c(t) = F_c(t^+) \text{ for } t \in [a, b[,$$
and $F_c$ is continuous.
Clearly this expression of $F = F_p + F_c$ as the sum of a saltus function and a continuous function is unique, except that we can freely add a constant to one if we subtract it from the other.

(b) Still taking $F:[a,b]\to\mathbb{R}$ to be non-decreasing, we know that $F'$ is integrable (222C); moreover,
$F'=_{\text{a.e.}}F_c'$, by 226Be. Set $F_{ac}(x)=F(a)+\int_a^xF'$ for each $x\in[a,b]$. We have
$$F_{ac}(y)-F_{ac}(x)=\int_x^yF_c'\le F_c(y)-F_c(x)$$
for $a\le x\le y\le b$ (222C again), so $F_{cs}=F_c-F_{ac}$ is still non-decreasing; $F_{ac}$ is continuous (225A), so $F_{cs}$
is continuous; $F_{ac}'=_{\text{a.e.}}F'=_{\text{a.e.}}F_c'$ (222E), so $F_{cs}'=_{\text{a.e.}}0$.
Again, the expression of Fc = Fac + Fcs as the sum of an absolutely continuous function and a function
with zero derivative almost everywhere is unique, except for the possibility of moving a constant from one
to the other, because two absolutely continuous functions whose derivatives are equal almost everywhere
must differ by a constant (225D).

(c) Putting all these together: if $F:[a,b]\to\mathbb{R}$ is any non-decreasing function, it is expressible as
$F_p+F_{ac}+F_{cs}$, where $F_p$ is a saltus function, $F_{ac}$ is absolutely continuous, and $F_{cs}$ is continuous and
differentiable, with zero derivative, almost everywhere; all three components are non-decreasing; and the
expression is unique if we say that $F_{ac}(a)=F(a)$, $F_p(a)=F_{cs}(a)=0$.
The Cantor function $f:[0,1]\to[0,1]$ (134H) is continuous and $f'=_{\text{a.e.}}0$ (134Hb), so $f_p=f_{ac}=0$
and $f=f_{cs}$. Setting $g(x)=\frac12(x+f(x))$ for $x\in[0,1]$, as in 134I, we get $g_p(x)=0$, $g_{ac}(x)=\frac{x}{2}$ and
$g_{cs}(x)=\frac12f(x)$.

(d) Now suppose that $F:[a,b]\to\mathbb{R}$ is of bounded variation. Then it is expressible as a difference
$G-H$ of non-decreasing functions (224D). So writing $F_p=G_p-H_p$, etc., we can express $F$ as a sum
$F_p+F_{cs}+F_{ac}$, where $F_p$ is a saltus function, $F_{ac}$ is absolutely continuous, $F_{cs}$ is continuous, $F_{cs}'=_{\text{a.e.}}0$,
$F_{ac}(a)=F(a)$, $F_{cs}(a)=F_p(a)=0$. Under these conditions the expression is unique, because (for instance)
$F_p(t^+)-F_p(t)=F(t^+)-F(t)$ for $t\in[a,b[$, while $F_{ac}'=_{\text{a.e.}}(F-F_p)'=_{\text{a.e.}}F'$.
This is a Lebesgue decomposition of the function $F$. (I have to say 'a' Lebesgue decomposition because
of course the assignment $F_{ac}(a)=F(a)$, $F_p(a)=F_{cs}(a)=0$ is arbitrary.)
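
The construction of 226C can be checked by hand on a single example. The Python sketch below does this for the function on $[0,1]$ given by $F(x)=x$ for $x<\frac12$, $F(\frac12)=1$, $F(x)=x+1$ for $x>\frac12$. This function, the names `F`, `F_p`, `F_c` and the table `JUMPS` are illustrative assumptions, not taken from the text, and the jumps are read off from the formula rather than computed as limits.

```python
from fractions import Fraction as Fr

HALF = Fr(1, 2)

def F(x):
    """Hypothetical non-decreasing example on [0, 1], discontinuous only at 1/2."""
    if x < HALF:
        return x
    if x == HALF:
        return Fr(1)
    return x + 1

# Jump data at t = 1/2, read off by hand from the formula for F:
# v_t = F(t) - F(t-) = 1 - 1/2 and u_t = F(t+) - F(t) = 3/2 - 1.
JUMPS = {HALF: {"v": Fr(1) - HALF, "u": Fr(3, 2) - Fr(1)}}

def F_p(x):
    """Saltus part: sum of u_t over t < x plus sum of v_t over t <= x."""
    total = Fr(0)
    for t, jump in JUMPS.items():
        if t < x:
            total += jump["u"]
        if t <= x:
            total += jump["v"]
    return total

def F_c(x):
    """Continuous part F - F_p."""
    return F(x) - F_p(x)
```

In this example $F_c(x)=x$, and since $F'=1$ almost everywhere we get $F_{ac}(x)=F(0)+x=x$ and $F_{cs}=0$.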

226D Complex functions The modifications needed to deal with complex functions are elementary.
(a) If $I$ is any set and $\langle a_j\rangle_{j\in I}$ is a family of complex numbers, then the following are equiveridical:
(i) $\sum_{j\in I}|a_j|<\infty$;
(ii) there is an $s\in\mathbb{C}$ such that for every $\epsilon>0$ there is a finite $K\subseteq I$ such that $|s-\sum_{j\in J}a_j|\le\epsilon$
whenever $J$ is finite and $K\subseteq J\subseteq I$.
In this case
$$s=\sum_{j\in I}\mathrm{Re}(a_j)+i\sum_{j\in I}\mathrm{Im}(a_j)=\int_Ia_j\,\mu(dj),$$
where $\mu$ is counting measure on $I$, and we write $s=\sum_{j\in I}a_j$.

(b) If $a<b$ in $\mathbb{R}$, a complex saltus function on $[a,b]$ is a function $F:[a,b]\to\mathbb{C}$ expressible in the form
$$F(x)=\sum_{t\in[a,x[}u_t+\sum_{t\in[a,x]}v_t$$
for $x\in[a,b]$, where $\langle u_t\rangle_{t\in[a,b[}$, $\langle v_t\rangle_{t\in[a,b]}$ are complex-valued families such that $\sum_{t\in[a,b[}|u_t|$ and $\sum_{t\in[a,b]}|v_t|$
are finite; that is, if the real and imaginary parts of $F$ are saltus functions. In this case $F$ is continuous
except at countably many points and differentiable, with zero derivative, almost everywhere on $[a,b]$, and
$$u_x=\lim_{t\downarrow x}F(t)-F(x)\ \text{for every}\ x\in[a,b[,$$
$$v_x=\lim_{t\uparrow x}F(x)-F(t)\ \text{for every}\ x\in\,]a,b]$$
(apply the results of 226B to the real and imaginary parts of $F$). $F$ is of bounded variation, and its variation
is
$$\mathrm{Var}_{[a,b]}(F)=\sum_{t\in[a,b[}|u_t|+\sum_{t\in]a,b]}|v_t|$$
(repeat the arguments of 226Bc-d).

(c) If $F:[a,b]\to\mathbb{C}$ is a function of bounded variation, where $a<b$ in $\mathbb{R}$, it is uniquely expressible as
$F=F_p+F_{cs}+F_{ac}$, where $F_p$ is a saltus function, $F_{ac}$ is absolutely continuous, $F_{cs}$ is continuous and has
zero derivative almost everywhere, and $F_{ac}(a)=F(a)$, $F_p(a)=F_{cs}(a)=0$. (Apply 226C to the real and
imaginary parts of $F$.)

226E As an elementary exercise in the language of 226A, I interpolate a version of a theorem of B.Levi
which is sometimes useful.
Proposition Let $(X,\Sigma,\mu)$ be a measure space, $I$ a countable set, and $\langle f_i\rangle_{i\in I}$ a family of $\mu$-integrable real-
or complex-valued functions such that $\sum_{i\in I}\int|f_i|\,d\mu$ is finite. Then $f(x)=\sum_{i\in I}f_i(x)$ is defined almost
everywhere and $\int f\,d\mu=\sum_{i\in I}\int f_i\,d\mu$.
proof If $I$ is finite this is elementary. Otherwise, since there must be a bijection between $I$ and $\mathbb{N}$, we
may take it that $I=\mathbb{N}$. Setting $g_n=\sum_{i=0}^n|f_i|$ for each $n$, we have a non-decreasing sequence $\langle g_n\rangle_{n\in\mathbb{N}}$ of
integrable functions such that $\int g_n\le\sum_{i\in\mathbb{N}}\int|f_i|$ for every $n$, so that $g=\sup_{n\in\mathbb{N}}g_n$ is integrable, by B.Levi's
theorem as stated in 123A. In particular, $g$ is finite almost everywhere. Now if $x\in X$ is such that $g(x)$ is
defined and finite, $\sum_{i\in J}|f_i(x)|\le g(x)$ for every finite $J\subseteq\mathbb{N}$, so $\sum_{i\in\mathbb{N}}|f_i(x)|$ and $\sum_{i\in\mathbb{N}}f_i(x)$ are defined.
In this case, of course, $\sum_{i\in\mathbb{N}}f_i(x)=\lim_{n\to\infty}\sum_{i=0}^nf_i(x)$. But $|\sum_{i=0}^nf_i|\le_{\text{a.e.}}g$ for each $n$, so Lebesgue's
Dominated Convergence Theorem tells us that
$$\int\sum_{i\in\mathbb{N}}f_i=\lim_{n\to\infty}\int\sum_{i=0}^nf_i=\lim_{n\to\infty}\sum_{i=0}^n\int f_i=\sum_{i\in\mathbb{N}}\int f_i.$$

226X Basic exercises > (a) A step-function on an interval $[a,b]$ is a function $F$ such that, for suitable
$t_0,\ldots,t_n$ with $a=t_0\le\ldots\le t_n=b$, $F$ is constant on each interval $]t_{i-1},t_i[$. Show that $F:[a,b]\to\mathbb{R}$ is a
saltus function iff for every $\epsilon>0$ there is a step-function $G:[a,b]\to\mathbb{R}$ such that $\mathrm{Var}_{[a,b]}(F-G)\le\epsilon$.

(b) Let $F$, $G$ be real-valued functions of bounded variation defined on an interval $[a,b]\subseteq\mathbb{R}$. Show that,
in the language of 226C,
$$(F+G)_p=F_p+G_p,\qquad(F+G)_c=F_c+G_c,$$
$$(F+G)_{cs}=F_{cs}+G_{cs},\qquad(F+G)_{ac}=F_{ac}+G_{ac}.$$

> (c) Let $F$ be a real-valued function of bounded variation on an interval $[a,b]\subseteq\mathbb{R}$. Show that, in the
language of 226C,
$$\mathrm{Var}_{[a,b]}(F)=\mathrm{Var}_{[a,b]}(F_p)+\mathrm{Var}_{[a,b]}(F_c)=\mathrm{Var}_{[a,b]}(F_p)+\mathrm{Var}_{[a,b]}(F_{cs})+\mathrm{Var}_{[a,b]}(F_{ac}).$$

(d) Let $F$ be a real-valued function of bounded variation on an interval $[a,b]\subseteq\mathbb{R}$. Show that $F$ is
absolutely continuous iff $\mathrm{Var}_{[a,b]}(F)=\int_a^b|F'|$.

(e) Consider the function $g$ of 134I/226Cc. Show that $g^{-1}:[0,1]\to[0,1]$ is differentiable almost every-
where on $[0,1]$, and find $\mu\{x:(g^{-1})'(x)\le a\}$ for each $a\in\mathbb{R}$, where $\mu$ is Lebesgue measure.

> (f) Suppose that $I$ and $J$ are sets and that $\langle a_i\rangle_{i\in I}$ is a summable family of real numbers. (i) Show
that if $f:J\to I$ is injective then $\langle a_{f(j)}\rangle_{j\in J}$ is summable. (ii) Show that if $g:I\to J$ is any function, then
$\sum_{j\in J}\sum_{i\in g^{-1}[\{j\}]}a_i$ is defined and equal to $\sum_{i\in I}a_i$.

226Y Further exercises (a) Explain what modifications are appropriate in the description of the
Lebesgue decomposition of a function of bounded variation if we wish to consider functions on open or
half-open intervals, including unbounded intervals.

(b) Suppose that $F:[a,b]\to\mathbb{R}$ is a function of bounded variation, and set $h(y)=\#(F^{-1}[\{y\}])$ for $y\in\mathbb{R}$.
Show that $\int h=\mathrm{Var}_{[a,b]}(F_c)$, where $F_c$ is the 'continuous part' of $F$ as defined in 226Ca/226Cd.

(c) Show that a set $I$ is countable iff there is a summable family $\langle a_i\rangle_{i\in I}$ of non-zero real numbers.

226 Notes and comments In 232I and 232Yb below I will revisit these ideas, linking them to a decom-
position of the Lebesgue-Stieltjes measure corresponding to a non-decreasing real function, and thence to
more general measures. All this work is peripheral to the main concerns of this volume, but I think it is
illuminating, and certainly it is part of the basic knowledge assumed of anyone working in real analysis.

Chapter 23
The Radon-Nikodym Theorem
In Chapter 22, I discussed the indefinite integrals of integrable functions on R, and gave what I hope you
feel are satisfying descriptions of both the functions which are indefinite integrals (the absolutely continuous
functions) and of how to find which functions they are indefinite integrals of (you differentiate them).
For general measure spaces, we have no structure present which can give such simple formulations; but
nevertheless the same questions can be asked and, up to a point, answered.
The first section of this chapter introduces the basic machinery needed, the concept of 'countably additive'
functional and its decomposition into positive and negative parts. The main theorem takes up the second
section: indefinite integrals are the 'truly continuous' additive functionals; on $\sigma$-finite spaces, these are
the 'absolutely continuous' countably additive functionals. In §233 I discuss the most important single
application of the theorem, its use in providing a concept of 'conditional expectation'. This is one of the
central concepts of probability theory, as you very likely know; but the form here is a dramatic generalization
of the elementary concept of the conditional probability of one event given another, and needs the whole
strength of the general theory of measure and integration as developed in Volume 1 and this chapter. I
include some notes on convex functions, up to and including versions of Jensen's inequality (233I-233J).
While we are in the area of 'pure' measure theory, I take the opportunity to discuss two further topics. The
first is an essentially elementary construction, the 'indefinite-integral' measure defined from a non-negative
measurable function on a measure space; I think the details need a little attention, and I work through them
in §234. Rather deeper ideas are needed to deal with 'measurable transformations'. In §235 I set out the
techniques necessary to provide an abstract basis for a general method of integration-by-substitution, with
a detailed account of sufficient conditions for a formula of the type
$$\int g(y)\,dy=\int g(\phi(x))J(x)\,dx$$
to be valid.

231 Countably additive functionals


I begin with an abstract description of the objects which will, in appropriate circumstances, correspond
to the indefinite integrals of general integrable functions. In this section I give those parts of the theory
which do not involve a measure, but only a set with a distinguished $\sigma$-algebra of subsets. The basic concepts
are those of 'finitely additive' and 'countably additive' functional, and there is one substantial theorem, the
'Hahn decomposition' (231E).

231A Definition Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$ (136E). A functional $\nu:\Sigma\to\mathbb{R}$ is
finitely additive, or just additive, if $\nu(E\cup F)=\nu E+\nu F$ whenever $E$, $F\in\Sigma$ and $E\cap F=\emptyset$.

231B Elementary facts Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$, and $\nu:\Sigma\to\mathbb{R}$ a finitely additive
functional.

(a) $\nu\emptyset=0$. (For $\nu\emptyset=\nu(\emptyset\cup\emptyset)=\nu\emptyset+\nu\emptyset$.)

(b) If $E_0,\ldots,E_n$ are disjoint members of $\Sigma$ then $\nu(\bigcup_{i\le n}E_i)=\sum_{i=0}^n\nu E_i$.

(c) If $E$, $F\in\Sigma$ and $E\subseteq F$ then $\nu F=\nu E+\nu(F\setminus E)$. More generally, for any $E$, $F\in\Sigma$,
$$\nu F=\nu(F\cap E)+\nu(F\setminus E).$$

(d) If $E$, $F\in\Sigma$ then
$$\nu E-\nu F=\nu(E\setminus F)+\nu(E\cap F)-\nu(E\cap F)-\nu(F\setminus E)=\nu(E\setminus F)-\nu(F\setminus E).$$
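
The identities of 231B can be tried out on a finite set, where any assignment of real weights to points gives a finitely additive functional. The Python sketch below is only an illustration under that assumption; the table `WEIGHTS` and the names `nu` and `check_231B` are hypothetical and do not come from the text.

```python
from fractions import Fraction as Fr

# Hypothetical point weights on X = {0, 1, 2, 3, 4}; the signs are mixed on purpose.
WEIGHTS = {0: Fr(3), 1: Fr(-2), 2: Fr(1, 2), 3: Fr(-5), 4: Fr(4)}

def nu(E):
    """Finitely additive functional: the sum of the weights of the points of E."""
    return sum((WEIGHTS[x] for x in E), Fr(0))

def check_231B(E, F):
    """Return True if the identities of 231B(a), (c) and (d) hold for E and F."""
    E, F = frozenset(E), frozenset(F)
    return (
        nu(frozenset()) == 0                            # (a)
        and nu(F) == nu(F & E) + nu(F - E)              # (c), general form
        and nu(E) - nu(F) == nu(E - F) - nu(F - E)      # (d)
    )
```

Calling `check_231B({0, 1, 2}, {1, 2, 3, 4})` exercises the case in which neither set contains the other.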

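231C Definition Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. A function $\nu:\Sigma\to\mathbb{R}$ is countably
additive or $\sigma$-additive if $\sum_{n=0}^\infty\nu E_n$ exists in $\mathbb{R}$ and is equal to $\nu(\bigcup_{n\in\mathbb{N}}E_n)$ for every disjoint sequence
$\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ such that $\bigcup_{n\in\mathbb{N}}E_n\in\Sigma$.
Remark Note that when I use the phrase 'countably additive functional' I mean to exclude the possibility
of $\infty$ as a value of the functional. Thus a measure is a countably additive functional iff it is totally finite
(211C).
You will sometimes see the phrase 'signed measure' used to mean what I call a countably additive
functional.

231D Elementary facts Let $X$ be a set, $\Sigma$ a $\sigma$-algebra of subsets of $X$ and $\nu:\Sigma\to\mathbb{R}$ a countably
additive functional.

(a) $\nu$ is finitely additive. $\mathbf{P}$ (i) Setting $E_n=\emptyset$ for every $n\in\mathbb{N}$, $\sum_{n=0}^\infty\nu\emptyset$ must be defined in $\mathbb{R}$ so $\nu\emptyset$ must
be 0. (ii) Now if $E$, $F\in\Sigma$ and $E\cap F=\emptyset$ we can set $E_0=E$, $E_1=F$, $E_n=\emptyset$ for $n\ge2$ and get
$\nu(E\cup F)=\nu(\bigcup_{n\in\mathbb{N}}E_n)=\sum_{n=0}^\infty\nu E_n=\nu E+\nu F$. $\mathbf{Q}$

(b) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence in $\Sigma$, with union $E\in\Sigma$, then
$$\nu E=\nu E_0+\sum_{n=0}^\infty\nu(E_{n+1}\setminus E_n)=\lim_{n\to\infty}\nu E_n.$$

(c) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ with intersection $E\in\Sigma$, then
$$\nu E=\nu E_0-\lim_{n\to\infty}\nu(E_0\setminus E_n)=\lim_{n\to\infty}\nu E_n.$$

(d) If $\nu':\Sigma\to\mathbb{R}$ is another countably additive functional, and $c\in\mathbb{R}$, then $\nu+\nu':\Sigma\to\mathbb{R}$ and $c\nu:\Sigma\to\mathbb{R}$
are countably additive.

(e) If $H\in\Sigma$, then $\nu_H:\Sigma\to\mathbb{R}$ is countably additive, where $\nu_HE=\nu(E\cap H)$ for every $E\in\Sigma$. $\mathbf{P}$ If
$\langle E_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$ with union $E\in\Sigma$ then $\langle E_n\cap H\rangle_{n\in\mathbb{N}}$ is disjoint, with union $E\cap H$, so
$$\nu_HE=\nu(H\cap E)=\nu\Bigl(\bigcup_{n\in\mathbb{N}}(H\cap E_n)\Bigr)=\sum_{n=0}^\infty\nu(H\cap E_n)=\sum_{n=0}^\infty\nu_HE_n.\ \mathbf{Q}$$

Remark For the time being, we shall be using the notion of 'countably additive functional' only on $\sigma$-
algebras $\Sigma$, in which case we can take it for granted that the unions and intersections above belong to
$\Sigma$.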

231E All the ideas above amount to minor modifications of ideas already needed at the very beginning
of the theory of measure spaces. We come now to something more substantial.
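Theorem Let $X$ be a set, $\Sigma$ a $\sigma$-algebra of subsets of $X$, and $\nu:\Sigma\to\mathbb{R}$ a countably additive functional.
Then
(a) $\nu$ is bounded;
(b) there is a set $H\in\Sigma$ such that
$$\nu F\ge0\ \text{whenever}\ F\in\Sigma\ \text{and}\ F\subseteq H,$$
$$\nu F\le0\ \text{whenever}\ F\in\Sigma\ \text{and}\ F\cap H=\emptyset.$$

proof (a) $\mathbf{?}$ Suppose, if possible, otherwise. For $E\in\Sigma$, set $M(E)=\sup\{|\nu F|:F\in\Sigma,\ F\subseteq E\}$; then
$M(X)=\infty$. Moreover, whenever $E_1$, $E_2$, $F\in\Sigma$ and $F\subseteq E_1\cup E_2$, then
$$|\nu F|=|\nu(F\cap E_1)+\nu(F\setminus E_1)|\le|\nu(F\cap E_1)|+|\nu(F\setminus E_1)|\le M(E_1)+M(E_2),$$
so $M(E_1\cup E_2)\le M(E_1)+M(E_2)$. Choose a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ as follows. $E_0=X$. Given that
$M(E_n)=\infty$, where $n\in\mathbb{N}$, then surely there is an $F_n\subseteq E_n$ such that $|\nu F_n|\ge1+|\nu E_n|$, in which case
$|\nu(E_n\setminus F_n)|\ge1$. Now at least one of $M(F_n)$, $M(E_n\setminus F_n)$ is infinite; if $M(F_n)=\infty$, set $E_{n+1}=F_n$;
otherwise, set $E_{n+1}=E_n\setminus F_n$; in either case, note that $|\nu(E_n\setminus E_{n+1})|\ge1$ and $M(E_{n+1})=\infty$, so that the
induction will continue.
On completing this induction, set $G_n=E_n\setminus E_{n+1}$ for $n\in\mathbb{N}$. Then $\langle G_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$,
so $\sum_{n=0}^\infty\nu G_n$ is defined in $\mathbb{R}$ and $\lim_{n\to\infty}\nu G_n=0$; but $|\nu G_n|\ge1$ for every $n$. $\mathbf{X}$
(b)(i) By (a), $\gamma=\sup\{\nu E:E\in\Sigma\}<\infty$. Choose a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ such that $\nu E_n\ge\gamma-2^{-n}$
for every $n\in\mathbb{N}$. For $m\le n\in\mathbb{N}$, set $F_{mn}=\bigcap_{m\le i\le n}E_i$. Then $\nu F_{mn}\ge\gamma-2\cdot2^{-m}+2^{-n}$ for every
$n\ge m$. $\mathbf{P}$ Induce on $n$. For $n=m$, this is due to the choice of $E_m=F_{mm}$. For the inductive step, we have
$F_{m,n+1}=F_{mn}\cap E_{n+1}$, while surely $\gamma\ge\nu(E_{n+1}\cup F_{mn})$, so
$$\gamma+\nu F_{m,n+1}\ge\nu(E_{n+1}\cup F_{mn})+\nu F_{m,n+1}$$
$$=\nu E_{n+1}+\nu(F_{mn}\setminus E_{n+1})+\nu F_{m,n+1}$$
$$=\nu E_{n+1}+\nu F_{mn}$$
$$\ge\gamma-2^{-n-1}+\gamma-2\cdot2^{-m}+2^{-n}$$
(by the choice of $E_{n+1}$ and the inductive hypothesis)
$$=2\gamma-2\cdot2^{-m}+2^{-n-1}.$$
Subtracting $\gamma$ from both sides, $\nu F_{m,n+1}\ge\gamma-2\cdot2^{-m}+2^{-n-1}$ and the induction proceeds. $\mathbf{Q}$
(ii) For $m\in\mathbb{N}$, set
$$F_m=\bigcap_{n\ge m}F_{mn}=\bigcap_{n\ge m}E_n.$$
Then
$$\nu F_m=\lim_{n\to\infty}\nu F_{mn}\ge\gamma-2\cdot2^{-m},$$
by 231Dc. Next, $\langle F_m\rangle_{m\in\mathbb{N}}$ is non-decreasing, so setting $H=\bigcup_{m\in\mathbb{N}}F_m$ we have
$$\nu H=\lim_{m\to\infty}\nu F_m\ge\gamma;$$
since $\nu H$ is surely less than or equal to $\gamma$, $\nu H=\gamma$.
If $F\in\Sigma$ and $F\subseteq H$, then
$$\nu H-\nu F=\nu(H\setminus F)\le\gamma=\nu H,$$
so $\nu F\ge0$. If $F\in\Sigma$ and $F\cap H=\emptyset$ then
$$\nu H+\nu F=\nu(H\cup F)\le\gamma=\nu H$$
so $\nu F\le0$. This completes the proof.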

231F Corollary Let $X$ be a set, $\Sigma$ a $\sigma$-algebra of subsets of $X$, and $\nu:\Sigma\to\mathbb{R}$ a countably additive
functional. Then $\nu$ can be expressed as the difference of two totally finite measures with domain $\Sigma$.
proof Take $H\in\Sigma$ as described in 231Eb. Set $\nu_1E=\nu(E\cap H)$, $\nu_2E=-\nu(E\setminus H)$ for $E\in\Sigma$. Then, as in
231Dd-e, both $\nu_1$ and $\nu_2$ are countably additive functionals on $\Sigma$, and of course $\nu=\nu_1-\nu_2$. But also, by
the choice of $H$, both $\nu_1$ and $\nu_2$ are non-negative, so are totally finite measures.
Remark This is called the 'Jordan decomposition' of $\nu$. The expression of 231Eb is a 'Hahn decom-
position'.
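
On a finite set the Hahn and Jordan decompositions can be written down directly: a set $H$ as in 231Eb is given by the points of non-negative weight. The Python sketch below illustrates 231E-231F in this special case only; the table `WEIGHTS` and all function names are hypothetical, and checking every subset is feasible only because the example set has five points.

```python
from fractions import Fraction as Fr
from itertools import combinations

# Hypothetical point weights defining a countably additive functional on a finite set.
WEIGHTS = {0: Fr(3), 1: Fr(-2), 2: Fr(1, 2), 3: Fr(-5), 4: Fr(4)}
X = frozenset(WEIGHTS)

def nu(E):
    """nu E is the sum of the weights of the points of E."""
    return sum((WEIGHTS[x] for x in E), Fr(0))

# A Hahn set in the sense of 231Eb: the points of non-negative weight.
H = frozenset(x for x in X if WEIGHTS[x] >= 0)

def nu1(E):
    """Positive part, nu_1 E = nu(E & H)."""
    return nu(frozenset(E) & H)

def nu2(E):
    """Negative part, nu_2 E = -nu(E \\ H)."""
    return -nu(frozenset(E) - H)

def subsets(S):
    """Yield every subset of the finite set S."""
    items = sorted(S)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def jordan_ok():
    """Check nu_1, nu_2 >= 0 and nu = nu_1 - nu_2 on every subset of X."""
    return all(
        nu1(E) >= 0 and nu2(E) >= 0 and nu(E) == nu1(E) - nu2(E)
        for E in subsets(X)
    )
```

As in part (b)(ii) of the proof of 231E, `nu(H)` is the largest value taken by `nu` on any subset of `X`.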

231X Basic exercises (a) Let $\Sigma$ be the family of subsets $A$ of $\mathbb{N}$ such that one of $A$, $\mathbb{N}\setminus A$ is finite.
Show that $\Sigma$ is an algebra of subsets of $\mathbb{N}$. (This is the finite-cofinite algebra of subsets of $\mathbb{N}$; compare
211R.)

(b) Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$ and $\nu:\Sigma\to\mathbb{R}$ a finitely additive functional. Show that
$\nu(E\cup F\cup G)+\nu(E\cap F)+\nu(E\cap G)+\nu(F\cap G)=\nu E+\nu F+\nu G+\nu(E\cap F\cap G)$ for all $E$, $F$, $G\in\Sigma$.
Generalize this result to longer sequences of sets.

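> (c) Let $\Sigma$ be the finite-cofinite algebra of subsets of $\mathbb{N}$, as in 231Xa. Define $\nu:\Sigma\to\mathbb{Z}$ by setting
$$\nu E=\lim_{n\to\infty}\bigl(\#(\{i:i\le n,\ 2i\in E\})-\#(\{i:i\le n,\ 2i+1\in E\})\bigr)$$
for every $E\in\Sigma$. Show that $\nu$ is well-defined and finitely additive and unbounded.

(d) Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. (i) Show that if $\nu:\Sigma\to\mathbb{R}$ and $\nu':\Sigma\to\mathbb{R}$ are
finitely additive, so are $\nu+\nu'$ and $c\nu$ for any $c\in\mathbb{R}$. (ii) Show that if $\nu:\Sigma\to\mathbb{R}$ is finitely additive and
$H\in\Sigma$, then $\nu_H$ is finitely additive, where $\nu_H(E)=\nu(H\cap E)$ for every $E\in\Sigma$.

(e) Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$ and $\nu:\Sigma\to\mathbb{R}$ a finitely additive functional. Let $S$ be
the linear space of those real-valued functions on $X$ expressible in the form $\sum_{i=0}^na_i\chi E_i$ where $E_i\in\Sigma$ for
each $i$. (i) Show that we have a linear functional $\int:S\to\mathbb{R}$ given by writing
$$\int\sum_{i=0}^na_i\chi E_i=\sum_{i=0}^na_i\nu E_i$$
whenever $a_0,\ldots,a_n\in\mathbb{R}$ and $E_0,\ldots,E_n\in\Sigma$. (ii) Show that if $\nu E\ge0$ for every $E\in\Sigma$ then $\int f\ge0$
whenever $f\in S$ and $f(x)\ge0$ for every $x\in X$. (iii) Show that if $\nu$ is bounded and $X\ne\emptyset$ then
$$\sup\Bigl\{\Bigl|\int f\Bigr|:f\in S,\ \|f\|_\infty\le1\Bigr\}=\sup_{E,F\in\Sigma}|\nu E-\nu F|,$$
writing $\|f\|_\infty=\sup_{x\in X}|f(x)|$.

> (f) Let $X$ be a set, $\Sigma$ a $\sigma$-algebra of subsets of $X$ and $\nu:\Sigma\to\mathbb{R}$ a finitely additive functional. Show
that the following are equiveridical:
(i) $\nu$ is countably additive;
(ii) $\lim_{n\to\infty}\nu E_n=0$ whenever $\langle E_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ and $\bigcap_{n\in\mathbb{N}}E_n=\emptyset$;
(iii) $\lim_{n\to\infty}\nu E_n=0$ whenever $\langle E_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\Sigma$ and $\bigcap_{n\in\mathbb{N}}\bigcup_{m\ge n}E_m=\emptyset$;
(iv) $\lim_{n\to\infty}\nu E_n=\nu E$ whenever $\langle E_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\Sigma$ and
$$E=\bigcap_{n\in\mathbb{N}}\bigcup_{m\ge n}E_m=\bigcup_{n\in\mathbb{N}}\bigcap_{m\ge n}E_m.$$
(Hint: for (i)$\Rightarrow$(iv), consider non-negative $\nu$ first.)

(g) Let $X$ be a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$, and let $\nu:\Sigma\to[-\infty,\infty[$ be a function which is
countably additive in the sense that $\nu\emptyset=0$ and whenever $\langle E_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$, $\sum_{n=0}^\infty\nu E_n=
\lim_{n\to\infty}\sum_{i=0}^n\nu E_i$ is defined in $[-\infty,\infty[$ and is equal to $\nu(\bigcup_{n\in\mathbb{N}}E_n)$. Show that $\nu$ is bounded above and
attains its upper bound (that is, there is an $H\in\Sigma$ such that $\nu H=\sup_{F\in\Sigma}\nu F$). Hence, or otherwise, show
that $\nu$ is expressible as the difference of a totally finite measure and a measure, both with domain $\Sigma$.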

231Y Further exercises (a) Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$, and $\nu:\Sigma\to\mathbb{R}$ a bounded
finitely additive functional. Set
$$\nu^+E=\sup\{\nu F:F\in\Sigma,\ F\subseteq E\},$$
$$\nu^-E=-\inf\{\nu F:F\in\Sigma,\ F\subseteq E\},$$
$$|\nu|E=\sup\{\nu F_1-\nu F_2:F_1,F_2\in\Sigma,\ F_1,F_2\subseteq E\}.$$
Show that $\nu^+$, $\nu^-$ and $|\nu|$ are all bounded finitely additive functionals on $\Sigma$ and that $\nu=\nu^+-\nu^-$,
$|\nu|=\nu^++\nu^-$. Show that if $\nu$ is countably additive so are $\nu^+$, $\nu^-$ and $|\nu|$. ($|\nu|$ is sometimes called the
variation of $\nu$.)

(b) Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. Let $\nu_1$, $\nu_2$ be two bounded finitely additive
functionals defined on $\Sigma$. Set
$$(\nu_1\vee\nu_2)(E)=\sup\{\nu_1F+\nu_2(E\setminus F):F\in\Sigma,\ F\subseteq E\},$$
$$(\nu_1\wedge\nu_2)(E)=\inf\{\nu_1F+\nu_2(E\setminus F):F\in\Sigma,\ F\subseteq E\}.$$
Show that $\nu_1\vee\nu_2$ and $\nu_1\wedge\nu_2$ are finitely additive functionals, and that $\nu_1+\nu_2=\nu_1\vee\nu_2+\nu_1\wedge\nu_2$. Show
that, in the language of 231Ya,
$$\nu^+=\nu\vee0,\quad\nu^-=(-\nu)\vee0=-(\nu\wedge0),\quad|\nu|=\nu\vee(-\nu)=\nu^+\vee\nu^-=\nu^++\nu^-,$$
$$\nu_1\vee\nu_2=\nu_1+(\nu_2-\nu_1)^+,\quad\nu_1\wedge\nu_2=\nu_1-(\nu_1-\nu_2)^+,$$
so that $\nu_1\vee\nu_2$ and $\nu_1\wedge\nu_2$ are countably additive if $\nu_1$ and $\nu_2$ are.

(c) Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. Let $M$ be the set of all bounded finitely additive
functionals from $\Sigma$ to $\mathbb{R}$. Show that $M$ is a linear space under the natural definitions of addition and scalar
multiplication. Show that $M$ has a partial ordering defined by saying that
$$\nu\le\nu'\ \text{iff}\ \nu E\le\nu'E\ \text{for every}\ E\in\Sigma,$$
and that for this partial ordering $\nu_1\vee\nu_2$, $\nu_1\wedge\nu_2$, as defined in 231Yb, are $\sup\{\nu_1,\nu_2\}$, $\inf\{\nu_1,\nu_2\}$.

(d) Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. Let $\nu_0,\ldots,\nu_n$ be bounded finitely additive
functionals on $\Sigma$ and set
$$\check\nu E=\sup\Bigl\{\sum_{i=0}^n\nu_iF_i:F_0,\ldots,F_n\in\Sigma,\ \bigcup_{i\le n}F_i=E,\ F_i\cap F_j=\emptyset\ \text{for}\ i\ne j\Bigr\},$$
$$\hat\nu E=\inf\Bigl\{\sum_{i=0}^n\nu_iF_i:F_0,\ldots,F_n\in\Sigma,\ \bigcup_{i\le n}F_i=E,\ F_i\cap F_j=\emptyset\ \text{for}\ i\ne j\Bigr\}$$
for $E\in\Sigma$. Show that $\check\nu$ and $\hat\nu$ are finitely additive and are, respectively, $\sup\{\nu_0,\ldots,\nu_n\}$ and $\inf\{\nu_0,\ldots,\nu_n\}$
in the partially ordered set of finitely additive functionals on $\Sigma$.

(e) Let $X$ be a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$; let $M$ be the partially ordered set of all bounded
finitely additive functionals from $\Sigma$ to $\mathbb{R}$. (i) Show that if $A\subseteq M$ is non-empty and bounded above in $M$,
then $A$ has a supremum $\check\nu$ in $M$, given by the formula
$$\check\nu E=\sup\Bigl\{\sum_{i=0}^n\nu_iF_i:\nu_0,\ldots,\nu_n\in A,\ F_0,\ldots,F_n\in\Sigma,\ \bigcup_{i\le n}F_i=E,\ F_i\cap F_j=\emptyset\ \text{for}\ i\ne j\Bigr\}.$$
(ii) Show that if $A\subseteq M$ is non-empty and bounded below in $M$ then it has an infimum $\hat\nu\in M$, given by
the formula
$$\hat\nu E=\inf\Bigl\{\sum_{i=0}^n\nu_iF_i:\nu_0,\ldots,\nu_n\in A,\ F_0,\ldots,F_n\in\Sigma,\ \bigcup_{i\le n}F_i=E,\ F_i\cap F_j=\emptyset\ \text{for}\ i\ne j\Bigr\}.$$

(f) Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$, and $\nu:\Sigma\to\mathbb{R}$ a non-negative finitely additive functional.
For $E\in\Sigma$ set
$$\nu_{ca}(E)=\inf\{\sup_{n\in\mathbb{N}}\nu F_n:\langle F_n\rangle_{n\in\mathbb{N}}\ \text{is a non-decreasing sequence in}\ \Sigma\ \text{with union}\ E\}.$$
Show that $\nu_{ca}$ is a countably additive functional on $\Sigma$ and that if $\nu'$ is any countably additive functional
with $\nu'\le\nu$ then $\nu'\le\nu_{ca}$. Show that $\nu_{ca}\wedge(\nu-\nu_{ca})=0$.

(g) Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$, and $\nu:\Sigma\to\mathbb{R}$ a bounded finitely additive functional.
Show that $\nu$ is uniquely expressible as $\nu_{ca}+\nu_{pfa}$, where $\nu_{ca}$ is countably additive, $\nu_{pfa}$ is finitely additive
and if $0\le\nu'\le|\nu_{pfa}|$ and $\nu'$ is countably additive then $\nu'=0$.

(h) Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. Let $M$ be the linear space of bounded finitely
additive functionals on $\Sigma$, and for $\nu\in M$ set $\|\nu\|=|\nu|(X)$, defining $|\nu|$ as in 231Ya. ($\|\nu\|$ is the total
variation of $\nu$.) Show that $\|\ \|$ is a norm on $M$ under which $M$ is a Banach space. Show that the space of
bounded countably additive functionals on $\Sigma$ is a closed linear subspace of $M$.

(i) Repeat as many as possible of the results of this section for complex-valued functionals.

231 Notes and comments The real purpose of this section has been to describe the Hahn decomposition
of a countably additive functional (231E). The very leisurely exposition in 231A-231D is intended as a review
of the most elementary properties of measures, in the slightly more general context of signed measures,
with those properties corresponding to additivity alone separated from those which depend on countable
additivity. In 231Xf I set out necessary and sufficient conditions for a finitely additive functional on a
$\sigma$-algebra to be countably additive, designed to suggest that a finitely additive functional is countably
additive iff it is sequentially order-continuous in some sense. The fact that a countably additive functional
can be expressed as the difference of non-negative countably additive functionals (231F) has an important
counterpart in the theory of finitely additive functionals: a finitely additive functional can be expressed as
the difference of non-negative finitely additive functionals if (and only if) it is bounded (231Ya). But I do
not think that this, or the further properties of bounded finitely additive functionals described in 231Xe and
231Y, will be important to us before Volume 3.

232 The Radon-Nikodym theorem


I come now to the chief theorem of this chapter, one of the central results of measure theory, relating
countably additive functionals to indefinite integrals. The objective is to give a complete description of the
functionals which can arise as indefinite integrals of integrable functions (232E). These can be characterized
as the truly continuous additive functionals (232Ab). A more commonly used concept, and one adequate in
many cases, is that of absolutely continuous additive functional (232Aa); I spend the first few paragraphs
(232B-232D) on elementary facts about truly continuous and absolutely continuous functionals. I end the
section with a discussion of the decomposition of general countably additive functionals (232I).

232A Absolutely continuous functionals Let $(X,\Sigma,\mu)$ be a measure space and $\nu:\Sigma\to\mathbb{R}$ a finitely
additive functional.

(a) $\nu$ is absolutely continuous with respect to $\mu$ (sometimes written '$\nu\ll\mu$') if for every $\epsilon>0$ there
is a $\delta>0$ such that $|\nu E|\le\epsilon$ whenever $E\in\Sigma$ and $\mu E\le\delta$.

(b) $\nu$ is truly continuous with respect to $\mu$ if for every $\epsilon>0$ there are $E\in\Sigma$, $\delta>0$ such that $\mu E$ is
finite and $|\nu F|\le\epsilon$ whenever $F\in\Sigma$ and $\mu(E\cap F)\le\delta$.

(c) For reference, I add another definition here. If $\nu$ is countably additive, it is singular with respect to
$\mu$ if there is a set $F\in\Sigma$ such that $\mu F=0$ and $\nu E=0$ whenever $E\in\Sigma$ and $E\subseteq X\setminus F$.

232B Proposition Let $(X,\Sigma,\mu)$ be a measure space and $\nu:\Sigma\to\mathbb{R}$ a finitely additive functional.
(a) If $\nu$ is countably additive, it is absolutely continuous with respect to $\mu$ iff $\nu E=0$ whenever $\mu E=0$.
(b) $\nu$ is truly continuous with respect to $\mu$ iff ($\alpha$) it is countably additive ($\beta$) it is absolutely continuous
with respect to $\mu$ ($\gamma$) whenever $E\in\Sigma$ and $\nu E\ne0$ there is an $F\in\Sigma$ such that $\mu F<\infty$ and $\nu(E\cap F)\ne0$.
(c) If $(X,\Sigma,\mu)$ is $\sigma$-finite, then $\nu$ is truly continuous with respect to $\mu$ iff it is countably additive and
absolutely continuous with respect to $\mu$.
(d) If $(X,\Sigma,\mu)$ is totally finite, then $\nu$ is truly continuous with respect to $\mu$ iff it is absolutely continuous
with respect to $\mu$.
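proof (a)(i) If $\nu$ is absolutely continuous with respect to $\mu$ and $\mu E=0$, then $\mu E\le\delta$ for every $\delta>0$, so
$|\nu E|\le\epsilon$ for every $\epsilon>0$ and $\nu E=0$.
(ii) $\mathbf{?}$ Suppose, if possible, that $\nu E=0$ whenever $\mu E=0$, but $\nu$ is not absolutely continuous. Then
there is an $\epsilon>0$ such that for every $\delta>0$ there is an $E\in\Sigma$ such that $\mu E\le\delta$ but $|\nu E|\ge\epsilon$. For each
$n\in\mathbb{N}$ we may choose an $F_n\in\Sigma$ such that $\mu F_n\le2^{-n}$ and $|\nu F_n|\ge\epsilon$. Consider $F=\bigcap_{n\in\mathbb{N}}\bigcup_{k\ge n}F_k$. Then
we have
$$\mu F\le\inf_{n\in\mathbb{N}}\mu\Bigl(\bigcup_{k\ge n}F_k\Bigr)\le\inf_{n\in\mathbb{N}}\sum_{k=n}^\infty2^{-k}=0,$$
so $\mu F=0$.
Now recall that by 231Eb there is an $H\in\Sigma$ such that $\nu G\ge0$ when $G\in\Sigma$ and $G\subseteq H$, and $\nu G\le0$
when $G\in\Sigma$ and $G\cap H=\emptyset$. As in 231F, set $\nu_1G=\nu(G\cap H)$, $\nu_2G=-\nu(G\setminus H)$ for $G\in\Sigma$, so that $\nu_1$ and
$\nu_2$ are totally finite measures, and $\nu_1F=\nu_2F=0$ because $\mu(F\cap H)=\mu(F\setminus H)=0$. Consequently
$$0=\nu_iF=\lim_{n\to\infty}\nu_i\Bigl(\bigcup_{m\ge n}F_m\Bigr)\ge\limsup_{n\to\infty}\nu_iF_n$$
for both $i$, and
$$0=\lim_{n\to\infty}(\nu_1F_n+\nu_2F_n)\ge\liminf_{n\to\infty}|\nu F_n|\ge\epsilon>0,$$
which is absurd. $\mathbf{X}$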
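(b)(i) Suppose that $\nu$ is truly continuous with respect to $\mu$. It is obvious from the definitions that $\nu$
is absolutely continuous with respect to $\mu$. If $\nu E\ne0$, there must be an $F$ of finite measure such that
$|\nu G|<|\nu E|$ whenever $G\cap F=\emptyset$, so that $|\nu(E\setminus F)|<|\nu E|$ and $\nu(E\cap F)\ne0$. This deals with the
conditions ($\beta$) and ($\gamma$).
To check that $\nu$ is countably additive, let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a disjoint sequence in $\Sigma$, with union $E$, and $\epsilon>0$.
Let $\delta>0$, $F\in\Sigma$ be such that $\mu F<\infty$ and $|\nu G|\le\epsilon$ whenever $G\in\Sigma$ and $\mu(F\cap G)\le\delta$. Then
$$\sum_{n=0}^\infty\mu(E_n\cap F)\le\mu F<\infty,$$
so there is an $n\in\mathbb{N}$ such that $\sum_{i=n}^\infty\mu(E_i\cap F)\le\delta$. Take any $m\ge n$ and consider $E^*_m=\bigcup_{i\le m}E_i$. We have
$$\Bigl|\nu E-\sum_{i=0}^m\nu E_i\Bigr|=|\nu E-\nu E^*_m|=|\nu(E\setminus E^*_m)|\le\epsilon,$$
because
$$\mu(F\cap E\setminus E^*_m)=\sum_{i=m+1}^\infty\mu(F\cap E_i)\le\delta.$$
As $\epsilon$ is arbitrary,
$$\nu E=\sum_{i=0}^\infty\nu E_i;$$
as $\langle E_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\nu$ is countably additive.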
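(ii) Now suppose that $\nu$ satisfies the three conditions. By 231F, $\nu$ can be expressed as the difference
of two non-negative countably additive functionals $\nu_1$, $\nu_2$; set $\nu'=\nu_1+\nu_2$, so that $\nu'$ is a non-negative
countably additive functional and $|\nu F|\le\nu'F$ for every $F\in\Sigma$. Set
$$\gamma=\sup\{\nu'F:F\in\Sigma,\ \mu F<\infty\}\le\nu'X<\infty,$$
and choose a sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure such that $\lim_{n\to\infty}\nu'F_n=\gamma$; set $F^*=\bigcup_{n\in\mathbb{N}}F_n$. If
$G\in\Sigma$ and $G\cap F^*=\emptyset$ then $\nu G=0$. $\mathbf{P}$ $\mathbf{?}$ Otherwise, by condition ($\gamma$), there is an $F\in\Sigma$ such that $\mu F<\infty$
and $\nu(G\cap F)\ne0$. It follows that
$$\nu'(F\setminus F^*)\ge\nu'(F\cap G)\ge|\nu(F\cap G)|>0,$$
and there must be an $n\in\mathbb{N}$ such that
$$\gamma<\nu'F_n+\nu'(F\setminus F^*)=\nu'(F_n\cup(F\setminus F^*))\le\nu'(F\cup F_n)\le\gamma$$
because $\mu(F\cup F_n)<\infty$; but this is impossible. $\mathbf{X}$ $\mathbf{Q}$
Setting $F^*_n=\bigcup_{k\le n}F_k$ for each $n$, we have $\lim_{n\to\infty}\nu'(F^*\setminus F^*_n)=0$. Take any $\epsilon>0$, and (using condition
($\beta$)) let $\delta>0$ be such that $|\nu E|\le\frac12\epsilon$ whenever $\mu E\le\delta$. Let $n$ be such that $\nu'(F^*\setminus F^*_n)\le\frac12\epsilon$. Now if $F\in\Sigma$
and $\mu(F\cap F^*_n)\le\delta$ then
$$|\nu F|\le|\nu(F\cap F^*_n)|+|\nu(F\cap F^*\setminus F^*_n)|+|\nu(F\setminus F^*)|$$
$$\le\tfrac12\epsilon+\nu'(F\cap F^*\setminus F^*_n)+0$$
$$\le\tfrac12\epsilon+\nu'(F^*\setminus F^*_n)\le\tfrac12\epsilon+\tfrac12\epsilon=\epsilon.$$
And $\mu F^*_n<\infty$. As $\epsilon$ is arbitrary, $\nu$ is truly continuous.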


(c) Now suppose that $(X,\Sigma,\mu)$ is $\sigma$-finite and that $\nu$ is countably additive and absolutely continuous
with respect to $\mu$. Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of sets of finite measure covering $X$ (211D).
If $\nu E\ne0$, then $\lim_{n\to\infty}\nu(E\cap X_n)\ne0$, so $\nu(E\cap X_n)\ne0$ for some $n$. This shows that $\nu$ satisfies condition
($\gamma$) of (b), so is truly continuous.
Of course the converse of this fact is already covered by (b).
(d) Finally, suppose that $\mu X<\infty$ and that $\nu$ is absolutely continuous with respect to $\mu$. Then it must
be truly continuous, because we can take $F=X$ in the definition 232Ab.

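232C Lemma Let $(X,\Sigma,\mu)$ be a measure space and $\nu$, $\nu'$ two countably additive functionals on $\Sigma$
which are truly continuous with respect to $\mu$. Take $c\in\mathbb{R}$ and $H\in\Sigma$, and set $\nu_HE=\nu(E\cap H)$, as in 231De.
Then $\nu+\nu'$, $c\nu$ and $\nu_H$ are all truly continuous with respect to $\mu$, and $\nu$ is expressible as the difference of
non-negative countably additive functionals which are truly continuous with respect to $\mu$.
proof Let $\epsilon>0$. Set $\eta=\epsilon/(2+|c|)>0$. Then there are $\delta$, $\delta'>0$ and $E$, $E'\in\Sigma$ such that $\mu E<\infty$,
$\mu E'<\infty$ and $|\nu F|\le\eta$ whenever $\mu(F\cap E)\le\delta$, $|\nu'F|\le\eta$ whenever $\mu(F\cap E')\le\delta'$. Set $\delta^*=\min(\delta,\delta')>0$,
$E^*=E\cup E'$; then
$$\mu E^*\le\mu E+\mu E'<\infty.$$
Suppose that $F\in\Sigma$ and $\mu(F\cap E^*)\le\delta^*$; then
$$\mu(F\cap H\cap E)\le\mu(F\cap E)\le\delta,\qquad\mu(F\cap E')\le\delta'$$
so
$$|(\nu+\nu')F|\le|\nu F|+|\nu'F|\le\eta+\eta\le\epsilon,$$
$$|(c\nu)F|=|c||\nu F|\le|c|\eta\le\epsilon,$$
$$|\nu_HF|=|\nu(F\cap H)|\le\eta\le\epsilon.$$
As $\epsilon$ is arbitrary, $\nu+\nu'$, $c\nu$ and $\nu_H$ are all truly continuous.
Now, taking $H$ from 231Eb, we see that $\nu_1=\nu_H$ and $\nu_2=-\nu_{X\setminus H}$ are truly continuous and non-negative,
and $\nu=\nu_1-\nu_2$ is the difference of truly continuous measures.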

232D Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $f$ a $\mu$-integrable real-valued function. For
$E\in\Sigma$ set $\nu E=\int_Ef$. Then $\nu:\Sigma\to\mathbb{R}$ is a countably additive functional and is truly continuous with
respect to $\mu$, therefore absolutely continuous with respect to $\mu$.
proof Recall that $\int_Ef=\int f\times\chi E$ is defined for every $E\in\Sigma$ (131F). So $\nu:\Sigma\to\mathbb{R}$ is well-defined. If $E$,
$F\in\Sigma$ are disjoint then
$$\nu(E\cup F)=\int f\times\chi(E\cup F)=\int(f\times\chi E)+(f\times\chi F)$$
$$=\int f\times\chi E+\int f\times\chi F=\nu E+\nu F,$$
so $\nu$ is finitely additive.
Now 225A, without using the phrase 'truly continuous', proved exactly that $\nu$ is truly continuous with
respect to $\mu$. It follows from 232Bb that $\nu$ is countably additive and absolutely continuous.
Remark The functional $E\mapsto\int_Ef$ is called the indefinite integral of $f$.

232E We are now at last ready for the theorem.


The Radon-Nikodym theorem Let $(X,\Sigma,\mu)$ be a measure space and $\nu:\Sigma\to\mathbb{R}$ a function. Then the
following are equiveridical:
(i) there is a $\mu$-integrable function $f$ such that $\nu E=\int_Ef$ for every $E\in\Sigma$;
(ii) $\nu$ is finitely additive and truly continuous with respect to $\mu$.
proof (a) If $f$ is a $\mu$-integrable real-valued function and $\nu E=\int_Ef$ for every $E\in\Sigma$, then 232D tells us
that $\nu$ is finitely additive and truly continuous.

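(b) In the other direction, suppose that $\nu$ is finitely additive and truly continuous; note that (by 232B(a-
b)) $\nu E=0$ whenever $\mu E=0$. To begin with, suppose that $\nu$ is non-negative and not zero.
In this case, there is a non-negative simple function $f$ such that $\int f>0$ and $\int_Ef\le\nu E$ for every $E\in\Sigma$.
$\mathbf{P}$ Let $H\in\Sigma$ be such that $\nu H>0$; set $\epsilon=\frac13\nu H>0$. Let $E\in\Sigma$, $\delta>0$ be such that $\mu E<\infty$ and $\nu F\le\epsilon$
whenever $F\in\Sigma$ and $\mu(F\cap E)\le\delta$; then $\nu(H\setminus E)\le\epsilon$ so $\nu E\ge\nu(H\cap E)\ge2\epsilon$ and $\mu E\ge\mu(H\cap E)>0$.
Set $\mu_EF=\mu(F\cap E)$ for every $F\in\Sigma$; then $\mu_E$ is a countably additive functional on $\Sigma$. Set $\nu'=\nu-\alpha\mu_E$,
where $\alpha=\epsilon/\mu E$; then $\nu'$ is a countably additive functional and $\nu'E>0$. By 231Eb, as usual, there is a set
$G\in\Sigma$ such that $\nu'F\ge0$ if $F\in\Sigma$, $F\subseteq G$, but $\nu'F\le0$ if $F\in\Sigma$ and $F\cap G=\emptyset$. As $\nu'(E\setminus G)\le0$,
$$0<\nu'E\le\nu'(E\cap G)\le\nu(E\cap G)$$
and $\mu(E\cap G)>0$. Set $f=\alpha\chi(E\cap G)$; then $f$ is a non-negative simple function and $\int f=\alpha\mu(E\cap G)>0$.
If $F\in\Sigma$ then $\nu'(F\cap G)\ge0$, that is,
$$\nu(F\cap G)\ge\alpha\mu_E(F\cap G)=\alpha\mu(F\cap E\cap G)=\int_Ff.$$
So
$$\nu F\ge\nu(F\cap G)\ge\int_Ff,$$
as required. $\mathbf{Q}$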
(c) Still supposing that $\nu$ is a non-negative, truly continuous additive functional, let $\Phi$ be the set of
non-negative simple functions $f:X\to\mathbb{R}$ such that $\int_Ef\le\nu E$ for every $E\in\Sigma$; then the constant function
0 belongs to $\Phi$, so $\Phi$ is not empty.
If $f$, $g\in\Phi$ then $f\vee g\in\Phi$, where $(f\vee g)(x)=\max(f(x),g(x))$ for $x\in X$. $\mathbf{P}$ Set $H=\{x:(f-g)(x)\ge
0\}\in\Sigma$; then $f\vee g=(f\times\chi H)+(g\times\chi(X\setminus H))$ is a non-negative simple function, and for any $E\in\Sigma$,
$$\int_Ef\vee g=\int_{E\cap H}f+\int_{E\setminus H}g\le\nu(E\cap H)+\nu(E\setminus H)=\nu E.\ \mathbf{Q}$$
Set
$$\gamma=\sup\Bigl\{\int f:f\in\Phi\Bigr\}\le\nu X<\infty.$$
Choose a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $\Phi$ such that $\lim_{n\to\infty}\int f_n=\gamma$. For each $n$, set $g_n=f_0\vee f_1\vee\ldots\vee f_n$; then
$g_n\in\Phi$ and $\int f_n\le\int g_n\le\gamma$ for each $n$, so $\lim_{n\to\infty}\int g_n=\gamma$. By B.Levi's theorem, $f=\lim_{n\to\infty}g_n$ is
integrable and $\int f=\gamma$. Note that if $E\in\Sigma$ then
$$\int_Ef=\lim_{n\to\infty}\int_Eg_n\le\nu E.$$
$\mathbf{?}$ Suppose, if possible, that there is an $H\in\Sigma$ such that $\int_Hf\ne\nu H$. Set
$$\nu_1F=\nu F-\int_Ff\ge0$$
for every $F\in\Sigma$; then by (a) of this proof and 232C, $\nu_1$ is a truly continuous finitely additive functional,
and we are supposing that $\nu_1\ne0$. By (b) of this proof, there is a non-negative simple function $g$ such
that $\int_Fg\le\nu_1F$ for every $F\in\Sigma$ and $\int g>0$. Take $n\in\mathbb{N}$ such that $\int f_n+\int g>\gamma$. Then $f_n+g$ is a
non-negative simple function and
$$\int_F(f_n+g)=\int_Ff_n+\int_Fg\le\int_Ff+\int_Fg=\nu F-\nu_1F+\int_Fg\le\nu F$$
for any $F\in\Sigma$, so $f_n+g\in\Phi$, and
$$\gamma<\int f_n+\int g=\int(f_n+g)\le\gamma,$$
which is absurd. $\mathbf{X}$ Thus we have $\int_Hf=\nu H$ for every $H\in\Sigma$.
(d) This proves the theorem for non-negative $\nu$. For general $\nu$, we need only observe that $\nu$ is expressible
as $\nu_1-\nu_2$, where $\nu_1$ and $\nu_2$ are non-negative truly continuous countably additive functionals, by 232C; so
that there are integrable functions $f_1$, $f_2$ such that $\nu_iF=\int_Ff_i$ for both $i$ and every $F\in\Sigma$. Of course
$f=f_1-f_2$ is integrable and $\nu F=\int_Ff$ for every $F\in\Sigma$. This completes the proof.

232F Corollary Let $(X, \Sigma, \mu)$ be a $\sigma$-finite measure space and $\nu : \Sigma \to \mathbb{R}$ a function. Then there is a $\mu$-integrable function $f$ such that $\nu E = \int_E f$ for every $E \in \Sigma$ iff $\nu$ is countably additive and absolutely continuous with respect to $\mu$.

proof Put 232Bc and 232E together.

232G Corollary Let $(X, \Sigma, \mu)$ be a totally finite measure space and $\nu : \Sigma \to \mathbb{R}$ a function. Then there is a $\mu$-integrable function $f$ on $X$ such that $\nu E = \int_E f$ for every $E \in \Sigma$ iff $\nu$ is finitely additive and absolutely continuous with respect to $\mu$.
proof Put 232Bd and 232E together.

232H Remarks (a) Most authors are satisfied with 232F as the "Radon-Nikodym theorem". In my view the problem of identifying indefinite integrals is of sufficient importance to justify an analysis which applies to all measure spaces, even if it requires a new concept (the notion of "truly continuous" functional).

(b) I ought to offer an example of an absolutely continuous functional which is not truly continuous. A simple one is the following. Let $X$ be any uncountable set. Let $\Sigma$ be the countable-cocountable $\sigma$-algebra of subsets of $X$ and $\nu$ the countable-cocountable measure on $X$ (211R). Let $\mu$ be the restriction to $\Sigma$ of counting measure on $X$. If $\mu E = 0$ then $E = \emptyset$ and $\nu E = 0$, so $\nu$ is absolutely continuous. But for any $E$ of finite measure we have $\nu(X \setminus E) = 1$, so $\nu$ is not truly continuous. See also 232Xf(i).

*(c) The space $(X, \Sigma, \mu)$ of this example is, in terms of the classification developed in Chapter 21, somewhat irregular; for instance, it is neither locally determined nor localizable, and therefore not strictly localizable, though it is complete and semi-finite. Can this phenomenon occur in a strictly localizable measure space? We are led here into a fascinating question. Suppose, in (b), I used the same idea, but with $\Sigma = \mathcal{P}X$. No difficulty arises in constructing $\mu$; but can there now be a $\nu$ with the required properties, that is, a non-zero countably additive functional from $\mathcal{P}X$ to $\mathbb{R}$ which is zero on all finite sets? This is the "Banach-Ulam problem", on which I have written extensively elsewhere (Fremlin 93), and to which I hope to return in Volume 5. The present question is touched on again in 363S in Volume 3.

(d) Following the Radon-Nikodym theorem, the question immediately arises: for a given $\nu$, how much possible variation is there in the corresponding $f$? The answer is straightforward enough: two integrable functions $f$ and $g$ give rise to the same indefinite integral iff they are equal almost everywhere (131Hb).

(e) I have stated the Radon-Nikodym theorem in terms of arbitrary integrable functions, meaning to interpret "integrability" in a wide sense, according to the conventions of Volume 1. However, given a truly continuous countably additive functional $\nu$, we can ask whether there is in any sense a canonical integrable function representing it. The answer is no. But we certainly do not need to take arbitrary integrable functions of the type considered in Chapter 12. If $f$ is any integrable function, there is a conegligible set $E$ such that $f{\restriction}E$ is measurable, and now we can find a conegligible measurable set $G \subseteq E \cap \operatorname{dom} f$; if we set $g(x) = f(x)$ for $x \in G$, $0$ for $x \in X \setminus G$, then $f =_{\text{a.e.}} g$, so $g$ has the same indefinite integral as $f$ (as noted in (d) just above), while $g$ is measurable and has domain $X$. Thus we can make a trivial, but sometimes convenient, refinement to the theorem: if $(X, \Sigma, \mu)$ is a measure space, and $\nu : \Sigma \to \mathbb{R}$ is finitely additive and truly continuous with respect to $\mu$, then there is a $\Sigma$-measurable $\mu$-integrable function $g : X \to \mathbb{R}$ such that $\int_E g = \nu E$ for every $E \in \Sigma$.

(f) The phrase "Radon-Nikodym derivative of $\nu$ with respect to $\mu$" is sometimes used to describe an integrable function $f$ such that $\int_E f\,d\mu = \nu E$ for every measurable set $E$, as in 232D-E. When $\nu$ and $f$ are non-negative, $f$ may be called a "density function".

(g) Throughout the work above I have taken it that $\nu$ is defined on the whole domain $\Sigma$ of $\mu$. In some of the most important applications, however, $\nu$ is defined only on some smaller $\sigma$-algebra $\mathrm{T}$. In this case we commonly seek to apply the same results with $\mu{\restriction}\mathrm{T}$ in place of $\mu$.
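
As a purely illustrative sketch of remarks (d) and (f), consider the simplest possible case: a finite set with every subset measurable, where a Radon-Nikodym derivative can be read off point by point as $f(x) = \nu\{x\}/\mu\{x\}$. The finite setting, the function names `density`, `integral_over` and `all_subsets`, and the particular masses below are assumptions of this example, not part of the text.

```python
from fractions import Fraction
from itertools import combinations


def density(mu, nu):
    """Pointwise density f = nu{x} / mu{x} on a finite set.

    mu, nu: dicts mapping each point to its mass (mu non-negative).
    Raises ValueError if nu charges a mu-null point, that is, if nu is
    not absolutely continuous with respect to mu.
    """
    f = {}
    for x, m in mu.items():
        if m == 0:
            if nu[x] != 0:
                raise ValueError(f"nu is not absolutely continuous at {x!r}")
            f[x] = Fraction(0)  # any value would do on a null point
        else:
            f[x] = Fraction(nu[x]) / Fraction(m)
    return f


def integral_over(f, mu, E):
    """Integral of f over E with respect to mu (a finite sum)."""
    return sum((f[x] * mu[x] for x in E), Fraction(0))


def all_subsets(points):
    """Yield every subset of a finite collection of points."""
    pts = list(points)
    for r in range(len(pts) + 1):
        for combo in combinations(pts, r):
            yield set(combo)


# Hypothetical masses: mu is a probability measure with one null point "d";
# nu is a signed functional vanishing on that null point.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6), "d": Fraction(0)}
nu = {"a": Fraction(-1), "b": Fraction(2), "c": Fraction(1, 4), "d": Fraction(0)}
f = density(mu, nu)
```

Changing `f["d"]` to any other value leaves every integral unchanged, which is the finite shadow of the statement in (d) that the derivative is determined only up to a negligible set.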

232I The Lebesgue decomposition of a countably additive functional: Proposition (a) Let $(X, \Sigma, \mu)$ be a measure space and $\nu : \Sigma \to \mathbb{R}$ a countably additive functional. Then $\nu$ has unique expressions as
$$\nu = \nu_s + \nu_{ac} = \nu_s + \nu_{tc} + \nu_e,$$
where $\nu_{tc}$ is truly continuous with respect to $\mu$, $\nu_s$ is singular with respect to $\mu$, and $\nu_e$ is absolutely continuous with respect to $\mu$ and zero on every set of finite measure.
(b) If $X = \mathbb{R}^r$, $\Sigma$ is the algebra of Borel sets in $\mathbb{R}^r$ and $\mu$ is the restriction of Lebesgue measure to $\Sigma$, then $\nu$ is uniquely expressible as $\nu_p + \nu_{cs} + \nu_{ac}$ where $\nu_{ac}$ is absolutely continuous with respect to $\mu$, $\nu_{cs}$ is singular with respect to $\mu$ and zero on singletons, and $\nu_p E = \sum_{x \in E} \nu_p\{x\}$ for every $E \in \Sigma$.
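proof (a)(i) Suppose first that $\nu$ is non-negative. In this case, set
$$\nu_s E = \sup\{\nu(E \cap F) : F \in \Sigma,\ \mu F = 0\},$$
$$\nu_t E = \sup\{\nu(E \cap F) : F \in \Sigma,\ \mu F < \infty\}.$$
Then both $\nu_s$ and $\nu_t$ are countably additive. P Surely $\nu_s\emptyset = \nu_t\emptyset = 0$. Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a disjoint sequence in $\Sigma$ with union $E$.
($\alpha$) If $F \in \Sigma$ and $\mu F = 0$, then
$$\nu(E \cap F) = \sum_{n=0}^\infty \nu(E_n \cap F) \le \sum_{n=0}^\infty \nu_s(E_n);$$
as $F$ is arbitrary,
$$\nu_s E \le \sum_{n=0}^\infty \nu_s E_n.$$
($\beta$) If $F \in \Sigma$ and $\mu F < \infty$, then
$$\nu(E \cap F) = \sum_{n=0}^\infty \nu(E_n \cap F) \le \sum_{n=0}^\infty \nu_t(E_n);$$
as $F$ is arbitrary,
$$\nu_t E \le \sum_{n=0}^\infty \nu_t E_n.$$
($\gamma$) If $\epsilon > 0$, then (because $\sum_{n=0}^\infty \nu E_n = \nu E < \infty$) there is an $n \in \mathbb{N}$ such that $\sum_{k=n+1}^\infty \nu E_k \le \epsilon$. Now, for each $k \le n$, there is an $F_k \in \Sigma$ such that $\mu F_k = 0$ and $\nu(E_k \cap F_k) \ge \nu_s E_k - \frac{\epsilon}{n+1}$. In this case, $F = \bigcup_{k \le n} F_k \in \Sigma$, $\mu F = 0$ and
$$\nu_s E \ge \nu(E \cap F) \ge \sum_{k=0}^n \nu(E_k \cap F_k) \ge \sum_{k=0}^n \nu_s E_k - \epsilon \ge \sum_{k=0}^\infty \nu_s E_k - 2\epsilon,$$
because
$$\sum_{k=n+1}^\infty \nu_s E_k \le \sum_{k=n+1}^\infty \nu E_k \le \epsilon.$$
As $\epsilon$ is arbitrary,
$$\nu_s E \ge \sum_{k=0}^\infty \nu_s E_k.$$
($\delta$) Similarly, for each $k \le n$, there is an $F'_k \in \Sigma$ such that $\mu F'_k < \infty$ and $\nu(E_k \cap F'_k) \ge \nu_t E_k - \frac{\epsilon}{n+1}$. In this case, $F' = \bigcup_{k \le n} F'_k \in \Sigma$, $\mu F' < \infty$ and
$$\nu_t E \ge \nu(E \cap F') \ge \sum_{k=0}^n \nu(E_k \cap F'_k) \ge \sum_{k=0}^n \nu_t E_k - \epsilon \ge \sum_{k=0}^\infty \nu_t E_k - 2\epsilon,$$
because
$$\sum_{k=n+1}^\infty \nu_t E_k \le \sum_{k=n+1}^\infty \nu E_k \le \epsilon.$$
As $\epsilon$ is arbitrary,
$$\nu_t E \ge \sum_{k=0}^\infty \nu_t E_k.$$
($\epsilon$) Putting these together, $\nu_s E = \sum_{n=0}^\infty \nu_s E_n$ and $\nu_t E = \sum_{n=0}^\infty \nu_t E_n$. As $\langle E_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\nu_s$ and $\nu_t$ are countably additive. Q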
(ii) Still supposing that $\nu$ is non-negative, if we choose a sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ such that $\mu F_n = 0$ for each $n$ and $\lim_{n\to\infty} \nu F_n = \nu_s X$, then $F^* = \bigcup_{n\in\mathbb{N}} F_n$ has $\mu F^* = 0$, $\nu F^* = \nu_s X$; so that $\nu_s(X \setminus F^*) = 0$, and $\nu_s$ is singular with respect to $\mu$ in the sense of 232Ac.
Note that $\nu_s F = \nu F$ whenever $\mu F = 0$. So if we write $\nu_{ac} = \nu - \nu_s$, then $\nu_{ac}$ is a countably additive functional and $\nu_{ac} F = 0$ whenever $\mu F = 0$; that is, $\nu_{ac}$ is absolutely continuous with respect to $\mu$.
If we write $\nu_{tc} = \nu_t - \nu_s$, then $\nu_{tc}$ is a non-negative countably additive functional; $\nu_{tc} F = 0$ whenever $\mu F = 0$, and if $\nu_{tc} E > 0$ there is a set $F$ with $\mu F < \infty$ and $\nu_{tc}(E \cap F) > 0$. So $\nu_{tc}$ is truly continuous with respect to $\mu$, by 232Bb. Set $\nu_e = \nu - \nu_t = \nu_{ac} - \nu_{tc}$.
Thus for any non-negative countably additive functional $\nu$, we have expressions
$$\nu = \nu_s + \nu_{ac}, \qquad \nu_{ac} = \nu_{tc} + \nu_e$$
where $\nu_s$, $\nu_{ac}$, $\nu_{tc}$ and $\nu_e$ are all non-negative countably additive functionals, $\nu_s$ is singular with respect to $\mu$, $\nu_{ac}$ and $\nu_e$ are absolutely continuous with respect to $\mu$, $\nu_{tc}$ is truly continuous with respect to $\mu$, and $\nu_e F = 0$ whenever $\mu F < \infty$.
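(iii) For general countably additive functionals $\nu : \Sigma \to \mathbb{R}$, we can express $\nu$ as $\nu' - \nu''$, where $\nu'$ and $\nu''$ are non-negative countably additive functionals. If we define $\nu'_s$, $\nu''_s$, $\ldots$, $\nu''_e$ as in (i)-(ii), we get countably additive functionals
$$\nu_s = \nu'_s - \nu''_s, \quad \nu_{ac} = \nu'_{ac} - \nu''_{ac}, \quad \nu_{tc} = \nu'_{tc} - \nu''_{tc}, \quad \nu_e = \nu'_e - \nu''_e$$
such that $\nu_s$ is singular with respect to $\mu$ (if $F'$, $F''$ are such that
$$\mu F' = \mu F'' = \nu'_s(X \setminus F') = \nu''_s(X \setminus F'') = 0,$$
then $\mu(F' \cup F'') = 0$ and $\nu_s E = 0$ whenever $E \subseteq X \setminus (F' \cup F'')$), $\nu_{ac}$ is absolutely continuous with respect to $\mu$, $\nu_{tc}$ is truly continuous with respect to $\mu$, and $\nu_e F = 0$ whenever $\mu F < \infty$, while
$$\nu = \nu_s + \nu_{ac} = \nu_s + \nu_{tc} + \nu_e.$$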

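(iv) Moreover, these decompositions are unique. P ($\alpha$) If, for instance, $\nu = \tilde\nu_s + \tilde\nu_{ac}$, where $\tilde\nu_s$ is singular and $\tilde\nu_{ac}$ is absolutely continuous with respect to $\mu$, let $F$, $\tilde F$ be such that $\mu F = \mu\tilde F = 0$ and $\nu_s E = 0$ whenever $E \cap F = \emptyset$, $\tilde\nu_s E = 0$ whenever $E \cap \tilde F = \emptyset$; then we must have
$$\nu_{ac}(E \cap (F \cup \tilde F)) = \tilde\nu_{ac}(E \cap (F \cup \tilde F)) = 0$$
for every $E \in \Sigma$, so
$$\nu_s E = \nu(E \cap (F \cup \tilde F)) = \tilde\nu_s E$$
for every $E \in \Sigma$. Thus $\tilde\nu_s = \nu_s$ and $\tilde\nu_{ac} = \nu_{ac}$.
($\beta$) Similarly, if $\nu_{ac} = \tilde\nu_{tc} + \tilde\nu_e$ where $\tilde\nu_{tc}$ is truly continuous with respect to $\mu$ and $\tilde\nu_e F = 0$ whenever $\mu F < \infty$, then there are sequences $\langle F_n\rangle_{n\in\mathbb{N}}$, $\langle \tilde F_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure such that $\nu_{tc} F = 0$ whenever $F \cap \bigcup_{n\in\mathbb{N}} F_n = \emptyset$ and $\tilde\nu_{tc} F = 0$ whenever $F \cap \bigcup_{n\in\mathbb{N}} \tilde F_n = \emptyset$. Write $F^* = \bigcup_{n\in\mathbb{N}} (F_n \cup \tilde F_n)$; then $\tilde\nu_e E = \nu_e E = 0$ whenever $E \subseteq F^*$ and $\tilde\nu_{tc} E = \nu_{tc} E = 0$ whenever $E \cap F^* = \emptyset$, so $\nu_e E = \nu_{ac}(E \setminus F^*) = \tilde\nu_e E$ for every $E \in \Sigma$, and $\nu_e = \tilde\nu_e$, $\nu_{tc} = \tilde\nu_{tc}$. Q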
(b) In this case, $\mu$ is $\sigma$-finite (cf. 211P), so every absolutely continuous countably additive functional is truly continuous (232Bc), and we shall always have $\nu_e = 0$, $\nu_{ac} = \nu_{tc}$. But in the other direction we know that singleton sets, and therefore countable sets, are all measurable. We therefore have a further decomposition $\nu_s = \nu_p + \nu_{cs}$, where there is a countable set $K \subseteq \mathbb{R}^r$ with $\nu_p E = 0$ whenever $E \in \Sigma$, $E \cap K = \emptyset$, and $\nu_{cs}$ is singular with respect to $\mu$ and zero on countable sets. P (i) If $\nu \ge 0$, set
$$\nu_p E = \sup\{\nu(E \cap K) : K \subseteq \mathbb{R}^r \text{ is countable}\};$$
just as with $\nu_s$, dealt with in (a) above, $\nu_p$ is countably additive and there is a countable $K \subseteq \mathbb{R}^r$ such that $\nu_p E = \nu(E \cap K)$ for every $E \in \Sigma$. (ii) For general $\nu$, we can express $\nu$ as $\nu' - \nu''$ where $\nu'$ and $\nu''$ are non-negative, and write $\nu_p = \nu'_p - \nu''_p$. (iii) $\nu_p$ is characterized by saying that there is a countable set $K$ such that $\nu_p E = \nu(E \cap K)$ for every $E \in \Sigma$ and $\nu\{x\} = 0$ for every $x \in \mathbb{R}^r \setminus K$. (iv) So if we set $\nu_{cs} = \nu_s - \nu_p$, $\nu_{cs}$ will be singular with respect to $\mu$ and zero on countable sets. Q
Now, for any $E \in \Sigma$,
$$\nu_p E = \nu(E \cap K) = \sum_{x \in K \cap E} \nu\{x\} = \sum_{x \in E} \nu_p\{x\}.$$

Remark The expression $\nu = \nu_p + \nu_{cs} + \nu_{ac}$ of (b) is the Lebesgue decomposition of $\nu$.
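
To see the three parts of 232Ia side by side, here is a small sketch on a finite set in which point masses of $\mu$ may be $0$, finite or infinite. It computes $\nu_s$ and $\nu_t$ by brute force from the suprema used in the proof, for a non-negative $\nu$. The finite setting, the names `decompose`, `mass` and `subsets`, and the sample masses are assumptions made for illustration only.

```python
from itertools import combinations

INF = float("inf")


def subsets(points):
    """Yield every subset of a finite collection of points as a frozenset."""
    pts = list(points)
    for r in range(len(pts) + 1):
        for combo in combinations(pts, r):
            yield frozenset(combo)


def mass(weights, E):
    """Total weight of the finite set E."""
    return sum(weights[x] for x in E)


def decompose(mu, nu):
    """Return (nu_s, nu_tc, nu_e) as functions of a set E, for nu >= 0.

    nu_s E = sup{nu(E & F) : mu F = 0} and
    nu_t E = sup{nu(E & F) : mu F < infinity}, as in the proof of 232I;
    then nu_tc = nu_t - nu_s and nu_e = nu - nu_t.
    """
    every = list(subsets(mu))
    null_sets = [F for F in every if mass(mu, F) == 0]
    finite_sets = [F for F in every if mass(mu, F) < INF]

    def nu_s(E):
        return max(mass(nu, E & F) for F in null_sets)

    def nu_t(E):
        return max(mass(nu, E & F) for F in finite_sets)

    def nu_tc(E):
        return nu_t(E) - nu_s(E)

    def nu_e(E):
        return mass(nu, E) - nu_t(E)

    return nu_s, nu_tc, nu_e


# Hypothetical data: "a" and "d" are mu-null, "b" has finite mass, "c" infinite mass.
mu = {"a": 0, "b": 2, "c": INF, "d": 0}
nu = {"a": 1, "b": 3, "c": 5, "d": 4}
nu_s, nu_tc, nu_e = decompose(mu, nu)
X = frozenset(mu)
```

In this toy case the singular part lives on the null points, the truly continuous part on the point of finite non-zero mass, and $\nu_e$ on the point of infinite mass.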

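232X Basic exercises (a) Let $(X, \Sigma, \mu)$ be a measure space and $\nu : \Sigma \to \mathbb{R}$ a countably additive functional which is absolutely continuous with respect to $\mu$. Show that the following are equiveridical: (i) $\nu$ is truly continuous with respect to $\mu$; (ii) there is a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ such that $\mu E_n < \infty$ for every $n \in \mathbb{N}$ and $\nu F = 0$ whenever $F \in \Sigma$ and $F \cap \bigcup_{n\in\mathbb{N}} E_n = \emptyset$.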

> (b) Let $g : \mathbb{R} \to \mathbb{R}$ be a bounded non-decreasing function and $\mu_g$ the associated Lebesgue-Stieltjes measure (114Xa). Show that $\mu_g$ is absolutely continuous (equivalently, truly continuous) with respect to Lebesgue measure iff the restriction of $g$ to any closed bounded interval is absolutely continuous in the sense of 225B.

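(c) Let $X$ be a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$; let $\nu : \Sigma \to \mathbb{R}$ be a countably additive functional. Let $\mathcal{I}$ be an ideal of $\Sigma$, that is, a subset of $\Sigma$ such that ($\alpha$) $\emptyset \in \mathcal{I}$ ($\beta$) $E \cup F \in \mathcal{I}$ for all $E$, $F \in \mathcal{I}$ ($\gamma$) if $E \in \Sigma$, $F \in \mathcal{I}$ and $E \subseteq F$ then $E \in \mathcal{I}$. Show that $\nu$ has a unique decomposition as $\nu = \nu_{\mathcal{I}} + \nu'_{\mathcal{I}}$, where $\nu_{\mathcal{I}}$ and $\nu'_{\mathcal{I}}$ are countably additive functionals, $\nu'_{\mathcal{I}} E = 0$ for every $E \in \mathcal{I}$, and whenever $E \in \Sigma$, $\nu_{\mathcal{I}} E \ne 0$ there is an $F \in \mathcal{I}$ such that $\nu_{\mathcal{I}}(E \cap F) \ne 0$.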

> (d) Let $X$ be a non-empty set and $\Sigma$ a $\sigma$-algebra of subsets of $X$. Show that for any sequence $\langle \nu_n\rangle_{n\in\mathbb{N}}$ of countably additive functionals on $\Sigma$ there is a probability measure $\mu$ on $X$, with domain $\Sigma$, such that every $\nu_n$ is absolutely continuous with respect to $\mu$. (Hint: start with the case $\nu_n \ge 0$.)

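(e) Let $(X, \Sigma, \mu)$ be a measure space and $(X, \hat\Sigma, \hat\mu)$ its completion (212C). Let $\nu : \Sigma \to \mathbb{R}$ be an additive functional such that $\nu E = 0$ whenever $\mu E = 0$. Show that $\nu$ has a unique extension to an additive functional $\hat\nu : \hat\Sigma \to \mathbb{R}$ such that $\hat\nu E = 0$ whenever $\hat\mu E = 0$.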

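(f) Let $\mathcal{F}$ be an ultrafilter on $\mathbb{N}$ including the filter $\{\mathbb{N} \setminus I : I \subseteq \mathbb{N} \text{ is finite}\}$ (2A1O). Define $\nu : \mathcal{P}\mathbb{N} \to \{0, 1\}$ by setting $\nu E = 1$ if $E \in \mathcal{F}$, $0$ for $E \in \mathcal{P}\mathbb{N} \setminus \mathcal{F}$. (i) Let $\mu_1$ be counting measure on $\mathcal{P}\mathbb{N}$. Show that $\nu$ is additive and absolutely continuous with respect to $\mu_1$, but is not truly continuous. (ii) Define $\mu_2 : \mathcal{P}\mathbb{N} \to [0, 1]$ by setting $\mu_2 E = \sum_{n \in E} 2^{-n-1}$. Show that $\nu$ is zero on $\mu_2$-negligible sets, but is not absolutely continuous with respect to $\mu_2$.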

(g) Rewrite this section in terms of complex-valued additive functionals.

(h) Let $(X, \Sigma, \mu)$ be a measure space, and $\nu$ and $\lambda$ additive functionals on $\Sigma$ of which $\nu$ is positive and countably additive, so that $(X, \Sigma, \nu)$ is also a measure space. (i) Show that if $\nu$ is absolutely continuous with respect to $\mu$ and $\lambda$ is absolutely continuous with respect to $\nu$, then $\lambda$ is absolutely continuous with respect to $\mu$. (ii) Show that if $\nu$ is truly continuous with respect to $\mu$ and $\lambda$ is absolutely continuous with respect to $\nu$ then $\lambda$ is truly continuous with respect to $\mu$.

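232Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a measure space and $\nu : \Sigma \to \mathbb{R}$ a finitely additive functional. If $E$, $F$, $H \in \Sigma$ and $\mu H < \infty$ set $\rho_H(E, F) = \mu(H \cap (E \triangle F))$. (i) Show that $\rho_H$ is a pseudometric on $\Sigma$ (2A3Fa). (ii) Let $\mathfrak{T}$ be the topology on $\Sigma$ generated by $\{\rho_H : H \in \Sigma,\ \mu H < \infty\}$ (2A3Fc). Show that $\nu$ is continuous for $\mathfrak{T}$ iff it is truly continuous in the sense of 232Ab. ($\mathfrak{T}$ is the topology of convergence in measure on $\Sigma$.)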

(b) For a non-decreasing function $F : [a, b] \to \mathbb{R}$, where $a < b$, let $\nu_F$ be the corresponding Lebesgue-Stieltjes measure. Show that if we define $(\nu_F)_{ac}$, etc., with regard to Lebesgue measure on $[a, b]$, as in 232I, then
$$(\nu_F)_p = \nu_{F_p}, \quad (\nu_F)_{ac} = \nu_{F_{ac}}, \quad (\nu_F)_{cs} = \nu_{F_{cs}},$$
where $F_p$, $F_{cs}$ and $F_{ac}$ are defined as in 226C.

(c) Extend the idea of (b) to general functions F of bounded variation.

(d) Extend the ideas of (b) and (c) to open, half-open and unbounded intervals (cf. 226Ya).

(e) Let $(X, \Sigma, \mu)$ be a measure space and $(X, \tilde\Sigma, \tilde\mu)$ its c.l.d. version (213E). Let $\nu : \Sigma \to \mathbb{R}$ be an additive functional which is truly continuous with respect to $\mu$. Show that $\nu$ has a unique extension to a functional $\tilde\nu : \tilde\Sigma \to \mathbb{R}$ which is truly continuous with respect to $\tilde\mu$.

(f) Let $(X, \Sigma, \mu)$ be a measure space and $f$ a $\mu$-integrable real-valued function. Show that the indefinite integral of $f$ is the unique countably additive functional $\nu : \Sigma \to \mathbb{R}$ such that whenever $E \in \Sigma$ and $f(x) \in [a, b]$ for almost every $x \in E$, then $a\mu E \le \nu E \le b\mu E$.

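(g) Say that two bounded additive functionals $\nu_1$, $\nu_2$ on an algebra $\Sigma$ of sets are mutually singular if for any $\epsilon > 0$ there is an $H \in \Sigma$ such that
$$\sup\{|\nu_1 F| : F \in \Sigma,\ F \subseteq H\} \le \epsilon,$$
$$\sup\{|\nu_2 F| : F \in \Sigma,\ F \cap H = \emptyset\} \le \epsilon.$$
(i) Show that $\nu_1$ and $\nu_2$ are mutually singular iff, in the language of 231Ya-231Yb, $|\nu_1| \wedge |\nu_2| = 0$.
(ii) Show that if $\Sigma$ is a $\sigma$-algebra and $\nu_1$ and $\nu_2$ are countably additive, then they are mutually singular iff there is an $H \in \Sigma$ such that $\nu_1 F = 0$ whenever $F \in \Sigma$ and $F \subseteq H$, while $\nu_2 F = 0$ whenever $F \in \Sigma$ and $F \cap H = \emptyset$.
(iii) Show that if $\nu_s$, $\nu_{tc}$ and $\nu_e$ are defined from $\nu$ and $\mu$ as in 232I, then each pair of the three are mutually singular.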

(h) Let $(X, \Sigma, \mu)$ be a measure space and $f$ a non-negative real-valued function which is integrable over $X$; let $\nu$ be its indefinite integral. Show that for any function $g : X \to \mathbb{R}$, $\int g\,d\nu = \int f \times g\,d\mu$ in the sense that if one of these is defined in $[-\infty, \infty]$ so is the other, and they are then equal. (Hint: start with simple functions $g$.)

(i) Let $(X, \Sigma, \mu)$ be a measure space, $f$ an integrable function, and $\nu : \Sigma \to \mathbb{R}$ the indefinite integral of $f$. Show that $|\nu|$, as defined in 231Ya, is the indefinite integral of $|f|$.

(j) Let $X$ be a set, $\Sigma$ a $\sigma$-algebra of subsets of $X$, and $\nu : \Sigma \to \mathbb{R}$ a countably additive functional. Show that $\nu$ has a Radon-Nikodym derivative with respect to $|\nu|$ as defined in 231Ya, and that any such derivative has modulus equal to 1 $|\nu|$-a.e.

232 Notes and comments The Radon-Nikodym theorem must be on any list of the half-dozen most important theorems of measure theory, and not only the theorem itself, but the techniques necessary to prove it, are at the heart of the subject. In my book Fremlin 74 I discussed a variety of more or less abstract versions of the theorem and of the method, to some of which I will return in §§327 and 365 of the next volume.
As I have presented it here, the essence of the proof is split between 231E and 232E. I think we can distinguish the following elements. Let $\nu$ be a countably additive functional.
(i) $\nu$ is bounded (231Ea).
(ii) $\nu$ is expressible as the difference of non-negative functionals (231F).
(I gave this as a corollary of 231Eb, but it can also be proved by simpler methods, as in 231Ya.)
(iii) If $\nu > 0$, there is an integrable $f$ such that $0 < \nu_f \le \nu$, writing $\nu_f$ for the indefinite integral of $f$. (This is the point at which we really do need the Hahn decomposition 231Eb.)
(iv) The set $\Psi = \{f : \nu_f \le \nu\}$ is closed under countable suprema, so there is an $f \in \Psi$ maximising $\int f$.
(In part (b) of the proof of 232E, I spoke of simple functions; but this was solely to simplify the technical details, and the same argument works if we apply it to $\Psi$ instead of $\Phi$. Note the use here of B.Levi's theorem.)
(v) Take $f$ from (iv) and use (iii) to show that $\nu - \nu_f = 0$.
Each of the steps (i)-(iv) requires a non-trivial idea, and the importance of the theorem lies not only in its remarkable direct consequences in the rest of this chapter and elsewhere, but in the versatility and power of these ideas.
I introduce the idea of "truly continuous" functional in order to give a reasonably straightforward account of the status of the Radon-Nikodym theorem in non-$\sigma$-finite measure spaces. Of course the whole point is that a truly continuous functional, like an indefinite integral, must be concentrated on a $\sigma$-finite part of the space (232Xa), so that 232E, as stated, can be deduced easily from the standard form 232F. I dare to use the word "truly" in this context because this kind of continuity does indeed correspond to a topological notion (232Ya).

There is a possible trap in the definition I give of "absolutely continuous" functional. Many authors use the condition of 232Ba as a definition, saying that $\nu$ is absolutely continuous with respect to $\mu$ if $\nu E = 0$ whenever $\mu E = 0$. For countably additive functionals this coincides with the $\epsilon$-$\delta$ formulation in 232Aa; but for other additive functionals this need not be so (232Xf(ii)). Mostly the distinction is insignificant, but I note that in 232Bd it is critical, since $\nu$ there is not assumed to be countably additive.
In 232I I describe one of the many ways of decomposing a countably additive functional into mutually
singular parts with special properties. In 231Yf-231Yg I have already suggested a method of decomposing
an additive functional into the sum of a countably additive part and a purely finitely additive part. All
these results have natural expressions in terms of the ordered linear space of bounded additive functionals
on an algebra (231Yc).

233 Conditional expectations


I devote a section to a first look at one of the principal applications of the Radon-Nikodym theorem. It is one of the most vital ideas of measure theory, and will appear repeatedly in one form or another. Here I give the definition and most basic properties of conditional expectations as they arise in abstract probability theory, with notes on convex functions and a version of Jensen's inequality (233I-233J).

233A $\sigma$-subalgebras Let $X$ be a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$. A $\sigma$-subalgebra of $\Sigma$ is a $\sigma$-algebra $\mathrm{T}$ of subsets of $X$ such that $\mathrm{T} \subseteq \Sigma$. If $(X, \Sigma, \mu)$ is a measure space and $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$, then $(X, \mathrm{T}, \mu{\restriction}\mathrm{T})$ is again a measure space; this is immediate from the definition (112A). Now we have the following straightforward lemma. It is a special case of 235I below, but I give a separate proof in case you do not wish as yet to embark on the general investigation pursued in §235.

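233B Lemma Let $(X, \Sigma, \mu)$ be a measure space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. A real-valued function $f$ defined on a subset of $X$ is $\mu{\restriction}\mathrm{T}$-integrable iff (i) it is $\mu$-integrable (ii) $\operatorname{dom} f$ is $\mu{\restriction}\mathrm{T}$-conegligible (iii) $f$ is $\mu{\restriction}\mathrm{T}$-virtually measurable; and in this case $\int f\,d(\mu{\restriction}\mathrm{T}) = \int f\,d\mu$.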
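proof (a) Note first that if $f$ is a $\mu{\restriction}\mathrm{T}$-simple function, that is, is expressible as $\sum_{i=0}^n a_i\chi E_i$ where $a_i \in \mathbb{R}$, $E_i \in \mathrm{T}$ and $(\mu{\restriction}\mathrm{T})E_i < \infty$ for each $i$, then $f$ is $\mu$-simple and
$$\int f\,d\mu = \sum_{i=0}^n a_i\mu E_i = \int f\,d(\mu{\restriction}\mathrm{T}).$$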

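(b) Let $U_\mu$ be the set of non-negative $\mu$-integrable functions and $U_{\mu\restriction\mathrm{T}}$ the set of non-negative $\mu{\restriction}\mathrm{T}$-integrable functions.
Suppose $f \in U_{\mu\restriction\mathrm{T}}$. Then there is a non-decreasing sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ of $\mu{\restriction}\mathrm{T}$-simple functions such that $f(x) = \lim_{n\to\infty} f_n(x)$ $\mu{\restriction}\mathrm{T}$-a.e. and
$$\int f\,d(\mu{\restriction}\mathrm{T}) = \lim_{n\to\infty} \int f_n\,d(\mu{\restriction}\mathrm{T}).$$
But now every $f_n$ is also $\mu$-simple, and $\int f_n\,d\mu = \int f_n\,d(\mu{\restriction}\mathrm{T})$ for every $n$, and $f = \lim_{n\to\infty} f_n$ $\mu$-a.e. So $f \in U_\mu$ and $\int f\,d\mu = \int f\,d(\mu{\restriction}\mathrm{T})$.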
(c) Now suppose that $f$ is $\mu{\restriction}\mathrm{T}$-integrable. Then it is the difference of two members of $U_{\mu\restriction\mathrm{T}}$, so is $\mu$-integrable, and $\int f\,d\mu = \int f\,d(\mu{\restriction}\mathrm{T})$. Also conditions (ii) and (iii) are satisfied, according to the conventions established in Volume 1 (122Nc, 122P-122Q).
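(d) Suppose that $f$ satisfies conditions (i)-(iii). Then $|f| \in U_\mu$, and there is a conegligible set $E \subseteq \operatorname{dom} f$ such that $E \in \mathrm{T}$ and $f{\restriction}E$ is $\mathrm{T}$-measurable. Accordingly $|f|{\restriction}E$ is $\mathrm{T}$-measurable. Now, if $\epsilon > 0$, then
$$(\mu{\restriction}\mathrm{T})\{x : x \in E,\ |f|(x) \ge \epsilon\} = \mu\{x : x \in E,\ |f|(x) \ge \epsilon\} \le \frac{1}{\epsilon}\int |f|\,d\mu < \infty;$$
moreover,
$$\begin{aligned}
\sup\{\int g\,d(\mu{\restriction}\mathrm{T}) &: g \text{ is a } \mu{\restriction}\mathrm{T}\text{-simple function},\ g \le |f|\ \mu{\restriction}\mathrm{T}\text{-a.e.}\} \\
&= \sup\{\int g\,d\mu : g \text{ is a } \mu{\restriction}\mathrm{T}\text{-simple function},\ g \le |f|\ \mu{\restriction}\mathrm{T}\text{-a.e.}\} \\
&\le \sup\{\int g\,d\mu : g \text{ is a } \mu\text{-simple function},\ g \le |f|\ \mu\text{-a.e.}\} \\
&\le \int |f|\,d\mu < \infty.
\end{aligned}$$
By the criterion of 122Ja, $|f| \in U_{\mu\restriction\mathrm{T}}$. Consequently $f$, being $\mu{\restriction}\mathrm{T}$-virtually $\mathrm{T}$-measurable, is $\mu{\restriction}\mathrm{T}$-integrable, by 122P. This completes the proof.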

233C Remarks (a) My argument just above is detailed to the point of pedantry. I think, however,
that while I can be accused of wasting paper by writing everything down, every element of the argument is
necessary to the result. To be sure, some of the details are needed only because I use such a wide notion of
integrable function; if you restrict the notion of integrability to measurable functions defined on the whole
measure space, there are simplifications at this stage, to be paid for later when you discover that many of
the principal applications are to functions defined by formulae which do not apply on the whole underlying
space.
The essential point which does have to be grasped is that while a $\mu{\restriction}\mathrm{T}$-negligible set is always $\mu$-negligible, a $\mu$-negligible set need not be $\mu{\restriction}\mathrm{T}$-negligible.

(b) As the simplest possible example of the problems which can arise, I offer the following. Let $(X, \Sigma, \mu)$ be $[0, 1]^2$ with Lebesgue measure. Let $\mathrm{T}$ be the set of those members of $\Sigma$ expressible as $F \times [0, 1]$ for some $F \subseteq [0, 1]$; it is easy to see that $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$. Consider $f$, $g : X \to [0, 1]$ defined by saying that
$$f(t, u) = 1 \text{ if } u > 0,\ 0 \text{ otherwise},$$
$$g(t, u) = 1 \text{ if } t > 0,\ 0 \text{ otherwise}.$$
Then both $f$ and $g$ are $\mu$-integrable, being constant $\mu$-a.e. But only $g$ is $\mu{\restriction}\mathrm{T}$-integrable, because any non-negligible $E \in \mathrm{T}$ includes a complete vertical section $\{t\} \times [0, 1]$, so that $f$ takes both values $0$ and $1$ on $E$.
If we set
$$h(t, u) = 1 \text{ if } u > 0,\ \text{undefined otherwise},$$
then again (on the conventions I use) $h$ is $\mu$-integrable but not $\mu{\restriction}\mathrm{T}$-integrable, as there is no conegligible member of $\mathrm{T}$ included in the domain of $h$.

(c) If $f$ is defined everywhere on $X$, and $\mu{\restriction}\mathrm{T}$ is complete, then of course $f$ is $\mu{\restriction}\mathrm{T}$-integrable iff it is $\mu$-integrable and $\mathrm{T}$-measurable. But note that in the example just above, which is one of the archetypes for this topic, $\mu{\restriction}\mathrm{T}$ is not complete, as singleton sets are negligible but not measurable.

233D Conditional expectations Let $(X, \Sigma, \mu)$ be a probability space, that is, a measure space with $\mu X = 1$. (Nearly all the ideas here work perfectly well for any totally finite measure space, but there seems nothing to be gained from the extension, and the traditional phrase "conditional expectation" demands a probability space.) Let $\mathrm{T} \subseteq \Sigma$ be a $\sigma$-subalgebra.

(a) For any $\mu$-integrable real-valued function $f$ defined on a conegligible subset of $X$, we have a corresponding indefinite integral $\nu_f : \Sigma \to \mathbb{R}$ given by the formula $\nu_f E = \int_E f$ for every $E \in \Sigma$. We know that $\nu_f$ is countably additive and truly continuous with respect to $\mu$, which in the present context is the same as saying that it is absolutely continuous (232Bc-232Bd). Now consider the restrictions $\mu{\restriction}\mathrm{T}$, $\nu_f{\restriction}\mathrm{T}$ of $\mu$ and $\nu_f$ to the $\sigma$-algebra $\mathrm{T}$. It follows directly from the definitions of "countably additive" and "absolutely continuous" that $\nu_f{\restriction}\mathrm{T}$ is countably additive and absolutely continuous with respect to $\mu{\restriction}\mathrm{T}$, therefore truly continuous with respect to $\mu{\restriction}\mathrm{T}$. Consequently, the Radon-Nikodym theorem (232E) tells us that there is a $\mu{\restriction}\mathrm{T}$-integrable function $g$ such that $(\nu_f{\restriction}\mathrm{T})F = \int_F g\,d(\mu{\restriction}\mathrm{T})$ for every $F \in \mathrm{T}$.

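(b) Let us define a conditional expectation of $f$ on $\mathrm{T}$ to be such a function; that is, a $\mu{\restriction}\mathrm{T}$-integrable function $g$ such that $\int_F g\,d(\mu{\restriction}\mathrm{T}) = \int_F f\,d\mu$ for every $F \in \mathrm{T}$. Looking back at 233B, we see that for such a $g$ we have
$$\int_F g\,d(\mu{\restriction}\mathrm{T}) = \int g \times \chi F\,d(\mu{\restriction}\mathrm{T}) = \int g \times \chi F\,d\mu = \int_F g\,d\mu$$
for every $F \in \mathrm{T}$; also, that $g$ is almost everywhere equal to a $\mathrm{T}$-measurable function defined everywhere on $X$ which is also a conditional expectation of $f$ on $\mathrm{T}$ (232He).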

(c) I set the word "a" of the phrase "a conditional expectation" in bold type to emphasize that there is nothing unique about the function $g$. In 242J I will return to this point, and describe an object which could properly be called "the" conditional expectation of $f$ on $\mathrm{T}$. $g$ is "essentially unique" only in the sense that if $g_1$, $g_2$ are both conditional expectations of $f$ on $\mathrm{T}$ then $g_1 = g_2$ $\mu{\restriction}\mathrm{T}$-a.e. (131Hb). This does of course mean that a very large number of its properties, for instance the distribution function $G(a) = \hat\mu\{x : g(x) \le a\}$, where $\hat\mu$ is the completion of $\mu$ (212C), are independent of which $g$ we take.

(d) A word of explanation of the phrase "conditional expectation" is in order. This derives from the standard identification of probability with measure, due to Kolmogorov, which I will discuss more fully in Chapter 27. A real-valued random variable may be regarded as a measurable, or virtually measurable, function $f$ on a probability space $(X, \Sigma, \mu)$; its "expectation" becomes identified with $\int f\,d\mu$, supposing that this exists. If $F \in \Sigma$ and $\mu F > 0$ then the "conditional expectation of $f$ given $F$" is $\frac{1}{\mu F}\int_F f$. If $F_0, \ldots, F_n$ is a partition of $X$ into measurable sets of non-zero measure, then the function $g$ given by
$$g(x) = \frac{1}{\mu F_i}\int_{F_i} f \quad\text{if } x \in F_i$$
is a kind of anticipated conditional expectation; if we are one day told that $x \in F_i$, then $g(x)$ will be our subsequent estimate of the expectation of $f$. In the terms of the definition above, $g$ is a conditional expectation of $f$ on the finite algebra $\mathrm{T}$ generated by $\{F_0, \ldots, F_n\}$. An appropriate intuition for general $\sigma$-algebras $\mathrm{T}$ is that they consist of the events which we shall be able to observe at some stated future time $t_0$, while the whole algebra $\Sigma$ consists of all events, including those not observable until times later than $t_0$, if ever.
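
The finite-partition formula in (d) is easy to try out numerically. The sketch below averages $f$ over each block of a partition of a finite probability space and leaves the defining identity $\int_F g = \int_F f$ to be checked on the generating blocks. The die example and the name `conditional_expectation` are assumptions of this illustration, not taken from the text.

```python
from fractions import Fraction


def conditional_expectation(f, mu, partition):
    """Average f over each block of a finite partition.

    f, mu: dicts mapping point -> value / probability.
    partition: list of disjoint sets of non-zero probability covering X.
    Returns g with g(x) = (1 / mu F_i) * (integral of f over F_i) for x in F_i.
    """
    g = {}
    for block in partition:
        block_mass = sum(mu[x] for x in block)
        block_integral = sum(f[x] * mu[x] for x in block)
        for x in block:
            g[x] = block_integral / block_mass
    return g


# A fair six-sided die; f is the score; T is generated by {odd, even}.
mu = {x: Fraction(1, 6) for x in range(1, 7)}
f = {x: Fraction(x) for x in mu}
partition = [{1, 3, 5}, {2, 4, 6}]
g = conditional_expectation(f, mu, partition)
```

Here `g` takes the value 3 on odd scores and 4 on even ones: told only the parity of the throw, these are the revised estimates of the score.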

233E I list some of the elementary facts concerning conditional expectations.


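Proposition Let $(X, \Sigma, \mu)$ be a probability space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a sequence of $\mu$-integrable real-valued functions, and for each $n$ let $g_n$ be a conditional expectation of $f_n$ on $\mathrm{T}$. Then
(a) $g_1 + g_2$ is a conditional expectation of $f_1 + f_2$ on $\mathrm{T}$;
(b) for any $c \in \mathbb{R}$, $cg_0$ is a conditional expectation of $cf_0$ on $\mathrm{T}$;
(c) if $f_1 \le_{\text{a.e.}} f_2$ then $g_1 \le_{\text{a.e.}} g_2$;
(d) if $\langle f_n\rangle_{n\in\mathbb{N}}$ is non-decreasing a.e. and $f = \lim_{n\to\infty} f_n$ is $\mu$-integrable, then $\lim_{n\to\infty} g_n$ is a conditional expectation of $f$ on $\mathrm{T}$;
(e) if $f = \lim_{n\to\infty} f_n$ is defined a.e. and there is a $\mu$-integrable function $h$ such that $|f_n| \le_{\text{a.e.}} h$ for every $n$, then $\lim_{n\to\infty} g_n$ is a conditional expectation of $f$ on $\mathrm{T}$;
(f) if $F \in \mathrm{T}$ then $g_0 \times \chi F$ is a conditional expectation of $f_0 \times \chi F$ on $\mathrm{T}$;
(g) if $h$ is a bounded, $\mu{\restriction}\mathrm{T}$-virtually measurable real-valued function defined $\mu{\restriction}\mathrm{T}$-almost everywhere on $X$, then $g_0 \times h$ is a conditional expectation of $f_0 \times h$ on $\mathrm{T}$;
(h) if $\Upsilon$ is a $\sigma$-subalgebra of $\mathrm{T}$, then a function $h_0$ is a conditional expectation of $f_0$ on $\Upsilon$ iff it is a conditional expectation of $g_0$ on $\Upsilon$.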
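proof (a)-(b) We have only to observe that
$$\int_F g_1 + g_2\,d(\mu{\restriction}\mathrm{T}) = \int_F g_1\,d(\mu{\restriction}\mathrm{T}) + \int_F g_2\,d(\mu{\restriction}\mathrm{T}) = \int_F f_1\,d\mu + \int_F f_2\,d\mu = \int_F f_1 + f_2\,d\mu,$$
$$\int_F cg_0\,d(\mu{\restriction}\mathrm{T}) = c\int_F g_0\,d(\mu{\restriction}\mathrm{T}) = c\int_F f_0\,d\mu = \int_F cf_0\,d\mu$$
for every $F \in \mathrm{T}$.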
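(c) If $F \in \mathrm{T}$ then
$$\int_F g_1\,d(\mu{\restriction}\mathrm{T}) = \int_F f_1\,d\mu \le \int_F f_2\,d\mu = \int_F g_2\,d(\mu{\restriction}\mathrm{T})$$
for every $F \in \mathrm{T}$; consequently $g_1 \le g_2$ $\mu{\restriction}\mathrm{T}$-a.e. (131Ha).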
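(d) By (c), $\langle g_n\rangle_{n\in\mathbb{N}}$ is non-decreasing $\mu{\restriction}\mathrm{T}$-a.e.; moreover,
$$\sup_{n\in\mathbb{N}} \int g_n\,d(\mu{\restriction}\mathrm{T}) = \sup_{n\in\mathbb{N}} \int f_n\,d\mu = \int f\,d\mu < \infty.$$
By B.Levi's theorem, $g = \lim_{n\to\infty} g_n$ is defined $\mu{\restriction}\mathrm{T}$-almost everywhere, and
$$\int_F g\,d(\mu{\restriction}\mathrm{T}) = \lim_{n\to\infty} \int_F g_n\,d(\mu{\restriction}\mathrm{T}) = \lim_{n\to\infty} \int_F f_n\,d\mu = \int_F f\,d\mu$$
for every $F \in \mathrm{T}$, so $g$ is a conditional expectation of $f$ on $\mathrm{T}$.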
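(e) Set $f'_n = \inf_{m \ge n} f_m$, $f''_n = \sup_{m \ge n} f_m$ for each $n \in \mathbb{N}$. Then we have
$$-h \le_{\text{a.e.}} f'_n \le f_n \le f''_n \le_{\text{a.e.}} h,$$
and $\langle f'_n\rangle_{n\in\mathbb{N}}$, $\langle f''_n\rangle_{n\in\mathbb{N}}$ are almost-everywhere-monotonic sequences of functions both converging almost everywhere to $f$. For each $n$, let $g'_n$, $g''_n$ be conditional expectations of $f'_n$, $f''_n$ on $\mathrm{T}$. By (c) and (d), $\langle g'_n\rangle_{n\in\mathbb{N}}$ and $\langle g''_n\rangle_{n\in\mathbb{N}}$ are almost-everywhere-monotonic sequences converging almost everywhere to conditional expectations $g'$, $g''$ of $f$. Of course $g' = g''$ $\mu{\restriction}\mathrm{T}$-a.e. (233Dc). Also, for each $n$, $g'_n \le_{\text{a.e.}} g_n \le_{\text{a.e.}} g''_n$, so $\langle g_n\rangle_{n\in\mathbb{N}}$ converges to $g'$ $\mu{\restriction}\mathrm{T}$-a.e., and $g = \lim_{n\to\infty} g_n$ is defined almost everywhere and is a conditional expectation of $f$ on $\mathrm{T}$.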
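(f) For any $H \in \mathrm{T}$,
$$\int_H g_0 \times \chi F\,d(\mu{\restriction}\mathrm{T}) = \int_{H \cap F} g_0\,d(\mu{\restriction}\mathrm{T}) = \int_{H \cap F} f_0\,d\mu = \int_H f_0 \times \chi F\,d\mu.$$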
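(g)(i) If $h$ is actually $(\mu{\restriction}\mathrm{T})$-simple, say $h = \sum_{i=0}^n a_i\chi F_i$ where $F_i \in \mathrm{T}$ for each $i$, then
$$\int_F g_0 \times h\,d(\mu{\restriction}\mathrm{T}) = \sum_{i=0}^n a_i \int_F g_0 \times \chi F_i\,d(\mu{\restriction}\mathrm{T}) = \sum_{i=0}^n a_i \int_F f_0 \times \chi F_i\,d\mu = \int_F f_0 \times h\,d\mu$$
for every $F \in \mathrm{T}$. (ii) For the general case, if $h$ is $\mu{\restriction}\mathrm{T}$-virtually measurable and $|h(x)| \le M$ $\mu{\restriction}\mathrm{T}$-almost everywhere, then there is a sequence $\langle h_n\rangle_{n\in\mathbb{N}}$ of $\mu{\restriction}\mathrm{T}$-simple functions converging to $h$ almost everywhere, and with $|h_n(x)| \le M$ for every $x$, $n$. Now $f_0 \times h_n \to f_0 \times h$ a.e. and $|f_0 \times h_n| \le_{\text{a.e.}} M|f_0|$ for each $n$, while $g_0 \times h_n$ is a conditional expectation of $f_0 \times h_n$ for every $n$, so by (e) we see that $\lim_{n\to\infty} g_0 \times h_n$ will be a conditional expectation of $f_0 \times h$; but this is equal almost everywhere to $g_0 \times h$.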
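(h) We need note only that $\int_H g_0\,d(\mu{\restriction}\mathrm{T}) = \int_H f_0\,d\mu$ for every $H \in \Upsilon$, so
$$\begin{aligned}
\int_H h_0\,d(\mu{\restriction}\Upsilon) &= \int_H g_0\,d(\mu{\restriction}\mathrm{T}) \text{ for every } H \in \Upsilon \\
\iff \int_H h_0\,d(\mu{\restriction}\Upsilon) &= \int_H f_0\,d\mu \text{ for every } H \in \Upsilon.
\end{aligned}$$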
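
Parts (f) and (h) of 233E can be watched at work on finite partitions, where conditional expectations are block averages. The following sketch takes a fine partition generating $\mathrm{T}$ and a coarser one generating $\Upsilon$, and compares conditioning $f_0$ directly on $\Upsilon$ with conditioning first on $\mathrm{T}$ and then on $\Upsilon$. The eight-point space, the helper `cond_exp` and all variable names are assumptions chosen for the illustration.

```python
from fractions import Fraction


def cond_exp(f, mu, partition):
    """Block averages of f: a conditional expectation on the algebra
    generated by a finite partition into sets of non-zero probability."""
    g = {}
    for block in partition:
        block_mass = sum(mu[x] for x in block)
        average = sum(f[x] * mu[x] for x in block) / block_mass
        for x in block:
            g[x] = average
    return g


mu = {x: Fraction(1, 8) for x in range(8)}
f0 = {x: Fraction(x * x) for x in mu}
fine = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]   # generates T
coarse = [{0, 1, 2, 3}, {4, 5, 6, 7}]     # generates the smaller algebra

g0 = cond_exp(f0, mu, fine)

# (h): conditioning f0 on the coarse algebra agrees with conditioning g0 on it.
direct = cond_exp(f0, mu, coarse)
via_T = cond_exp(g0, mu, coarse)

# (f): for F in T, g0 * chi(F) is a conditional expectation of f0 * chi(F).
F = {0, 1, 2, 3}
f0_on_F = {x: f0[x] if x in F else Fraction(0) for x in mu}
g0_on_F = {x: g0[x] if x in F else Fraction(0) for x in mu}
```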

233F Remarks Of course the results above are individually nearly trivial (though I think (e) and (g) might give you pause for thought if they were offered without previous preparation of the ground). Cumulatively they amount to some quite strong properties. In §242 I will restate them in language which is syntactically more direct, but relies on a deeper level of abstraction.
As an illustration of the power of conditional expectations to surprise us, I offer the next proposition,
which depends on the concept of convex function.

233G Convex functions Recall that a real-valued function $\phi$ defined on an interval $I \subseteq \mathbb{R}$ is convex if
$$\phi(tb + (1 - t)c) \le t\phi(b) + (1 - t)\phi(c)$$
whenever $b$, $c \in I$ and $t \in [0, 1]$.
Examples The formulae $|x|$, $x^2$, $e^{\pm x} \pm x$ define convex functions on $\mathbb{R}$; on $]-1, 1[$ we have $1/(1 - x^2)$; on $]0, \infty[$ we have $1/x$ and $x \ln x$; on $[0, 1]$ we have the function which is zero on $]0, 1[$ and $1$ on $\{0, 1\}$.

233H The general theory of convex functions is both extensive and important; I list a few of their more
salient properties in 233Xe. For the moment the following lemma covers what we need.
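Lemma Let $I \subseteq \mathbb{R}$ be a non-empty open interval (bounded or unbounded) and $\phi : I \to \mathbb{R}$ a convex function.
(a) For every $a \in I$ there is a $b \in \mathbb{R}$ such that $\phi(x) \ge \phi(a) + b(x - a)$ for every $x \in I$.
(b) If we take, for each $q \in I \cap \mathbb{Q}$, a $b_q \in \mathbb{R}$ such that $\phi(x) \ge \phi(q) + b_q(x - q)$ for every $x \in I$, then $\phi(x) = \sup_{q \in I \cap \mathbb{Q}} \phi(q) + b_q(x - q)$.
(c) $\phi$ is Borel measurable.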
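proof (a) If $c$, $c' \in I$ and $c < a < c'$, then $a$ is expressible as $dc + (1 - d)c'$ for some $d \in\ ]0, 1[$, so that $\phi(a) \le d\phi(c) + (1 - d)\phi(c')$ and
$$\begin{aligned}
\frac{\phi(a) - \phi(c)}{a - c} &\le \frac{d\phi(c) + (1 - d)\phi(c') - \phi(c)}{dc + (1 - d)c' - c} = \frac{(1 - d)(\phi(c') - \phi(c))}{(1 - d)(c' - c)} \\
&= \frac{d(\phi(c') - \phi(c))}{d(c' - c)} = \frac{\phi(c') - d\phi(c) - (1 - d)\phi(c')}{c' - dc - (1 - d)c'} \le \frac{\phi(c') - \phi(a)}{c' - a}.
\end{aligned}$$
This means that
$$b = \sup_{c < a,\ c \in I} \frac{\phi(a) - \phi(c)}{a - c}$$
is finite, and $b \le \frac{\phi(c') - \phi(a)}{c' - a}$ whenever $a < c' \in I$; accordingly $\phi(x) \ge \phi(a) + b(x - a)$ for every $x \in I$.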

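(b) By the choice of the $b_q$, $\phi(x) \ge \sup_{q \in I \cap \mathbb{Q}} \phi_q(x)$, where $\phi_q(x) = \phi(q) + b_q(x - q)$. On the other hand, given $x \in I$, fix $y \in I$ such that $x < y$ and let $b \in \mathbb{R}$ be such that $\phi(z) \ge \phi(x) + b(z - x)$ for every $z \in I$. If $q \in \mathbb{Q}$ and $x < q < y$, we have $\phi(y) \ge \phi(q) + b_q(y - q)$, so that $b_q \le \frac{\phi(y) - \phi(q)}{y - q}$ and
$$\begin{aligned}
\phi(q) + b_q(x - q) &= \phi(q) - b_q(q - x) \ge \phi(q) - \frac{\phi(y) - \phi(q)}{y - q}(q - x) \\
&= \frac{y - x}{y - q}\phi(q) - \frac{q - x}{y - q}\phi(y) \ge \frac{y - x}{y - q}(\phi(x) + b(q - x)) - \frac{q - x}{y - q}\phi(y).
\end{aligned}$$
Now
$$\begin{aligned}
\phi(x) &= \lim_{q \downarrow x} \frac{y - x}{y - q}(\phi(x) + b(q - x)) - \frac{q - x}{y - q}\phi(y) \\
&\le \sup_{q \in \mathbb{Q} \cap ]x, y[} \frac{y - x}{y - q}(\phi(x) + b(q - x)) - \frac{q - x}{y - q}\phi(y) \\
&\le \sup_{q \in \mathbb{Q} \cap ]x, y[} \phi(q) + b_q(x - q) \le \sup_{q \in \mathbb{Q} \cap I} \phi(q) + b_q(x - q).
\end{aligned}$$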

(c) Writing $\phi_q(x) = \phi(q) + b_q(x - q)$ for every $q \in \mathbb{Q} \cap I$, every $\phi_q$ is a Borel measurable function, and $\phi = \sup_{q \in I \cap \mathbb{Q}} \phi_q$ is the supremum of a countable family of Borel measurable functions, so is Borel measurable.
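
As a numerical illustration of 233Hb, one can take $\phi(x) = x^2$ on $\mathbb{R}$, for which $b_q = 2q$ is a valid choice of slope at $q$, and compare $\phi(x)$ with the largest supporting line over a finite grid of rationals. The choice of $\phi$, the grid spacing and the names `phi`, `slope` and `sup_of_supporting_lines` are assumptions of this sketch; the lemma itself concerns the supremum over all rationals in $I$.

```python
from fractions import Fraction


def phi(x):
    """The convex function used in this illustration."""
    return x * x


def slope(q):
    """One valid choice of b_q for phi(x) = x^2: the derivative 2q."""
    return 2 * q


def sup_of_supporting_lines(x, rationals):
    """Largest value at x of the lines t -> phi(q) + b_q (t - q), q in rationals."""
    return max(phi(q) + slope(q) * (x - q) for q in rationals)


# Rationals k/100 in [-2, 2]; a finite stand-in for the rationals in I.
grid = [Fraction(k, 100) for k in range(-200, 201)]
x = Fraction(1, 3)
approx = sup_of_supporting_lines(x, grid)
gap = phi(x) - approx
```

For this $\phi$ each supporting line at $q$ falls short of $\phi(x)$ by exactly $(x - q)^2$, so `gap` is the squared distance from `x` to the nearest grid point, and refining the grid drives it to zero.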

233I Jensens inequality Let (X, , ) be a measure space and : R R a convex function.
(a) Suppose that f and g are real-valued -virtually measurable R functions defined almost everywhere
onR X and that
R g a.e. 0 (that is, g 0 almost everywhere), g = 1 and g f is integrable. Then
( g f ) g f , where we may need to interpret the right-hand integral as . R R
(b) In particular, if X = 1 and f is a real-valued function which is integrable over X, then ( f ) f .
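proof (a) For each $q \in \mathbb{Q}$ take $b_q$ such that $\phi(t) \ge \phi_q(t) = \phi(q) + b_q(t - q)$ for every $t \in \mathbb{R}$ (233Ha). Because $\phi$ is Borel measurable (233Hc), $\phi f$ is $\mu$-virtually measurable (121H), so $g \times \phi f$ also is; since $g \times \phi f$ is defined almost everywhere and almost everywhere greater than or equal to the integrable function $g \times \phi_0 f$, $\int g \times \phi f$ is defined in $]-\infty, \infty]$. Now
$$\phi_q\Bigl(\int g \times f\Bigr) = \phi(q) + b_q \int g \times f - b_q q$$
$$= \int g \times \bigl(b_q f + (\phi(q) - b_q q)\chi X\bigr) = \int g \times \phi_q f \le \int g \times \phi f,$$
because $\int g = 1$ and $g \ge_{\text{a.e.}} 0$. By 233Hb,
$$\phi\Bigl(\int g \times f\Bigr) = \sup_{q \in \mathbb{Q}} \phi_q\Bigl(\int g \times f\Bigr) \le \int g \times \phi f.$$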

(b) Take g to be the constant function with value 1.
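
Both parts of 233I can be checked by hand on a finite space. The sketch below is an illustration only: the four-point space, the weights `mu`, the functions `f` and `g` and the choice $\phi(t) = t^2$ are all assumptions made for the example, not taken from the text. Integrals reduce to weighted sums, and exact fractions keep the inequalities honest.

```python
from fractions import Fraction as Fr

# Hypothetical four-point probability space with point masses mu.
mu = [Fr(1, 10), Fr(2, 10), Fr(3, 10), Fr(4, 10)]
f = [Fr(-1), Fr(1, 2), Fr(2), Fr(3)]
g = [Fr(2), Fr(1), Fr(1), Fr(3, 4)]   # g >= 0 with integral 1

def phi(t):
    return t * t   # a convex function

def integral(values):
    # Integral with respect to mu: a weighted sum over the four points.
    return sum(v * w for v, w in zip(values, mu))

assert integral(g) == 1

# 233Ia: phi(int g*f) <= int g*phi(f)
lhs_a = phi(integral([gi * fi for gi, fi in zip(g, f)]))
rhs_a = integral([gi * phi(fi) for gi, fi in zip(g, f)])
assert lhs_a <= rhs_a

# 233Ib: the case g = 1 on a probability space
lhs_b = phi(integral(f))
rhs_b = integral([phi(fi) for fi in f])
assert lhs_b <= rhs_b
```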

233J Even the special case 233Ib of Jensen's inequality is already very useful. It can be extended as follows.
Theorem Let $(X, \Sigma, \mu)$ be a probability space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Let $\phi : \mathbb{R} \to \mathbb{R}$ be a convex function and $f$ a $\mu$-integrable real-valued function defined almost everywhere in $X$ such that the composition $\phi f$ is also integrable. If $g$ and $h$ are conditional expectations on $\mathrm{T}$ of $f$, $\phi f$ respectively, then $\phi g \le_{\text{a.e.}} h$. Consequently $\int \phi g \le \int \phi f$.
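proof We use the same ideas as in 233I. For each $q \in \mathbb{Q}$ take a $b_q \in \mathbb{R}$ such that $\phi(t) \ge \phi_q(t) = \phi(q) + b_q(t - q)$ for every $t \in \mathbb{R}$, so that $\phi(t) = \sup_{q \in \mathbb{Q}} \phi_q(t)$ for every $t \in \mathbb{R}$. Now setting
$\psi_q(x) = \phi(q) + b_q(g(x) - q)$
for $x \in \operatorname{dom} g$, we see that $\psi_q = \phi_q g$ is a conditional expectation of $\phi_q f$, and as $\phi_q f \le_{\text{a.e.}} \phi f$ we must have $\psi_q \le_{\text{a.e.}} h$. But also $\phi g = \sup_{q \in \mathbb{Q}} \psi_q$ wherever $g$ is defined, so $\phi g \le_{\text{a.e.}} h$, as claimed.
It follows at once that $\int \phi g \le \int h = \int \phi f$.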
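
On a finite probability space a $\sigma$-subalgebra is generated by a partition, and a conditional expectation is just the average over each block. The following sketch checks the inequality $\phi g \le h$ in that setting; the space, the partition `blocks` and the function $\phi(t) = t^2$ are assumptions made for the illustration.

```python
from fractions import Fraction as Fr

mu = [Fr(1, 10), Fr(2, 10), Fr(3, 10), Fr(4, 10)]   # point masses, total 1
f = [Fr(-1), Fr(1, 2), Fr(2), Fr(3)]
blocks = [[0, 1], [2, 3]]   # atoms of the (hypothetical) sub-algebra T

def cond_exp(values):
    # Conditional expectation on T: the mu-weighted average over each block.
    out = [None] * len(values)
    for block in blocks:
        mass = sum(mu[i] for i in block)
        avg = sum(values[i] * mu[i] for i in block) / mass
        for i in block:
            out[i] = avg
    return out

def phi(t):
    return t * t   # a convex function

g = cond_exp(f)                      # conditional expectation of f
h = cond_exp([phi(v) for v in f])    # conditional expectation of phi(f)

# Conditional Jensen: phi(g) <= h at every point.
assert all(phi(gi) <= hi for gi, hi in zip(g, h))
```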

233K I give the following proposition, an elaboration of 233Eg, in a very general form, as its applications
can turn up anywhere.
Proposition Let $(X, \Sigma, \mu)$ be a probability space, and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Suppose that $f$ is a $\mu$-integrable function and $h$ is a $(\mu\restriction\mathrm{T})$-virtually measurable real-valued function defined $(\mu\restriction\mathrm{T})$-almost everywhere on $X$. Let $g$, $g_0$ be conditional expectations of $f$ and $|f|$ on $\mathrm{T}$. Then $f \times h$ is integrable iff $g_0 \times h$ is integrable, and in this case $g \times h$ is a conditional expectation of $f \times h$ on $\mathrm{T}$.
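proof (a) Suppose that $h$ is a $\mathrm{T}$-simple function. Then surely $f \times h$ and $g_0 \times h$ are integrable, and $g \times h$ is a conditional expectation of $f \times h$ as in 233Eg.
(b) Now suppose that $f$, $h \ge 0$. Then $g =_{\text{a.e.}} g_0 \ge_{\text{a.e.}} 0$ (233Ec). Let $\tilde h$ be a non-negative $\mathrm{T}$-measurable function defined everywhere on $X$ such that $h =_{\text{a.e.}} \tilde h$. For each $n \in \mathbb{N}$ set
$$h_n(x) = 2^{-n}k \text{ if } 0 \le k < 4^n \text{ and } 2^{-n}k \le \tilde h(x) < 2^{-n}(k + 1),$$
$$h_n(x) = 2^n \text{ if } \tilde h(x) \ge 2^n.$$
Then $h_n$ is a $(\mu\restriction\mathrm{T})$-simple function, so $g \times h_n$ is a conditional expectation of $f \times h_n$. Both $\langle f \times h_n \rangle_{n \in \mathbb{N}}$ and $\langle g \times h_n \rangle_{n \in \mathbb{N}}$ are almost everywhere non-decreasing sequences of integrable functions, with limits $f \times \tilde h$ and $g \times \tilde h$ respectively. By B.Levi's theorem,
$$f \times h \text{ is integrable} \iff f \times \tilde h \text{ is integrable}$$
$$\iff \sup_{n \in \mathbb{N}} \int f \times h_n < \infty \iff \sup_{n \in \mathbb{N}} \int g \times h_n < \infty$$
(because $\int g \times h_n = \int f \times h_n$ for each $n$)
$$\iff g \times \tilde h \text{ is integrable} \iff g_0 \times h \text{ is integrable}.$$
Moreover, in this case
$$\int_E f \times h = \int_E f \times \tilde h = \lim_{n \to \infty} \int_E f \times h_n$$
$$= \lim_{n \to \infty} \int_E g \times h_n = \int_E g \times \tilde h = \int_E g \times h$$
for every $E \in \mathrm{T}$, while $g \times h$ is $(\mu\restriction\mathrm{T})$-virtually measurable, so $g \times h$ is a conditional expectation of $f \times h$.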
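(c) Finally, consider the general case of integrable $f$ and virtually measurable $h$. Set $f^+ = f \vee 0$, $f^- = (-f) \vee 0$, so that $f = f^+ - f^-$ and $0 \le f^+$, $f^- \le |f|$; similarly, set $h^+ = h \vee 0$, $h^- = (-h) \vee 0$. Let $g_1$, $g_2$ be conditional expectations of $f^+$, $f^-$ on $\mathrm{T}$. Because $0 \le f^+$, $f^- \le |f|$, $0 \le_{\text{a.e.}} g_1$, $g_2 \le_{\text{a.e.}} g_0$, while $g =_{\text{a.e.}} g_1 - g_2$.
We see that
$$f \times h \text{ is integrable} \iff |f| \times |h| = |f \times h| \text{ is integrable}$$
$$\iff g_0 \times |h| \text{ is integrable} \iff g_0 \times h \text{ is integrable}.$$
And in this case all four of $f^+ \times h^+, \ldots, f^- \times h^-$ are integrable, so
$$(g_1 - g_2) \times h = g_1 \times h^+ - g_2 \times h^+ - g_1 \times h^- + g_2 \times h^-$$
is a conditional expectation of
$$f^+ \times h^+ - f^- \times h^+ - f^+ \times h^- + f^- \times h^- = f \times h.$$
Since $g \times h =_{\text{a.e.}} (g_1 - g_2) \times h$, this also is a conditional expectation of $f \times h$, and we're done.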
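
The conclusion of 233K, that a $\mathrm{T}$-measurable factor can be taken outside a conditional expectation, is easy to verify when $\mathrm{T}$ is generated by a finite partition. The sketch below does so on a hypothetical four-point space; the weights, the partition `blocks` and the particular `f` and `h` are assumptions chosen for the example.

```python
from fractions import Fraction as Fr

mu = [Fr(1, 10), Fr(2, 10), Fr(3, 10), Fr(4, 10)]   # point masses, total 1
blocks = [[0, 1], [2, 3]]   # atoms of the (hypothetical) sub-algebra T

def cond_exp(values):
    # Conditional expectation on T: the mu-weighted average over each block.
    out = [None] * len(values)
    for block in blocks:
        mass = sum(mu[i] for i in block)
        avg = sum(values[i] * mu[i] for i in block) / mass
        for i in block:
            out[i] = avg
    return out

f = [Fr(-1), Fr(1, 2), Fr(2), Fr(3)]
h = [Fr(3), Fr(3), Fr(-2), Fr(-2)]   # constant on each block, so T-measurable

g = cond_exp(f)
fh = [fi * hi for fi, hi in zip(f, h)]
gh = [gi * hi for gi, hi in zip(g, h)]

# g*h is a conditional expectation of f*h.
assert cond_exp(fh) == gh
```

In this finite setting integrability is automatic, so only the identification of the conditional expectation is being tested.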

233X Basic exercises (a) Let $(X, \Sigma, \mu)$ be a probability space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Let $\langle f_n \rangle_{n \in \mathbb{N}}$ be a sequence of non-negative $\mu$-integrable functions and suppose that $g_n$ is a conditional expectation of $f_n$ on $\mathrm{T}$ for each $n$. Suppose that $f = \liminf_{n \to \infty} f_n$ is integrable and has a conditional expectation $g$. Show that $g \le_{\text{a.e.}} \liminf_{n \to \infty} g_n$.
(b) Let $I \subseteq \mathbb{R}$ be an interval, and $\phi : I \to \mathbb{R}$ a function. Show that $\phi$ is convex iff $\{x : x \in I, \phi(x) + bx \le c\}$ is an interval for every $b$, $c \in \mathbb{R}$.
> (c) Let $I \subseteq \mathbb{R}$ be an open interval and $\phi : I \to \mathbb{R}$ a function. (i) Show that if $\phi$ is differentiable then it is convex iff $\phi'$ is non-decreasing. (ii) Show that if $\phi$ is absolutely continuous on every bounded closed subinterval of $I$ then $\phi$ is convex iff $\phi'$ is non-decreasing on its domain.
(d) For any $r \ge 1$, a subset $C$ of $\mathbb{R}^r$ is convex if $tx + (1 - t)y \in C$ for all $x$, $y \in C$ and $t \in [0, 1]$. If $C \subseteq \mathbb{R}^r$ is convex, then a function $\phi : C \to \mathbb{R}$ is convex if $\phi(tx + (1 - t)y) \le t\phi(x) + (1 - t)\phi(y)$ for all $x$, $y \in C$ and $t \in [0, 1]$.
Let $C \subseteq \mathbb{R}^r$ be a convex set and $\phi : C \to \mathbb{R}$ a function. Show that the following are equiveridical: (i) the function $\phi$ is convex; (ii) the set $\{(x, t) : x \in C, t \in \mathbb{R}, t \ge \phi(x)\}$ is convex in $\mathbb{R}^{r+1}$; (iii) the set $\{x : x \in C, \phi(x) + b \cdot x \le c\}$ is convex in $\mathbb{R}^r$ for every $b \in \mathbb{R}^r$ and $c \in \mathbb{R}$, writing $b \cdot x = \sum_{i=1}^r \beta_i \xi_i$ if $b = (\beta_1, \ldots, \beta_r)$ and $x = (\xi_1, \ldots, \xi_r)$.
(e) Let $I \subseteq \mathbb{R}$ be an interval and $\phi : I \to \mathbb{R}$ a convex function.
(i) Show that if $a$, $d \in I$ and $a < b \le c < d$ then
$$\frac{\phi(b)-\phi(a)}{b-a} \le \frac{\phi(d)-\phi(c)}{d-c}.$$
(ii) Show that $\phi$ is continuous at every interior point of $I$.
(iii) Show that either $\phi$ is monotonic on $I$ or there is a $c \in I$ such that $\phi(c) = \min_{x \in I} \phi(x)$ and $\phi$ is non-increasing on $I \cap ]-\infty, c]$, monotonic non-decreasing on $I \cap [c, \infty[$.
(iv) Show that $\phi$ is differentiable at all but countably many points of $I$, and that its derivative is non-decreasing in the sense that $\phi'(x) \le \phi'(y)$ whenever $x$, $y \in \operatorname{dom} \phi'$ and $x \le y$.
(v) Show that if $I$ is closed and bounded and $\phi$ is continuous then $\phi$ is absolutely continuous.
(vi) Show that if $I$ is closed and bounded and $\psi : I \to \mathbb{R}$ is absolutely continuous with a non-decreasing derivative then $\psi$ is convex.

(f) Show that if $I \subseteq \mathbb{R}$ is an interval and $\phi$, $\psi : I \to \mathbb{R}$ are convex functions so is $a\phi + b\psi$ for any $a$, $b \ge 0$.

(g) In the context of 233K, give an example in which $g \times h$ is integrable but $f \times h$ is not. (Hint: take $X$, $\mu$, $\mathrm{T}$ as in 233Cb, and arrange for $g$ to be $0$.)

(h) Let $I \subseteq \mathbb{R}$ be an interval and $\Phi$ a non-empty family of convex real-valued functions on $I$ such that $\psi(x) = \sup_{\phi \in \Phi} \phi(x)$ is finite for every $x \in I$. Show that $\psi$ is convex.

233Y Further exercises (a) If $I \subseteq \mathbb{R}$ is an interval, a function $\phi : I \to \mathbb{R}$ is mid-convex if $\phi(\frac{x+y}{2}) \le \frac12(\phi(x) + \phi(y))$ for all $x$, $y \in I$. Show that a mid-convex function which is bounded on any non-trivial subinterval of $I$ is convex.

(b) Generalize 233Xd to arbitrary normed spaces in place of $\mathbb{R}^r$.

(c) Let $(X, \Sigma, \mu)$ be a probability space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Let $\phi$ be a convex real-valued function with domain an interval $I \subseteq \mathbb{R}$, and $f$ an integrable real-valued function on $X$ such that $f(x) \in I$ for almost every $x \in X$ and $\phi f$ is integrable. Let $g$, $h$ be conditional expectations on $\mathrm{T}$ of $f$, $\phi f$ respectively. Show that $g(x) \in I$ for almost every $x$ and that $\phi g \le_{\text{a.e.}} h$.

(d) (i) Show that if $I \subseteq \mathbb{R}$ is a bounded interval, $E \subseteq I$ is Lebesgue measurable, and $\mu E > \frac23 \mu I$ where $\mu$ is Lebesgue measure, then for every $x \in I$ there are $y$, $z \in E$ such that $z = \frac{x+y}{2}$. (Hint: by 134Ya/263A, $\mu(x + E) + \mu(2E) > \mu(2I)$.) (ii) Show that if $f : [0, 1] \to \mathbb{R}$ is a mid-convex Lebesgue measurable function (definition: 233Ya), $a > 0$, and $E = \{x : x \in [0, 1], a \le f(x) < 2a\}$ is not negligible, then there is a non-trivial interval $I \subseteq [0, 1]$ such that $f(x) > 0$ for every $x \in I$. (Hint: 223B.) (iii) Suppose that $f : [0, 1] \to \mathbb{R}$ is a mid-convex function such that $f \le 0$ almost everywhere on $[0, 1]$. Show that $f \le 0$ everywhere on $]0, 1[$. (Hint: for every $x \in \,]0, 1[$, $\max(f(x - t), f(x + t)) \le 0$ for almost every $t \in [0, \min(x, 1 - x)]$.) (iv) Suppose that $f : [0, 1] \to \mathbb{R}$ is a mid-convex Lebesgue measurable function such that $f(0) = f(1) = 0$. Show that $f(x) \le 0$ for every $x \in [0, 1]$. (Hint: show that $\{x : f(x) \le 0\}$ is dense in $[0, 1]$, use (ii) to show that it is conegligible in $[0, 1]$ and apply (iii).) (v) Show that if $I \subseteq \mathbb{R}$ is an interval and $f : I \to \mathbb{R}$ is a mid-convex Lebesgue measurable function then it is convex.

233 Notes and comments The concept of conditional expectation is fundamental in probability theory,
and will reappear in Chapter 27 in its natural context. I hope that even as an exercise in technique, however,
it will strike you as worth taking note of.
I introduced 233E as a list of elementary facts, and they are indeed straightforward. But below the surface there are some remarkable ideas waiting for expression. If you take $\mathrm{T}$ to be the trivial algebra $\{\emptyset, X\}$, so that the (unique) conditional expectation of an integrable function $f$ is the constant function $(\int f)\chi X$, then 233Ed and 233Ee become versions of B.Levi's theorem and Lebesgue's Dominated Convergence Theorem. (Fatou's Lemma is in 233Xa.) Even 233Eg can be thought of as a generalization of the result that $\int cf = c\int f$, where the constant $c$ has been replaced by a bounded $\mathrm{T}$-measurable function. A recurrent theme in the later parts of this treatise will be the search for 'conditioned' versions of theorems. The proof of 233Ee is a typical example of an argument which has been translated from a proof of the original 'unconditioned' result.
I suggested that 233I-233J are surprising, and I think that most of us find them so, even applied to the list of convex functions given in 233G. But I should remark that in a way 233J has very little to do with conditional expectations. The only properties of conditional expectations used in the proof are (i) that if $g$ is a conditional expectation of $f$, then $a\chi X + bg$ is a conditional expectation of $a\chi X + bf$ for all real $a$, $b$ (ii) if $g_1$, $g_2$ are conditional expectations of $f_1$, $f_2$ and $f_1 \le_{\text{a.e.}} f_2$, then $g_1 \le_{\text{a.e.}} g_2$. See 244Xm below.
Note that 233Ib can be regarded as the special case of 233J in which $\mathrm{T} = \{\emptyset, X\}$. In fact 233Ia can be derived from 233Ib applied to the measure $\nu$ where $\nu E = \int_E g$ for every $E \in \Sigma$.
Like 233B, 233K seems to have rather a lot of technical detail in the argument. The point of this result is that we can deduce the integrability of $f \times h$ from that of $g_0 \times h$ (but not from the integrability of $g \times h$; see 233Xg). Otherwise it should be routine.

234 Indefinite-integral measures


I take a few pages to describe a standard construction. The idea is straightforward (234A-234B), but
a number of details need to be worked out if it is to be securely integrated into the general framework I
employ.

234A Theorem Let $(X, \Sigma, \mu)$ be a measure space, and $f$ a non-negative $\mu$-virtually measurable real-valued function defined on a conegligible subset of $X$. Write $\nu F = \int f \times \chi F\,d\mu$ whenever $F \subseteq X$ is such that the integral is defined in $[0, \infty]$ according to the conventions of 133A. Then $\nu$ is a complete measure on $X$, and its domain includes $\Sigma$.
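proof (a) Write $\mathrm{T}$ for the domain of $\nu$, that is, the family of sets $F \subseteq X$ such that $\int f \times \chi F\,d\mu$ is defined in $[0, \infty]$, that is, $f \times \chi F$ is $\mu$-virtually measurable (133A). Then $\mathrm{T}$ is a $\sigma$-algebra of subsets of $X$. P P For each $F \in \mathrm{T}$ let $H_F \subseteq X$ be a $\mu$-conegligible set such that $f \times \chi F \restriction H_F$ is $\Sigma$-measurable. Because $f$ itself is $\mu$-virtually measurable, $X \in \mathrm{T}$. If $F \in \mathrm{T}$, then
$$f \times \chi(X \setminus F) \restriction (H_X \cap H_F) = f \restriction (H_X \cap H_F) - (f \times \chi F) \restriction (H_X \cap H_F)$$
is $\Sigma$-measurable, while $H_X \cap H_F$ is $\mu$-conegligible, so $X \setminus F \in \mathrm{T}$. If $\langle F_n \rangle_{n \in \mathbb{N}}$ is a sequence in $\mathrm{T}$ with union $F$, set $H = \bigcap_{n \in \mathbb{N}} H_{F_n}$; then $H$ is conegligible, $f \times \chi F_n \restriction H$ is $\Sigma$-measurable for every $n \in \mathbb{N}$, and $f \times \chi F = \sup_{n \in \mathbb{N}} f \times \chi F_n$, so $f \times \chi F \restriction H$ is $\Sigma$-measurable, and $F \in \mathrm{T}$. Thus $\mathrm{T}$ is a $\sigma$-algebra. If $F \in \Sigma$, then $f \times \chi F \restriction H_X$ is $\Sigma$-measurable, so $F \in \mathrm{T}$. Q Q

(b) Next, $\nu$ is a measure. P P Of course $\nu F \in [0, \infty]$ for every $F \in \mathrm{T}$. $f \times \chi\emptyset = 0$ wherever it is defined, so $\nu\emptyset = 0$. If $\langle F_n \rangle_{n \in \mathbb{N}}$ is a disjoint sequence in $\mathrm{T}$ with union $F$, then $f \times \chi F = \sum_{n=0}^\infty f \times \chi F_n$. If $\nu F_m = \infty$ for some $m$, then we surely have $\nu F = \infty = \sum_{n=0}^\infty \nu F_n$. If $\nu F_m < \infty$ for each $m$ but $\sum_{n=0}^\infty \nu F_n = \infty$, then
$$\int f \times \chi\Bigl(\bigcup_{n \le m} F_n\Bigr) = \sum_{n=0}^m \int f \times \chi F_n \to \infty$$
as $m \to \infty$, so again $\nu F = \infty = \sum_{n=0}^\infty \nu F_n$. If $\sum_{n=0}^\infty \nu F_n < \infty$ then by B.Levi's theorem
$$\nu F = \int \sum_{n=0}^\infty f \times \chi F_n = \sum_{n=0}^\infty \int f \times \chi F_n = \sum_{n=0}^\infty \nu F_n.$$
Q Q

(c) Finally, $\nu$ is complete. P P If $A \subseteq F \in \mathrm{T}$ and $\nu F = 0$, then $f \times \chi F =_{\text{a.e.}} 0$, so $f \times \chi A =_{\text{a.e.}} 0$ and $\nu A$ is defined and equal to zero. Q Q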
234B Definition Let $(X, \Sigma, \mu)$ be a measure space, and $\nu$ another measure on $X$ with domain $\mathrm{T}$. I will call $\nu$ an indefinite-integral measure over $\mu$, or sometimes a completed indefinite-integral measure, if it can be obtained by the method of 234A from some non-negative virtually measurable function $f$ defined almost everywhere on $X$. In this case I will call $f$ a Radon-Nikodym derivative of $\nu$ with respect to $\mu$. As in 232Hf, the phrase density function is also used in this context.

234C Remarks Let $(X, \Sigma, \mu)$ be a measure space, and $f$ a $\mu$-virtually measurable non-negative real-valued function defined almost everywhere on $X$; let $\nu$ be the associated indefinite-integral measure.

(a) There is a $\Sigma$-measurable function $g : X \to [0, \infty[$ such that $f =_{\text{a.e.}} g$. P P Let $H \subseteq \operatorname{dom} f$ be a measurable conegligible set such that $f \restriction H$ is measurable, and set $g(x) = f(x)$ for $x \in H$, $g(x) = 0$ for $x \in X \setminus H$. Q Q In this case, $\int f \times \chi E\,d\mu = \int g \times \chi E\,d\mu$ if either is defined. So $g$ is a Radon-Nikodym derivative of $\nu$, and $\nu$ has a Radon-Nikodym derivative which is $\Sigma$-measurable and defined everywhere.

(b) If $E$ is $\mu$-negligible, then $f \times \chi E = 0$ $\mu$-a.e., so $\nu E = 0$. But it does not follow that $\nu$ is absolutely continuous with respect to $\mu$ in the $\epsilon$-$\delta$ sense of 232Aa (234Xa).

(c) I have defined 'indefinite-integral measure' in such a way as to produce a complete measure. In my view this is what makes best sense in most applications. There are occasions on which it seems more appropriate to use the measure $\nu_0 : \Sigma \to [0, \infty]$ defined by setting $\nu_0 E = \int_E f\,d\mu = \int f \times \chi E\,d\mu$ for $E \in \Sigma$. I suppose I would call this the uncompleted indefinite-integral measure over $\mu$ defined by $f$. ($\nu$ is always the completion of $\nu_0$; see 234Db.)
(d) Note the way in which I formulated the definition of $\nu$: '$\nu E = \int f \times \chi E\,d\mu$ if the integral is defined', rather than '$\nu E = \int_E f\,d\mu$'. The point is that the longer formula gives a rule for deciding what the domain of $\nu$ must be. Of course it is the case that $\nu E = \int_E f\,d\mu$ for every $E \in \operatorname{dom} \nu$ (apply 214F to $f \times \chi E$).

(e) Many authors are prepared to say '$\nu$ is absolutely continuous with respect to $\mu$' in this context. If $\nu$ is an indefinite-integral measure with respect to $\mu$, then it is certainly true that $\nu E = 0$ whenever $\mu E = 0$, as in 232Ba. If $\nu$ is not totally finite, however, it does not follow that $\lim_{\mu E \to 0} \nu E = 0$, as required by the definition in 232Aa, and further difficulties can arise if $\mu$ or $\nu$ is not $\sigma$-finite (see 234Ya-234Yc).
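
The failure of the $\epsilon$-$\delta$ condition is visible already for Lebesgue measure on $[0, 1]$ with density $f(x) = 1/x$, the case set as exercise 234Xa. The sketch below is one possible illustration, using intervals of my own choosing: for each $\delta$ it picks an interval $E = [a, a + \delta]$ with $\mu E = \delta$ and $\nu E = \ln((a + \delta)/a) = 1/\delta$, so that $\nu E$ grows without bound as $\mu E$ shrinks.

```python
import math

def bad_interval(delta):
    # Left endpoint a chosen so that the integral of 1/x over [a, a + delta]
    # equals 1/delta, that is, log((a + delta) / a) = 1 / delta.
    a = delta / math.expm1(1.0 / delta)
    return a, a + delta

results = []
for delta in (0.5, 0.1, 0.01):
    a, b = bad_interval(delta)
    mu_E = b - a                 # Lebesgue measure of [a, b]
    nu_E = math.log(b / a)       # integral of 1/x over [a, b]
    results.append((delta, mu_E, nu_E))
    assert 0.0 < a and b <= 1.0                      # E lies inside [0, 1]
    assert math.isclose(mu_E, delta, rel_tol=1e-9)
    assert math.isclose(nu_E, 1.0 / delta, rel_tol=1e-9)
```

Floating-point arithmetic is used here, so the comparisons are made up to a relative tolerance rather than exactly.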

234D The domain of an indefinite-integral measure It is sometimes useful to have an explicit description of the domain of a measure constructed in this way.
Proposition Let $(X, \Sigma, \mu)$ be a measure space, $f$ a non-negative $\mu$-virtually measurable function defined almost everywhere in $X$, and $\nu$ the associated indefinite-integral measure. Set $G = \{x : x \in \operatorname{dom} f, f(x) > 0\}$, and let $\hat\Sigma$ be the domain of the completion $\hat\mu$ of $\mu$. Then
(a) the domain $\mathrm{T}$ of $\nu$ is $\{E : E \subseteq X, E \cap G \in \hat\Sigma\}$; in particular, $\mathrm{T} \supseteq \hat\Sigma \supseteq \Sigma$.
(b) $\nu$ is the completion of its restriction to $\Sigma$.
(c) A set $A \subseteq X$ is $\nu$-negligible iff $A \cap G$ is $\mu$-negligible.
(d) In particular, if $\mu$ itself is complete, then $\mathrm{T} = \{E : E \subseteq X, E \cap G \in \Sigma\}$ and $\nu A = 0$ iff $\mu(A \cap G) = 0$.
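proof (a)(i) If $E \in \mathrm{T}$, then $f \times \chi E$ is virtually measurable, so there is a conegligible measurable set $H \subseteq \operatorname{dom} f$ such that $f \times \chi E \restriction H$ is measurable. Now $E \cap G \cap H = \{x : x \in H, (f \times \chi E)(x) > 0\}$ must belong to $\Sigma$, while $E \cap G \setminus H$ is negligible, so belongs to $\hat\Sigma$, and $E \cap G \in \hat\Sigma$.
(ii) If $E \cap G \in \hat\Sigma$, let $F_1$, $F_2 \in \Sigma$ be such that $F_1 \subseteq E \cap G \subseteq F_2$ and $F_2 \setminus F_1$ is negligible. Let $H \subseteq \operatorname{dom} f$ be a conegligible set such that $f \restriction H$ is measurable. Then $H' = H \setminus (F_2 \setminus F_1)$ is conegligible and $f \times \chi E \restriction H' = f \times \chi F_1 \restriction H'$ is measurable, so $f \times \chi E$ is virtually measurable and $E \in \mathrm{T}$.
(b) Thus the given formula does indeed describe $\mathrm{T}$. If $E \in \mathrm{T}$, let $F_1$, $F_2 \in \Sigma$ be such that $F_1 \subseteq E \cap G \subseteq F_2$ and $\mu(F_2 \setminus F_1) = 0$. Because $G$ itself also belongs to $\hat\Sigma$, there are $G_1$, $G_2 \in \Sigma$ such that $G_1 \subseteq G \subseteq G_2$ and $\mu(G_2 \setminus G_1) = 0$. Set $F_2' = F_2 \cup (X \setminus G_1)$. Then $F_2' \in \Sigma$ and $F_1 \subseteq E \subseteq F_2'$. But $(F_2' \setminus F_1) \cap G \subseteq (G_2 \setminus G_1) \cup (F_2 \setminus F_1)$ is $\mu$-negligible, so $\nu(F_2' \setminus F_1) = 0$.
This shows that if $\tilde\nu$ is the completion of $\nu \restriction \Sigma$ and $\tilde{\mathrm{T}}$ is its domain, then $\mathrm{T} \subseteq \tilde{\mathrm{T}}$. But as $\nu$ is complete, it surely extends $\tilde\nu$, so $\nu = \tilde\nu$, as claimed.
(c) Now take any $A \subseteq X$. Because $\nu$ is complete,
$$A \text{ is } \nu\text{-negligible} \iff \nu A = 0 \iff \int f \times \chi A\,d\mu = 0$$
$$\iff f \times \chi A = 0 \ \mu\text{-a.e.} \iff A \cap G \text{ is } \mu\text{-negligible}.$$
(d) This is just a restatement of (a) and (c) when $\mu = \hat\mu$.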
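
The three families $\Sigma \subseteq \hat\Sigma \subseteq \mathrm{T}$ can be enumerated on a small example. In the sketch below, which is my own illustration and not from the text, $X$ has six points, $\Sigma$ is generated by three two-point blocks of measure $1$, $0$, $2$, and $f$ takes the values $0$, $5$, $3$ on them. The code builds $\hat\Sigma$ and $\mathrm{T}$ from the description in 234Da and checks 234Dc by brute force.

```python
from itertools import combinations

X = frozenset(range(6))
blocks = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]
mu_block = [1, 0, 2]   # mu of each generating block (hypothetical values)
f_block = [0, 5, 3]    # value of the density f on each block

def subsets(points):
    points = sorted(points)
    return [frozenset(c) for r in range(len(points) + 1)
            for c in combinations(points, r)]

def inner(E):   # largest member of Sigma contained in E
    return frozenset().union(*[b for b in blocks if b <= E])

def outer(E):   # smallest member of Sigma containing E
    return frozenset().union(*[b for b in blocks if b & E])

def mu(F):      # F is assumed to be a union of blocks
    return sum(m for b, m in zip(blocks, mu_block) if b <= F)

def in_completion(E):
    return mu(outer(E) - inner(E)) == 0

def mu_hat(E):  # completion of mu, for E in the completed sigma-algebra
    assert in_completion(E)
    return mu(inner(E))

G = frozenset().union(*[b for b, v in zip(blocks, f_block) if v > 0])

Sigma = [E for E in subsets(X) if inner(E) == E]
Sigma_hat = [E for E in subsets(X) if in_completion(E)]
T = [E for E in subsets(X) if in_completion(E & G)]   # 234Da

def nu(E):      # indefinite-integral measure, for E in T
    return sum(v * mu_hat(E & b) for b, v in zip(blocks, f_block) if v > 0)

nu_null = {E for E in T if nu(E) == 0}
mu_null_on_G = {E for E in subsets(X)
                if in_completion(E & G) and mu_hat(E & G) == 0}

assert set(Sigma) <= set(Sigma_hat) <= set(T)
assert nu_null == mu_null_on_G                        # 234Dc
```

Here $\mathrm{T}$ is strictly larger than $\hat\Sigma$ because any subset of the block on which $f = 0$ belongs to $\mathrm{T}$, whatever its status in $\hat\Sigma$.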

234E Corollary If $(X, \Sigma, \mu)$ is a complete measure space and $G \in \Sigma$, then the indefinite-integral measure over $\mu$ defined by $\chi G$ is just the measure $\mu \llcorner G$ defined by setting
$(\mu \llcorner G)(F) = \mu(F \cap G)$ whenever $F \subseteq X$ and $F \cap G \in \Sigma$.

proof 234Dd.

*234F The next two results will not be relied on in this volume, but I include them for future reference,
and to give an idea of the scope of these ideas.

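Proposition Let $(X, \Sigma, \mu)$ be a measure space, and $\nu$ an indefinite-integral measure over $\mu$.
(a) If $\mu$ is semi-finite, so is $\nu$.
(b) If $\mu$ is complete and locally determined, so is $\nu$.
(c) If $\mu$ is localizable, so is $\nu$.
(d) If $\mu$ is strictly localizable, so is $\nu$.
(e) If $\mu$ is $\sigma$-finite, so is $\nu$.
(f) If $\mu$ is atomless, so is $\nu$.
proof By 234Ca, we may express $\nu$ as the indefinite integral of a $\Sigma$-measurable function $f : X \to [0, \infty[$. Let $\mathrm{T}$ be the domain of $\nu$, and $\hat\Sigma$ the domain of the completion $\hat\mu$ of $\mu$; set $G = \{x : x \in X, f(x) > 0\} \in \Sigma$.
(a) Suppose that $E \in \mathrm{T}$ and that $\nu E = \infty$. Then $E \cap G$ cannot be $\mu$-negligible. Because $\mu$ is semi-finite, there is a non-negligible $F \in \Sigma$ such that $F \subseteq E \cap G$ and $\mu F < \infty$. Now $F = \bigcup_{n \in \mathbb{N}} \{x : x \in F, 2^{-n} \le f(x) \le n\}$, so there is an $n \in \mathbb{N}$ such that $F' = \{x : x \in F, 2^{-n} \le f(x) \le n\}$ is non-negligible. Because $f$ is measurable, $F' \in \Sigma \subseteq \mathrm{T}$ and $2^{-n}\mu F' \le \nu F' \le n\mu F'$. Thus we have found an $F' \subseteq E$ such that $0 < \nu F' < \infty$. As $E$ is arbitrary, $\nu$ is semi-finite.
(b) We already know that $\nu$ is complete (234Db) and semi-finite. Now suppose that $E \subseteq X$ is such that $E \cap F \in \mathrm{T}$, that is, $E \cap F \cap G \in \Sigma$ (234Dd), whenever $F \in \mathrm{T}$ and $\nu F < \infty$. Then $E \cap G \cap F \in \Sigma$ whenever $F \in \Sigma$ and $\mu F < \infty$. P P Set $F_n = \{x : x \in F \cap G, f(x) \le n\}$. Then $\nu F_n \le n\mu F < \infty$, so $E \cap G \cap F_n \in \Sigma$ for every $n$. But this means that $E \cap G \cap F = \bigcup_{n \in \mathbb{N}} E \cap G \cap F_n \in \Sigma$. Q Q Because $\mu$ is locally determined, $E \cap G \in \Sigma$ and $E \in \mathrm{T}$. As $E$ is arbitrary, $\nu$ is locally determined.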
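(c) Let $\mathcal{F} \subseteq \mathrm{T}$ be any set. Set $\mathcal{E} = \{F \cap G : F \in \mathcal{F}\}$, so that $\mathcal{E} \subseteq \hat\Sigma$. By 212Ga, $\hat\mu$ is localizable, so $\mathcal{E}$ has an essential supremum $H \in \hat\Sigma$. But now, for any $H' \in \mathrm{T}$, $H' \cup (X \setminus G) = (H' \cap G) \cup (X \setminus G)$ belongs to $\hat\Sigma$, so
$$\nu(F \setminus H') = 0 \text{ for every } F \in \mathcal{F}$$
$$\iff \hat\mu(F \cap G \setminus H') = 0 \text{ for every } F \in \mathcal{F}$$
$$\iff \hat\mu(E \setminus H') = 0 \text{ for every } E \in \mathcal{E}$$
$$\iff \hat\mu(E \setminus (H' \cup (X \setminus G))) = 0 \text{ for every } E \in \mathcal{E}$$
$$\iff \hat\mu(H \setminus (H' \cup (X \setminus G))) = 0$$
$$\iff \hat\mu(H \cap G \setminus H') = 0$$
$$\iff \nu(H \setminus H') = 0.$$
So $H$ is also an essential supremum of $\mathcal{F}$ in $\mathrm{T}$. As $\mathcal{F}$ is arbitrary, $\nu$ is localizable.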
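(d) Let $\langle X_i \rangle_{i \in I}$ be a decomposition of $X$ for $\mu$; then it is also a decomposition for $\hat\mu$ (212Gb). Set $F_0 = X \setminus G$, $F_n = \{x : x \in G, n - 1 < f(x) \le n\}$ for $n \ge 1$. Then $\langle X_i \cap F_n \rangle_{i \in I, n \in \mathbb{N}}$ is a decomposition for $\nu$. P P (i) $\langle X_i \rangle_{i \in I}$ and $\langle F_n \rangle_{n \in \mathbb{N}}$ are disjoint covers of $X$ by members of $\Sigma \subseteq \mathrm{T}$, so $\langle X_i \cap F_n \rangle_{i \in I, n \in \mathbb{N}}$ also is. (ii) $\nu(X_i \cap F_0) = 0$, $\nu(X_i \cap F_n) \le n\mu X_i < \infty$ for $i \in I$, $n \ge 1$. (iii) If $E \subseteq X$ and $E \cap X_i \cap F_n \in \mathrm{T}$ for every $i \in I$ and $n \in \mathbb{N}$ then $E \cap X_i \cap G = \bigcup_{n \in \mathbb{N}} E \cap X_i \cap F_n \cap G$ belongs to $\hat\Sigma$ for every $i$, so $E \cap G \in \hat\Sigma$ and $E \in \mathrm{T}$. (iv) If $E \in \mathrm{T}$, then of course
$$\sum_{i \in I, n \in \mathbb{N}} \nu(E \cap X_i \cap F_n) = \sup_{J \subseteq I \times \mathbb{N} \text{ is finite}} \sum_{(i,n) \in J} \nu(E \cap X_i \cap F_n) \le \nu E.$$
So if $\sum_{i \in I, n \in \mathbb{N}} \nu(E \cap X_i \cap F_n) = \infty$ it is surely equal to $\nu E$. If the sum is finite, then $K = \{i : i \in I, \nu(E \cap X_i) > 0\}$ must be countable. But for $i \in I \setminus K$, $\int_{E \cap X_i} f\,d\mu = 0$, so $f = 0$ $\mu$-a.e. on $E \cap X_i$, that is, $\hat\mu(E \cap G \cap X_i) = 0$. Because $\langle X_i \rangle_{i \in I}$ is a decomposition for $\hat\mu$, $\hat\mu(E \cap G \cap \bigcup_{i \in I \setminus K} X_i) = 0$ and $\nu(E \cap \bigcup_{i \in I \setminus K} X_i) = 0$. But this means that
$$\nu E = \sum_{i \in K} \nu(E \cap X_i) = \sum_{i \in K, n \in \mathbb{N}} \nu(E \cap X_i \cap F_n) = \sum_{i \in I, n \in \mathbb{N}} \nu(E \cap X_i \cap F_n).$$
As $E$ is arbitrary, $\langle X_i \cap F_n \rangle_{i \in I, n \in \mathbb{N}}$ is a decomposition for $\nu$. Q Q So $\nu$ is strictly localizable.
(e) If $\mu$ is $\sigma$-finite, then in (d) we can take $I$ to be countable, so that $I \times \mathbb{N}$ is also countable, and $\nu$ will be $\sigma$-finite.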

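(f) If $\mu$ is atomless, so is $\hat\mu$ (212Gd). If $E \in \mathrm{T}$ and $\nu E > 0$, then $\hat\mu(E \cap G) > 0$, so there is an $F \in \hat\Sigma$ such that $F \subseteq E \cap G$ and neither $F$ nor $E \cap G \setminus F$ is $\hat\mu$-negligible. But in this case both $\nu F = \int_F f\,d\mu$ and $\nu(E \setminus F) = \int_{E \setminus F} f\,d\mu$ must be greater than $0$ (122Rc). As $E$ is arbitrary, $\nu$ is atomless.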

*234G For localizable measures, there is a straightforward description of the associated indefinite-
integral measures.
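Theorem Let $(X, \Sigma, \mu)$ be a localizable measure space. Then a measure $\nu$, with domain $\mathrm{T} \supseteq \Sigma$, is an indefinite-integral measure over $\mu$ iff ($\alpha$) $\nu$ is semi-finite and zero on $\mu$-negligible sets ($\beta$) $\nu$ is the completion of its restriction to $\Sigma$ ($\gamma$) whenever $\nu E > 0$ there is an $F \subseteq E$ such that $F \in \Sigma$ and $\mu F < \infty$ and $\nu F > 0$.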
proof (a) If $\nu$ is an indefinite-integral measure over $\mu$, then by 234Fa, 234Cb and 234Db it is semi-finite, zero on $\mu$-negligible sets and the completion of its restriction to $\Sigma$. Now suppose that $E \in \mathrm{T}$ and $\nu E > 0$. Then there is an $E_0 \in \Sigma$ such that $E_0 \subseteq E$ and $\nu E_0 = \nu E$, by 234Db. If $f : X \to \mathbb{R}$ is a $\Sigma$-measurable Radon-Nikodym derivative of $\nu$, and $G = \{x : f(x) > 0\}$, then $\mu(G \cap E_0) > 0$; because $\mu$ is semi-finite, there is an $F \in \Sigma$ such that $F \subseteq G \cap E_0$ and $0 < \mu F < \infty$, in which case $\nu F > 0$.
(b) So now suppose that $\nu$ satisfies the conditions.
(i) Set $\mathcal{E} = \{E : E \in \Sigma, \nu E < \infty\}$. For each $E \in \mathcal{E}$, consider $\nu_E : \Sigma \to \mathbb{R}$, setting $\nu_E G = \nu(G \cap E)$ for every $G \in \Sigma$. Then $\nu_E$ is countably additive and truly continuous with respect to $\mu$. P P $\nu_E$ is countably additive, just as in 231De. Because $\nu$ is zero on $\mu$-negligible sets, $\nu_E$ must be absolutely continuous with respect to $\mu$, by 232Ba. Since $\nu_E$ clearly satisfies condition ($\gamma$) of 232Bb, it must be truly continuous. Q Q
By 232E, there is a $\mu$-integrable function $f_E$ such that $\nu_E G = \int_G f_E\,d\mu$ for every $G \in \Sigma$, and we may suppose that $f_E$ is $\Sigma$-measurable (232He). Because $\nu_E$ is non-negative, $f_E \ge 0$ $\mu$-almost everywhere.
(ii) If $E$, $F \in \mathcal{E}$ then $f_E = f_F$ $\mu$-a.e. on $E \cap F$, because
$$\int_G f_E\,d\mu = \nu G = \int_G f_F\,d\mu$$
whenever $G \in \Sigma$ and $G \subseteq E \cap F$. Because $(X, \Sigma, \mu)$ is localizable, there is a measurable $f : X \to \mathbb{R}$ such that $f_E = f$ $\mu$-a.e. on $E$ for every $E \in \mathcal{E}$ (213N). Because every $f_E$ is non-negative almost everywhere, we may suppose that $f$ is non-negative, since surely $f_E = f \vee 0$ $\mu$-a.e. on $E$ for every $E \in \mathcal{E}$.
(iii) Let $\nu'$ be the indefinite-integral measure defined by $f$. If $E \in \mathcal{E}$ then
$$\nu E = \int_E f_E\,d\mu = \int_E f\,d\mu = \nu' E.$$
For $E \in \Sigma \setminus \mathcal{E}$, we have
$$\nu' E \ge \sup\{\nu' F : F \in \mathcal{E}, F \subseteq E\} = \sup\{\nu F : F \in \mathcal{E}, F \subseteq E\} = \nu E = \infty$$
because $\nu$ is semi-finite. Thus $\nu$ and $\nu'$ agree on $\Sigma$. But since each is the completion of its restriction to $\Sigma$, they must be equal.

234X Basic exercises (a) Let $\mu$ be Lebesgue measure on $[0, 1]$, and set $f(x) = \frac1x$ for $x > 0$. Let $\nu$ be the associated indefinite-integral measure. Show that the domain of $\nu$ is equal to the domain of $\mu$. Show that for every $\delta \in \,]0, \frac12]$ there is a measurable set $E$ such that $\mu E = \delta$ but $\nu E = \frac1\delta$.
(b) Let $(X, \Sigma, \mu)$ be a measure space, and $\nu$ an indefinite-integral measure over $\mu$. Show that if $\mu$ is purely atomic, so is $\nu$.
(c) Let $\mu$ be a point-supported measure. Show that any indefinite-integral measure over $\mu$ is point-supported.

234Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a semi-finite measure space which is not localizable. Show that there is a measure $\nu : \Sigma \to [0, \infty]$ such that $\nu E \le \mu E$ for every $E \in \Sigma$ but there is no measurable function $f$ such that $\nu E = \int_E f\,d\mu$ for every $E \in \Sigma$.
(b) Let $(X, \Sigma, \mu)$ be a localizable measure space with locally determined negligible sets. Show that a measure $\nu$, with domain $\mathrm{T} \supseteq \Sigma$, is an indefinite-integral measure over $\mu$ iff ($\alpha$) $\nu$ is complete and semi-finite and zero on $\mu$-negligible sets ($\beta$) whenever $\nu E > 0$ there is an $F \subseteq E$ such that $F \in \Sigma$ and $\mu F < \infty$ and $\nu F > 0$.

(c) Give an example of a localizable measure space $(X, \Sigma, \mu)$ and a complete semi-finite measure $\nu$ on $X$, defined on a $\sigma$-algebra $\mathrm{T} \supseteq \Sigma$, zero on $\mu$-negligible sets, and such that whenever $\nu E > 0$ there is an $F \subseteq E$ such that $F \in \Sigma$ and $\mu F < \infty$ and $\nu F > 0$, but $\nu$ is not an indefinite-integral measure over $\mu$. (Hint: 216Yb.)

(d) Let $(X, \Sigma, \mu)$ be an atomless semi-finite measure space and $\nu$ an indefinite-integral measure over $\mu$. Show that the following are equiveridical: (i) for every $\epsilon > 0$ there is a $\delta > 0$ such that $\nu E \le \epsilon$ whenever $\mu E \le \delta$ (ii) $\nu$ has a Radon-Nikodym derivative expressible as the sum of a bounded function and an integrable function.

(e) Let $(X, \Sigma, \mu)$ be a localizable measure space, and $\nu$ a complete localizable measure on $X$, with domain $\mathrm{T} \supseteq \Sigma$, which is the completion of its restriction to $\Sigma$. Show that if we set $\nu_1 F = \sup\{\nu(F \cap E) : E \in \Sigma, \mu E < \infty\}$ for every $F \in \mathrm{T}$, then $\nu_1$ is an indefinite-integral measure over $\mu$, and there is a $\mu$-conegligible set $H$ such that $\nu_1 F = \nu(F \cap H)$ for every $F \in \mathrm{T}$.

(f) Let $(X, \Sigma, \mu)$ be a measure space and $\nu$ an indefinite-integral measure over $\mu$, with Radon-Nikodym derivative $f$. Show that the c.l.d. version of $\nu$ is the indefinite-integral measure defined by $f$ over the c.l.d. version of $\mu$.

234 Notes and comments I have taken this section very carefully because the ideas I wish to express here, in so far as they extend the work of §232, rely critically on the details of the definition in 234A, and it is easy to make a false step once we have left the relatively sheltered context of complete $\sigma$-finite measures. I believe that if we take a little trouble at this point we can develop a theory (234C-234F) which will offer a smooth path to later applications; to see what I have in mind, you can refer to the entries under 'indefinite-integral measure' in the index. For the moment I mention only a kind of Radon-Nikodym theorem for localizable measures (234G).

235 Measurable transformations


I turn now to a topic which is separate from the Radon-Nikodym theorem, but which seems to fit better
here than in either of the next two chapters. I seek to give results which will generalize the basic formula of calculus
$$\int g(y)\,dy = \int g(\phi(x))\phi'(x)\,dx$$
in the context of a general transformation $\phi$ between measure spaces. The principal results are I suppose
235A/235E, which are very similar expressions of the basic idea, and 235L, which gives a general criterion
for a stronger result. A formulation from a different direction is in 235T.

235A I start with the basic result, which is already sufficient for a large proportion of the applications
I have in mind.
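Theorem Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\phi : D_\phi \to Y$, $J : D_J \to [0, \infty[$ functions defined on conegligible subsets $D_\phi$, $D_J$ of $X$ such that
$$\int J \times \chi(\phi^{-1}[F])\,d\mu \text{ exists} = \nu F$$
whenever $F \in \mathrm{T}$ and $\nu F < \infty$. Then
$$\int_{\phi^{-1}[H]} J \times g\phi\,d\mu \text{ exists} = \int_H g\,d\nu$$
for every $\nu$-integrable function $g$ taking values in $[-\infty, \infty]$ and every $H \in \mathrm{T}$, provided that we interpret $(J \times g\phi)(x)$ as $0$ when $J(x) = 0$ and $g(\phi(x))$ is undefined.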
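proof (a) If $g$ is a simple function, say $g = \sum_{i=0}^n a_i \chi F_i$ where $\nu F_i < \infty$ for each $i$, then
$$\int J \times g\phi\,d\mu = \sum_{i=0}^n a_i \int J \times \chi(\phi^{-1}[F_i])\,d\mu = \sum_{i=0}^n a_i \nu F_i = \int g\,d\nu.$$
(b) If $\nu F = 0$ then $\int J \times \chi(\phi^{-1}[F]) = 0$ so $J = 0$ a.e. on $\phi^{-1}[F]$. So if $g$ is defined $\nu$-a.e., $J = 0$ $\mu$-a.e. on $X \setminus \operatorname{dom}(g\phi) = (X \setminus D_\phi) \cup \phi^{-1}[Y \setminus \operatorname{dom} g]$, and, on the convention proposed, $J \times g\phi$ is defined $\mu$-a.e. Moreover, if $\lim_{n \to \infty} g_n = g$ $\nu$-a.e., then $\lim_{n \to \infty} J \times g_n\phi = J \times g\phi$ $\mu$-a.e. So if $\langle g_n \rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence of simple functions converging almost everywhere to $g$, $\langle J \times g_n\phi \rangle_{n \in \mathbb{N}}$ will be a non-decreasing sequence of integrable functions converging almost everywhere to $J \times g\phi$; by B.Levi's theorem,
$$\int J \times g\phi\,d\mu \text{ exists} = \lim_{n \to \infty} \int J \times g_n\phi\,d\mu = \lim_{n \to \infty} \int g_n\,d\nu = \int g\,d\nu.$$
(c) If $g = g^+ - g^-$, where $g^+$ and $g^-$ are $\nu$-integrable functions, then
$$\int J \times g\phi\,d\mu = \int J \times g^+\phi\,d\mu - \int J \times g^-\phi\,d\mu = \int g^+\,d\nu - \int g^-\,d\nu = \int g\,d\nu.$$
(d) This deals with the case $H = Y$. For the general case, we have
$$\int_H g\,d\nu = \int (g \times \chi H)\,d\nu$$
(131Fa)
$$= \int J \times (g \times \chi H)\phi\,d\mu = \int J \times g\phi \times \chi(\phi^{-1}[H])\,d\mu = \int_{\phi^{-1}[H]} J \times g\phi\,d\mu$$
by 214F.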
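
For a concrete instance, take $X = Y = [0, 1]$ with Lebesgue measure, $\phi(x) = x^2$ and $J(x) = 2x$; this choice, the test function $g(y) = y^3$ and the midpoint-rule quadrature are all assumptions of the sketch below, not part of the text. The code checks the hypothesis of 235A on one interval $F = [c, d]$, where $\phi^{-1}[F] = [\sqrt c, \sqrt d]$, and then compares the two sides of the conclusion numerically.

```python
import math

def midpoint_integral(func, a, b, n=10_000):
    # Composite midpoint rule on [a, b] with n equal subintervals.
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

phi = lambda x: x * x      # transformation from [0, 1] to [0, 1]
J = lambda x: 2.0 * x      # the factor J, here phi'
g = lambda y: y ** 3       # an integrable test function on Y

# Hypothesis: integral of J over phi^{-1}[F] equals the length of F = [c, d].
c, d = 0.2, 0.7
hyp = midpoint_integral(J, math.sqrt(c), math.sqrt(d))
assert abs(hyp - (d - c)) < 1e-9

# Conclusion: integral of J * (g o phi) d mu equals integral of g d nu.
lhs = midpoint_integral(lambda x: J(x) * g(phi(x)), 0.0, 1.0)
rhs = midpoint_integral(g, 0.0, 1.0)
assert abs(lhs - rhs) < 1e-6
```

Both integrals in the conclusion equal $\frac14$ exactly; the tolerance allows for the quadrature error.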

235B Remarks (a) Note the particular convention
$$0 \times \text{undefined} = 0$$
which I am applying to the interpretation of $J \times g\phi$. This is the first of a number of technical points which will concern us in this section. The point is that if $g$ is defined $\nu$-almost everywhere, then for any extension of $g$ to a function $g_1 : Y \to \mathbb{R}$ we shall, on this convention, have $J \times g\phi = J \times g_1\phi$ except on $\{x : J(x) > 0, \phi(x) \in Y \setminus \operatorname{dom} g\}$, which is negligible; so that
$$\int J \times g\phi\,d\mu = \int J \times g_1\phi\,d\mu = \int g_1\,d\nu = \int g\,d\nu$$
if $g$ and $g_1$ are integrable. Thus the convention is appropriate here, and while it adds a phrase to the statements of many of the results of this section, it makes their application smoother. (But I ought to insist that I am using this as a local convention only, and the ordinary rule $0 \times \text{undefined} = \text{undefined}$ will stand elsewhere in this treatise unless explicitly overruled.)

(b) I have had to take care in the formulation of this theorem to distinguish between the hypothesis
$$\int J(x)\chi(\phi^{-1}[F])(x)\,\mu(dx) \text{ exists} = \nu F \text{ whenever } \nu F < \infty$$
and the perhaps more elegant alternative
$$\int_{\phi^{-1}[F]} J(x)\,\mu(dx) \text{ exists} = \nu F \text{ whenever } \nu F < \infty,$$
which is not quite adequate for the theorem. (See 235S below.) Recall that by $\int_A f$ I mean $\int (f \restriction A)\,d\mu_A$, where $\mu_A$ is the subspace measure on $A$ (214D). It is possible for $\int_A (f \restriction A)\,d\mu_A$ to be defined even when $\int f \times \chi A\,d\mu$ is not; for instance, take $\mu$ to be Lebesgue measure on $[0, 1]$, $A$ any non-measurable subset of $[0, 1]$, and $f$ the constant function with value $1$; then $\int_A f = \mu^* A$, but $f \times \chi A = \chi A$ is not $\mu$-integrable. It is however the case that if $\int f \times \chi A\,d\mu$ is defined, then so is $\int_A f$, and the two are equal; this is a consequence of 214F. While 235R shows that in most of the cases relevant to the present volume the distinction can be passed over, it is important to avoid assuming that $\phi^{-1}[F]$ is measurable for every $F \in \mathrm{T}$. A simple example is the following. Set $X = Y = [0, 1]$. Let $\mu$ be Lebesgue measure on $[0, 1]$, and define $\nu$ by setting
$$\mathrm{T} = \{F : F \subseteq [0, 1], F \cap [0, \tfrac12] \text{ is Lebesgue measurable}\},$$
$$\nu F = 2\mu(F \cap [0, \tfrac12]) \text{ for every } F \in \mathrm{T}.$$
Set $\phi(x) = x$ for every $x \in [0, 1]$. Then we have
$$\nu F = \int_F J\,d\mu = \int J \times \chi(\phi^{-1}[F])\,d\mu$$
for every $F \in \mathrm{T}$, where $J(x) = 2$ for $x \in [0, \frac12]$ and $J(x) = 0$ for $x \in \,]\frac12, 1]$. But of course there are subsets $F$ of $[\frac12, 1]$ which are not Lebesgue measurable (see 134D), and such an $F$ necessarily belongs to $\mathrm{T}$, even though $\phi^{-1}[F]$ does not belong to the domain of $\mu$.
The point here is that if $\nu F_0 = 0$ then we expect to have $J = 0$ on $\phi^{-1}[F_0]$, and it is of no importance whether $\phi^{-1}[F]$ is measurable for $F \subseteq F_0$.
235C Theorem 235A is concerned with integration, and accordingly the hypothesis $\int J \times \chi(\phi^{-1}[F])\,d\mu = \nu F$ looks only at sets $F$ of finite measure. If we wish to consider measurability of non-integrable functions, we need a slightly stronger hypothesis. I approach this version more gently, with a couple of lemmas.
Lemma Let $\Sigma$, $\mathrm{T}$ be $\sigma$-algebras of subsets of $X$ and $Y$ respectively. Suppose that $D \subseteq X$ and that $\phi : D \to Y$ is a function such that $\phi^{-1}[F] \in \Sigma_D$, the subspace $\sigma$-algebra, for every $F \in \mathrm{T}$. Then $g\phi$ is $\Sigma$-measurable for every $[-\infty, \infty]$-valued $\mathrm{T}$-measurable function $g$ defined on a subset of $Y$.
proof Set $C = \operatorname{dom} g$ and $B = \operatorname{dom} g\phi = \phi^{-1}[C]$. If $a \in \mathbb{R}$, then there is an $F \in \mathrm{T}$ such that $\{y : g(y) \le a\} = F \cap C$. Now there is an $E \in \Sigma$ such that $\phi^{-1}[F] = E \cap D$. So
$$\{x : g\phi(x) \le a\} = B \cap E \in \Sigma_B.$$
As $a$ is arbitrary, $g\phi$ is $\Sigma$-measurable.

235D Some of the results below are easier when we can move freely between measure spaces and their
completions (212C). The next lemma is what we need.
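Lemma Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, with completions $(X, \hat\Sigma, \hat\mu)$ and $(Y, \hat{\mathrm{T}}, \hat\nu)$. Let $\phi : D_\phi \to Y$, $J : D_J \to [0, \infty[$ be functions defined on conegligible subsets of $X$.
(a) If $\int J \times \chi(\phi^{-1}[F])\,d\mu = \nu F$ whenever $F \in \mathrm{T}$ and $\nu F < \infty$, then $\int J \times \chi(\phi^{-1}[F])\,d\hat\mu = \hat\nu F$ whenever $F \in \hat{\mathrm{T}}$ and $\hat\nu F < \infty$.
(b) If $\int J \times \chi(\phi^{-1}[F])\,d\mu = \nu F$ whenever $F \in \mathrm{T}$, then $\int J \times \chi(\phi^{-1}[F])\,d\hat\mu = \hat\nu F$ whenever $F \in \hat{\mathrm{T}}$.
proof Both rely on the fact that either hypothesis is enough to ensure that $\int J \times \chi(\phi^{-1}[F])\,d\mu = 0$ whenever $\nu F = 0$. Accordingly, if $F$ is $\nu$-negligible, so that there is an $F' \in \mathrm{T}$ such that $F \subseteq F'$ and $\nu F' = 0$, we shall have
$$\int J \times \chi(\phi^{-1}[F])\,d\mu = \int J \times \chi(\phi^{-1}[F'])\,d\mu = 0.$$
But now, given any $F \in \hat{\mathrm{T}}$, there is an $F_0 \in \mathrm{T}$ such that $F_0 \subseteq F$ and $\hat\nu(F \setminus F_0) = 0$, so that
$$\int J \times \chi(\phi^{-1}[F])\,d\hat\mu = \int J \times \chi(\phi^{-1}[F])\,d\mu$$
$$= \int J \times \chi(\phi^{-1}[F_0])\,d\mu + \int J \times \chi(\phi^{-1}[F \setminus F_0])\,d\mu$$
$$= \nu F_0 = \hat\nu F,$$
provided (for part (a)) that $\hat\nu F < \infty$.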
Remark Thus if we have the hypotheses of any of the principal results of this section valid for a pair of
non-complete measure spaces, we can expect to be able to draw some conclusion by applying the result to
the completions of the given spaces.

235E Now I come to the alternative version of 235A.


Theorem Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\phi : D_\phi \to Y$, $J : D_J \to [0, \infty[$ two functions defined on conegligible subsets of $X$ such that
$$\int J \times \chi(\phi^{-1}[F])\,d\mu = \nu F$$
for every $F \in \mathrm{T}$, allowing $\infty$ as a value of the integral.
(a) $J \times g\phi$ is $\mu$-virtually measurable for every $\nu$-virtually measurable function $g$ defined on a subset of $Y$.
(b) Let $g$ be a $\nu$-virtually measurable $[-\infty, \infty]$-valued function defined on a conegligible subset of $Y$. Then $\int J \times g\phi\,d\mu = \int g\,d\nu$ whenever either integral is defined in $[-\infty, \infty]$, if we interpret $(J \times g\phi)(x)$ as $0$ when $J(x) = 0$ and $g(\phi(x))$ is undefined.
proof Let (X, , ) and (Y, T, ) be the completions of (X, , ) and (Y, T, ). By 235D,
R
J (1 [F ])d = F
for every F T. R Recalling
R that a real-valued function is -virtually measurable iff it is -measurable
(212Fa), and that f d = f d if either is defined in [, ] (212Fb), the conclusions we are seeking are
(a)0 J g is -measurable for every T-measurable function g defined on a subset of Y ;
R R
(b)0 J g d = g d whenever g is a T-measurable function defined almost everywhere
in Y and either integral is defined in [, ].
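(a) When I write
$$\int J\times\chi D_\phi\,d\hat\mu=\int J\times\chi(\phi^{-1}[Y])\,d\hat\mu=\hat\nu Y,$$
which is part of the hypothesis of this theorem, I mean to imply that $J\times\chi D_\phi$ is $\hat\mu$-virtually measurable, that is, is $\hat\Sigma$-measurable. Because $D_\phi$ is conegligible, it follows that $J$ is $\hat\Sigma$-measurable, and its domain $D_J$, being conegligible, also belongs to $\hat\Sigma$. Set $G=\{x:x\in D_J,\ J(x)>0\}\in\hat\Sigma$. Then for any set $A\subseteq X$, $J\times\chi A$ is $\hat\Sigma$-measurable iff $A\cap G\in\hat\Sigma$. So the hypothesis is just that $G\cap\phi^{-1}[F]\in\hat\Sigma$ for every $F\in\hat{\mathrm{T}}$.
Now let $g$ be a $[-\infty,\infty]$-valued function, defined on a subset $C$ of $Y$, which is $\hat{\mathrm{T}}$-measurable. Applying 235C to $\phi\restriction G$, we see that $g\phi\restriction G$ is $\hat\Sigma$-measurable, so $(J\times g\phi)\restriction G$ is $\hat\Sigma$-measurable. On the other hand, $J\times g\phi$ is zero almost everywhere on $X\setminus G$, so (because $G\in\hat\Sigma$) $J\times g\phi$ is $\hat\Sigma$-measurable, as required.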
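(b)(i) Suppose first that $g\ge 0$. Then $J\times g\phi\ge 0$, so (a) tells us that $\int J\times g\phi\,d\hat\mu$ is defined in $[0,\infty]$.
($\alpha$) If $\int g\,d\hat\nu<\infty$ then $\int J\times g\phi\,d\hat\mu=\int g\,d\hat\nu$ by 235A.
($\beta$) If there is some $\epsilon>0$ such that $\hat\nu H=\infty$, where $H=\{y:g(y)\ge\epsilon\}$, then
$$\int J\times g\phi\,d\hat\mu\ge\epsilon\int J\times\chi(\phi^{-1}[H])\,d\hat\mu=\epsilon\hat\nu H=\infty,$$
so
$$\int J\times g\phi\,d\hat\mu=\infty=\int g\,d\hat\nu.$$
($\gamma$) Otherwise,
$$\begin{aligned}
\int J\times g\phi\,d\hat\mu
&\ge\sup\Bigl\{\int J\times h\phi\,d\hat\mu:h\text{ is }\hat\nu\text{-integrable},\ 0\le h\le g\Bigr\}\\
&=\sup\Bigl\{\int h\,d\hat\nu:h\text{ is }\hat\nu\text{-integrable},\ 0\le h\le g\Bigr\}=\int g\,d\hat\nu=\infty,
\end{aligned}$$
so once again $\int J\times g\phi\,d\hat\mu=\int g\,d\hat\nu$.
(ii) For general real-valued $g$, apply (i) to $g^+$ and $g^-$, where $g^+=\frac12(|g|+g)$, $g^-=\frac12(|g|-g)$; the point is that $(J\times g\phi)^+=J\times g^+\phi$ and $(J\times g\phi)^-=J\times g^-\phi$, so that
$$\int J\times g\phi=\int J\times g^+\phi-\int J\times g^-\phi=\int g^+-\int g^-=\int g$$
if either side is defined in $[-\infty,\infty]$.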

235F Remarks (a) Of course there are two special cases of this theorem which between them carry all its content: the case $J=\chi X$ and the case in which $X=Y$ and $\phi$ is the identity function. If $J=\chi X$ we are very close to 235I below, and if $\phi$ is the identity function we are close to the indefinite-integral measures of §234.

(b) As in 235A, we can strengthen the conclusion of (b) in 235E to
$$\int_{\phi^{-1}[F]}J\times g\phi\,d\mu=\int_F g\,d\nu$$
whenever $F\in\mathrm{T}$ and $\int_F g\,d\nu$ is defined in $[-\infty,\infty]$.
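
The following worked instance is only an illustrative sketch of the formula in 235E. The choices $X=Y=[0,1]$, $\mu=\nu=$ Lebesgue measure, $\phi(x)=x^2$, $J(x)=2x$ and $g(y)=y$ are assumptions made for the example, and the hypothesis is checked here on intervals only, not on general measurable sets.

```latex
% Illustrative sketch; X = Y = [0,1], \mu = \nu = Lebesgue measure,
% \phi(x) = x^2, J(x) = 2x and g(y) = y are assumptions chosen for the example.
% Hypothesis of 235E, checked for an interval F = [a,b] with 0 <= a <= b <= 1:
\[
  \int J\times\chi(\phi^{-1}[[a,b]])\,d\mu
  = \int_{\sqrt a}^{\sqrt b} 2x\,dx
  = b-a
  = \nu[a,b].
\]
% Conclusion (b) for g(y) = y: both sides equal 1/2.
\[
  \int J\times g\phi\,d\mu = \int_0^1 2x\cdot x^2\,dx = \tfrac12,
  \qquad
  \int g\,d\nu = \int_0^1 y\,dy = \tfrac12 .
\]
```

In this setting the formula is just the ordinary substitution $y=x^2$, with $J$ playing the role of the derivative $\phi'$.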

235G Inverse-measure-preserving functions It is high time that I introduced the nearest thing in measure theory to a morphism. If $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ are measure spaces, a function $\phi:X\to Y$ is inverse-measure-preserving if $\phi^{-1}[F]\in\Sigma$ and $\mu\phi^{-1}[F]=\nu F$ for every $F\in\mathrm{T}$.
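
A standard example may help to fix the definition. The sketch below assumes $X=Y=[0,1[$ with Lebesgue measure and takes $\phi$ to be the doubling map; these choices are illustrative assumptions, and only intervals are checked.

```latex
% Illustrative sketch; the doubling map on X = Y = [0,1[ with
% \mu = \nu = Lebesgue measure is an assumption chosen for the example.
\[
  \phi(x) = 2x \ \text{ if } 0\le x<\tfrac12, \qquad
  \phi(x) = 2x-1 \ \text{ if } \tfrac12\le x<1 .
\]
% The inverse image of an interval [a,b[ \subseteq [0,1[ is a union of two
% disjoint intervals, each of half the length:
\[
  \phi^{-1}[[a,b[] = [\tfrac a2,\tfrac b2[ \,\cup\, [\tfrac{a+1}2,\tfrac{b+1}2[,
  \qquad
  \mu\phi^{-1}[[a,b[] = \tfrac{b-a}2+\tfrac{b-a}2 = b-a = \nu[a,b[ .
\]
% Direct images are not preserved: \phi[[0,1/2[] = [0,1[ has measure 1,
% while [0,1/2[ has measure 1/2.
```

The last comment is the reason for the word "inverse" in the name: it is the measure of inverse images, not of direct images, that is controlled.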

235H Proposition (a) Let $(X,\Sigma,\mu)$, $(Y,\mathrm{T},\nu)$ and $(Z,\Lambda,\lambda)$ be measure spaces, and $\phi:X\to Y$, $\psi:Y\to Z$ two inverse-measure-preserving maps. Then $\psi\phi:X\to Z$ is inverse-measure-preserving.
(b) Let $(X,\Sigma,\mu)$ be a measure space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Then the identity map is an inverse-measure-preserving function from $(X,\Sigma,\mu)$ to $(X,\mathrm{T},\mu\restriction\mathrm{T})$.
(c) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces with completions $(X,\hat\Sigma,\hat\mu)$ and $(Y,\hat{\mathrm{T}},\hat\nu)$. Let $\phi$ be an inverse-measure-preserving function from $(X,\Sigma,\mu)$ to $(Y,\mathrm{T},\nu)$. Then $\phi$ is also inverse-measure-preserving from $(X,\hat\Sigma,\hat\mu)$ to $(Y,\hat{\mathrm{T}},\hat\nu)$.
proof (a) For any $W\in\Lambda$,
$$\mu(\psi\phi)^{-1}[W]=\mu\phi^{-1}[\psi^{-1}[W]]=\nu\psi^{-1}[W]=\lambda W.$$

(b) is surely obvious.

(c) When we say that $\phi$ is inverse-measure-preserving for $\mu$ and $\nu$, we are saying that the image measure $\mu\phi^{-1}$ extends $\nu$; of course it follows that the image measure $\hat\mu\phi^{-1}$ extends $\nu$. By 212Bd, $\hat\mu\phi^{-1}$ is a complete measure, so must extend $\hat\nu$; that is, $\phi$ is inverse-measure-preserving for $\hat\mu$ and $\hat\nu$.
Remark Of course (c) here can also be deduced from 235Db.

235I Theorem Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces and $\phi:X\to Y$ an inverse-measure-preserving function. Then
(a) if $g$ is a $\nu$-virtually measurable $[-\infty,\infty]$-valued function defined on a subset of $Y$, $g\phi$ is $\mu$-virtually measurable;
(b) if $g$ is a $\nu$-virtually measurable $[-\infty,\infty]$-valued function defined on a conegligible subset of $Y$, $\int g\phi\,d\mu=\int g\,d\nu$ if either integral is defined in $[-\infty,\infty]$;
(c) if $g$ is a $\nu$-virtually measurable $[-\infty,\infty]$-valued function defined on a conegligible subset of $Y$, and $F\in\mathrm{T}$, then $\int_{\phi^{-1}[F]}g\phi\,d\mu=\int_F g\,d\nu$ if either integral is defined in $[-\infty,\infty]$.

proof (a) This follows immediately from 235Hc and 235C; in the notation of 235Hc, $\phi^{-1}[F]\in\hat\Sigma$ for every $F\in\hat{\mathrm{T}}$, so if $g$ is $\hat{\mathrm{T}}$-measurable then $g\phi$ will be $\hat\Sigma$-measurable.
(b) Apply 235E with $J=\chi X$; we have
$$\int J\times\chi(\phi^{-1}[F])\,d\mu=\mu\phi^{-1}[F]=\nu F$$
for every $F\in\mathrm{T}$, so
$$\int g\phi=\int J\times g\phi=\int g$$
if either integral is defined in $[-\infty,\infty]$.
(c) Apply (b) to $g\times\chi F$.
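
As a hedged illustration of (b) and (c), the computation below assumes the doubling map $\phi(x)=2x$ for $x<\frac12$, $\phi(x)=2x-1$ for $x\ge\frac12$, on $X=Y=[0,1[$ with Lebesgue measure, together with $g(y)=y$ and $F=[0,\frac12[$. All of these are choices made only for the example.

```latex
% Illustrative sketch; \phi is the doubling map on [0,1[ with Lebesgue
% measure, g(y) = y and F = [0,1/2[ are assumptions chosen for the example.
% Part (b): both integrals equal 1/2.
\[
  \int g\phi\,d\mu
  = \int_0^{1/2} 2x\,dx + \int_{1/2}^{1} (2x-1)\,dx
  = \tfrac14+\tfrac14 = \tfrac12
  = \int_0^1 y\,dy = \int g\,d\nu .
\]
% Part (c): here \phi^{-1}[F] = [0,1/4[ \cup [1/2,3/4[, and both sides equal 1/8.
\[
  \int_{\phi^{-1}[F]} g\phi\,d\mu
  = \int_0^{1/4} 2x\,dx + \int_{1/2}^{3/4} (2x-1)\,dx
  = \tfrac1{16}+\tfrac1{16} = \tfrac18
  = \int_0^{1/2} y\,dy = \int_F g\,d\nu .
\]
```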

235J The image measure catastrophe Applications of 235A would run much more smoothly if we could say
"$\int g\,d\nu$ exists and is equal to $\int J\times g\phi\,d\mu$ for every $g:Y\to\mathbb{R}$ such that $J\times g\phi$ is $\mu$-integrable".
Unhappily there is no hope of a universally applicable result in this direction. Suppose, for instance, that $\nu$ is Lebesgue measure on $Y=[0,1]$, that $X\subseteq[0,1]$ is a non-Lebesgue-measurable set of outer measure 1 (134D), that $\mu$ is the subspace measure $\nu_X$ on $X$, and that $\phi(x)=x$ for $x\in X$. Then
$$\mu\phi^{-1}[F]=\nu^*(X\cap F)=\nu F$$
for every Lebesgue measurable set $F\subseteq Y$, so we can take $J=\chi X$ and the hypotheses of 235A and 235E will be satisfied. But if we write $g=\chi X:[0,1]\to\{0,1\}$, then $\int g\phi\,d\mu$ is defined even though $\int g\,d\nu$ is not.
The point here is that there is a set $A\subseteq Y$ such that (in the language of 235A/235E) $\phi^{-1}[A]\in\Sigma$ but $A\notin\mathrm{T}$. This is the image measure catastrophe. The search for contexts in which we can be sure that it does not occur will be one of the motive themes of Volume 4. For the moment, I will offer some general remarks (235K-235L), and describe one of the important cases in which the problem does not arise (235M).

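235K Lemma Let $\Sigma$, $\mathrm{T}$ be $\sigma$-algebras of subsets of $X$, $Y$ respectively, and $\phi$ a function from a subset $D$ of $X$ to $Y$. Suppose that $G\subseteq X$ and that
$$\mathrm{T}=\{F:F\subseteq Y,\ G\cap\phi^{-1}[F]\in\Sigma\}.$$
Then a real-valued function $g$, defined on a member of $\mathrm{T}$, is $\mathrm{T}$-measurable iff $\chi G\times g\phi$ is $\Sigma$-measurable.
proof Because surely $Y\in\mathrm{T}$, the hypothesis implies that $G\cap D=G\cap\phi^{-1}[Y]$ belongs to $\Sigma$.
Let $g:C\to\mathbb{R}$ be a function, where $C\in\mathrm{T}$. Set $B=\mathrm{dom}(g\phi)=\phi^{-1}[C]$, and for $a\in\mathbb{R}$ set $F_a=\{y:g(y)\ge a\}$,
$$E_a=G\cap\phi^{-1}[F_a]=\{x:x\in G\cap B,\ g\phi(x)\ge a\}.$$
Note that $G\cap B\in\Sigma$ because $C\in\mathrm{T}$.
(i) If $g$ is $\mathrm{T}$-measurable, then $F_a\in\mathrm{T}$ and $E_a\in\Sigma$ for every $a$. Now
$$G\cap\{x:x\in B,\ g\phi(x)\ge a\}=G\cap\phi^{-1}[F_a]=E_a,$$
so $\{x:x\in B,\ (\chi G\times g\phi)(x)\ge a\}$ is either $E_a$ or $E_a\cup(B\setminus G)$, and in either case is relatively $\Sigma$-measurable in $B$. As $a$ is arbitrary, $\chi G\times g\phi$ is $\Sigma$-measurable.
(ii) If $\chi G\times g\phi$ is $\Sigma$-measurable, then, for any $a\in\mathbb{R}$,
$$E_a=\{x:x\in G\cap B,\ (\chi G\times g\phi)(x)\ge a\}\in\Sigma$$
because $G\cap B\in\Sigma$ and $\chi G\times g\phi$ is $\Sigma$-measurable. So $F_a\in\mathrm{T}$. As $a$ is arbitrary, $g$ is $\mathrm{T}$-measurable.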

235L Theorem Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be complete measure spaces. Let $\phi:D_\phi\to Y$, $J:D_J\to[0,\infty[$ be functions defined on conegligible subsets of $X$, and set $G=\{x:x\in D_J,\ J(x)>0\}$. Suppose that
$$\mathrm{T}=\{F:F\subseteq Y,\ G\cap\phi^{-1}[F]\in\Sigma\},$$
$$\nu F=\int J\times\chi(\phi^{-1}[F])\,d\mu\text{ for every }F\in\mathrm{T}.$$
Then, for any real-valued function $g$ defined on a subset of $Y$, $\int J\times g\phi\,d\mu=\int g\,d\nu$ whenever either integral is defined in $[-\infty,\infty]$, provided that we interpret $(J\times g\phi)(x)$ as $0$ when $J(x)=0$ and $g(\phi(x))$ is undefined.
proof If $g$ is $\mathrm{T}$-measurable and defined almost everywhere, this is a consequence of 235E. So I have to show that if $J\times g\phi$ is measurable and defined almost everywhere, so is $g$. Set $W=Y\setminus\mathrm{dom}\,g$. Then $J\times g\phi$ is undefined on $G\cap\phi^{-1}[W]$, because $g\phi$ is undefined there and we cannot take advantage of the escape clause available when $J=0$; so $G\cap\phi^{-1}[W]$ must be negligible, therefore measurable, and $W\in\mathrm{T}$. Next,
$$\nu W=\int J\times\chi(\phi^{-1}[W])\,d\mu=0$$
because $J\times\chi(\phi^{-1}[W])$ can be non-zero only on the negligible set $G\cap\phi^{-1}[W]$. So $g$ is defined almost everywhere.
Note that the hypothesis surely implies that $J\times\chi D_\phi=J\times\chi(\phi^{-1}[Y])$ is measurable, so that $J$ is measurable (because $D_\phi$ is conegligible) and $G\in\Sigma$. Writing $K(x)=1/J(x)$ for $x\in G$, $0$ for $x\in X\setminus G$, the function $K:X\to\mathbb{R}$ is measurable, and
$$\chi G\times g\phi=K\times J\times g\phi$$
is measurable. So 235K tells us that $g$ must be measurable, and we're done.
Remark When $J=\chi X$, the hypothesis of this theorem becomes
$$\mathrm{T}=\{F:F\subseteq Y,\ \phi^{-1}[F]\in\Sigma\},\quad\nu F=\mu\phi^{-1}[F]\text{ for every }F\in\mathrm{T};$$
that is, $\nu$ is the image measure $\mu\phi^{-1}$ as defined in 112E.

235M Corollary Let $(X,\Sigma,\mu)$ be a complete measure space, and $J$ a non-negative measurable function defined on a conegligible subset of $X$. Let $\nu$ be the associated indefinite-integral measure, and $\mathrm{T}$ its domain. Then for any real-valued function $g$ defined on a subset of $X$, $g$ is $\mathrm{T}$-measurable iff $J\times g$ is $\Sigma$-measurable, and $\int g\,d\nu=\int J\times g\,d\mu$ if either integral is defined in $[-\infty,\infty]$, provided that we interpret $(J\times g)(x)$ as $0$ when $J(x)=0$ and $g(x)$ is undefined.
proof Put 235L, taking $Y=X$ and $\phi$ the identity function, together with 234Dd.
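
A small numerical check may make the corollary concrete. The sketch below assumes $X=[0,1]$ with Lebesgue measure $\mu$, $J(x)=2x$ and $g(x)=x$. These choices, and the use of the layer-cake formula for the cross-check, are illustrative assumptions rather than part of the argument.

```latex
% Illustrative sketch; X = [0,1], \mu = Lebesgue measure, J(x) = 2x and
% g(x) = x are assumptions chosen for the example.
% The indefinite-integral measure \nu satisfies \nu[0,t] = \int_0^t 2x\,dx = t^2.
\[
  \int J\times g\,d\mu = \int_0^1 2x\cdot x\,dx = \tfrac23 .
\]
% Cross-check of \int g\,d\nu from the values of \nu alone, integrating
% \nu\{x : g(x) > t\} = \nu\,]t,1] = 1 - t^2 over t in [0,1]:
\[
  \int g\,d\nu = \int_0^1 \nu\{x:g(x)>t\}\,dt = \int_0^1 (1-t^2)\,dt = \tfrac23 .
\]
```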

235N Applying the Radon-Nikodým theorem In order to use 235A-235L effectively, we need to be able to find suitable functions $J$. This can be difficult; some very special examples will take up most of Chapter 26 below. But there are many circumstances in which we can be sure that such $J$ exist, even if we do not know what they are. A minimal requirement is that if $\nu F<\infty$ and $\mu\phi^{-1}[F]=0$ then $\nu F=0$, because $\int J\times\chi(\phi^{-1}[F])\,d\mu$ will be zero for any $J$. A sufficient condition, in the special case of indefinite-integral measures, is in 234G. Another is the following.
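
To see how the minimal requirement can fail, consider the following sketch. It assumes $X=Y=[0,1]$ with Lebesgue measure for both $\mu$ and $\nu$, and a constant map $\phi$; the example is an illustration only.

```latex
% Illustrative sketch; X = Y = [0,1], \mu = \nu = Lebesgue measure and the
% constant map \phi(x) = 0 are assumptions chosen for the example.
\[
  F = \,]0,1], \qquad \nu F = 1, \qquad \phi^{-1}[F] = \emptyset, \qquad
  \mu\phi^{-1}[F] = 0 .
\]
% Hence, whatever non-negative J is tried,
\[
  \int J\times\chi(\phi^{-1}[F])\,d\mu = \int J\times\chi\emptyset\,d\mu = 0 \ne 1 = \nu F ,
\]
% so no J can satisfy the hypothesis of 235A for this \phi.
```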

235O Theorem Let $(X,\Sigma,\mu)$ be a $\sigma$-finite measure space, $(Y,\mathrm{T},\nu)$ a semi-finite measure space, and $\phi:D\to Y$ a function such that
(i) $D$ is a conegligible subset of $X$,
(ii) $\phi^{-1}[F]\in\Sigma$ for every $F\in\mathrm{T}$;
(iii) $\mu\phi^{-1}[F]>0$ whenever $F\in\mathrm{T}$ and $\nu F>0$.
Then there is a $\Sigma$-measurable function $J:X\to[0,\infty[$ such that $\int J\times\chi\phi^{-1}[F]\,d\mu=\nu F$ for every $F\in\mathrm{T}$.
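proof (a) To begin with (down to the end of (c) below) let us suppose that $D=X$ and that $\nu$ is totally finite.
Set $\tilde{\mathrm{T}}=\{\phi^{-1}[F]:F\in\mathrm{T}\}\subseteq\Sigma$. Then $\tilde{\mathrm{T}}$ is a $\sigma$-algebra of subsets of $X$. P (i)
$$\emptyset=\phi^{-1}[\emptyset]\in\tilde{\mathrm{T}}.$$
(ii) If $E\in\tilde{\mathrm{T}}$, take $F\in\mathrm{T}$ such that $E=\phi^{-1}[F]$, so that
$$X\setminus E=\phi^{-1}[Y\setminus F]\in\tilde{\mathrm{T}}.$$
(iii) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is any sequence in $\tilde{\mathrm{T}}$, then for each $n\in\mathbb{N}$ choose $F_n\in\mathrm{T}$ such that $E_n=\phi^{-1}[F_n]$; then
$$\bigcup_{n\in\mathbb{N}}E_n=\phi^{-1}[\bigcup_{n\in\mathbb{N}}F_n]\in\tilde{\mathrm{T}}.\ \text{Q}$$
Next, we have a totally finite measure $\tilde\nu:\tilde{\mathrm{T}}\to[0,\nu Y]$ given by setting
$$\tilde\nu(\phi^{-1}[F])=\nu F\text{ for every }F\in\mathrm{T}.$$
P (i) If $F$, $F'\in\mathrm{T}$ and $\phi^{-1}[F]=\phi^{-1}[F']$, then $\phi^{-1}[F\triangle F']=\emptyset$, so $\mu(\phi^{-1}[F\triangle F'])=0$ and $\nu(F\triangle F')=0$; consequently $\nu F=\nu F'$. This shows that $\tilde\nu$ is well-defined. (ii) Now
$$\tilde\nu\emptyset=\tilde\nu(\phi^{-1}[\emptyset])=\nu\emptyset=0.$$
(iii) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\tilde{\mathrm{T}}$, let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\mathrm{T}$ such that $E_n=\phi^{-1}[F_n]$ for each $n$; set $F'_n=F_n\setminus\bigcup_{m<n}F_m$ for each $n$; then $E_n=\phi^{-1}[F'_n]$ for each $n$, so
$$\tilde\nu(\bigcup_{n\in\mathbb{N}}E_n)=\tilde\nu(\phi^{-1}[\bigcup_{n\in\mathbb{N}}F'_n])=\nu(\bigcup_{n\in\mathbb{N}}F'_n)=\sum_{n=0}^\infty\nu F'_n=\sum_{n=0}^\infty\tilde\nu E_n.\ \text{Q}$$
Finally, observe that if $\tilde\nu E>0$ then $\mu E>0$, because $E=\phi^{-1}[F]$ where $\nu F>0$.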
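(b) By 215B(ix) there is a $\Sigma$-measurable function $h:X\to\,]0,\infty[$ such that $\int h\,d\mu$ is finite. Define $\tilde\mu:\tilde{\mathrm{T}}\to[0,\infty[$ by setting $\tilde\mu E=\int_E h\,d\mu$ for every $E\in\tilde{\mathrm{T}}$; then $\tilde\mu$ is a totally finite measure. If $E\in\tilde{\mathrm{T}}$ and $\tilde\mu E=0$, then (because $h$ is strictly positive) $\mu E=0$ and $\tilde\nu E=0$. Accordingly we may apply the Radon-Nikodým theorem to $\tilde\mu$ and $\tilde\nu$ to see that there is a $\tilde{\mathrm{T}}$-measurable function $g:X\to\mathbb{R}$ such that $\int_E g\,d\tilde\mu=\tilde\nu E$ for every $E\in\tilde{\mathrm{T}}$. Because $\tilde\nu$ is non-negative, we may suppose that $g\ge 0$.
(c) Applying 235A to $\mu$, $\tilde\mu$, $h$ and the identity function from $X$ to itself, we see that
$$\int_E g\times h\,d\mu=\int_E g\,d\tilde\mu=\tilde\nu E$$
for every $E\in\tilde{\mathrm{T}}$, that is, that
$$\int J\times\chi(\phi^{-1}[F])\,d\mu=\nu F$$
for every $F\in\mathrm{T}$, writing $J=g\times h$.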
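(d) This completes the proof when $\nu$ is totally finite and $D=X$. For the general case, if $Y=\emptyset$ then $\mu X=0$ and the result is trivial. Otherwise, let $\hat\phi$ be any extension of $\phi$ to a function from $X$ to $Y$ which is constant on $X\setminus D$; then $\hat\phi^{-1}[F]\in\Sigma$ for every $F\in\mathrm{T}$, because $D=\phi^{-1}[Y]\in\Sigma$ and $\hat\phi^{-1}[F]$ is always either $\phi^{-1}[F]$ or $(X\setminus D)\cup\phi^{-1}[F]$. Now $\nu$ must be $\sigma$-finite. P Use the criterion of 215B(ii). If $\mathcal{F}$ is a disjoint family in $\{F:F\in\mathrm{T},\ 0<\nu F<\infty\}$, then $\mathcal{E}=\{\phi^{-1}[F]:F\in\mathcal{F}\}$ is a disjoint family in $\{E:\mu E>0\}$, so $\mathcal{E}$ and $\mathcal{F}$ are countable. Q
Let $\langle Y_n\rangle_{n\in\mathbb{N}}$ be a partition of $Y$ into sets of finite $\nu$-measure, and for each $n\in\mathbb{N}$ set $\nu_nF=\nu(F\cap Y_n)$ for every $F\in\mathrm{T}$. Then $\nu_n$ is a totally finite measure on $Y$, and if $\nu_nF>0$ then $\nu F>0$ so
$$\mu\hat\phi^{-1}[F]=\mu\phi^{-1}[F]>0.$$
Accordingly $\mu$, $\hat\phi$ and $\nu_n$ satisfy the assumptions of the theorem together with those of (a) above, and there is a $\Sigma$-measurable function $J_n:X\to[0,\infty[$ such that
$$\nu_nF=\int J_n\times\chi(\hat\phi^{-1}[F])\,d\mu$$
for every $F\in\mathrm{T}$. Now set $J=\sum_{n=0}^\infty J_n\times\chi(\hat\phi^{-1}[Y_n])$, so that $J:X\to[0,\infty[$ is $\Sigma$-measurable. If $F\in\mathrm{T}$, then
$$\begin{aligned}
\int J\times\chi(\phi^{-1}[F])\,d\mu
&=\sum_{n=0}^\infty\int J_n\times\chi(\hat\phi^{-1}[Y_n])\times\chi(\hat\phi^{-1}[F])\,d\mu\\
&=\sum_{n=0}^\infty\int J_n\times\chi(\hat\phi^{-1}[F\cap Y_n])\,d\mu=\sum_{n=0}^\infty\nu(F\cap Y_n)=\nu F,
\end{aligned}$$
as required.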
235P Remark Theorem 235O can fail if $\mu$ is only strictly localizable rather than $\sigma$-finite. P Let $X=Y$ be an uncountable set, $\Sigma=\mathcal{P}X$, $\mu$ counting measure on $X$ (112Bd), $\mathrm{T}$ the countable-cocountable $\sigma$-algebra of $Y$, $\nu$ the countable-cocountable measure on $Y$ (211R), $\phi:X\to Y$ the identity map. Then $\phi^{-1}[F]\in\Sigma$ and $\mu\phi^{-1}[F]>0$ whenever $\nu F>0$. But if $J$ is any $\mu$-integrable function on $X$, then $F=\{x:J(x)\ne 0\}$ is countable and
$$\nu(Y\setminus F)=1\ne 0=\int_{\phi^{-1}[Y\setminus F]}J\,d\mu.\ \text{Q}$$

*235Q There are some simplifications in the case of $\sigma$-finite spaces; in particular, 235A and 235E become conflated. I will give an adaptation of the hypotheses of 235A which may be used in the $\sigma$-finite case. First a lemma.
Lemma Let $(X,\Sigma,\mu)$ be a measure space and $f$ a non-negative integrable function on $X$. If $A\subseteq X$ is such that $\int_A f+\int_{X\setminus A}f=\int f$, then $f\times\chi A$ is integrable.

proof By 214Eb, there are $\mu$-integrable functions $f_1$, $f_2$ such that $f_1$ extends $f\restriction A$, $f_2$ extends $f\restriction X\setminus A$, and
$$\int_E f_1=\int_{E\cap A}f,\quad\int_E f_2=\int_{E\setminus A}f$$
for every $E\in\Sigma$. Because $f$ is non-negative, $\int_E f_1$ and $\int_E f_2$ are non-negative for every $E\in\Sigma$, and $f_1$, $f_2$ are non-negative a.e. Accordingly we have $f\times\chi A\le_{\rm a.e.}f_1$ and $f\times\chi(X\setminus A)\le_{\rm a.e.}f_2$, so that $f\le_{\rm a.e.}f_1+f_2$. But also
$$\int f_1+f_2=\int_X f_1+\int_X f_2=\int_A f+\int_{X\setminus A}f=\int f,$$
so $f=_{\rm a.e.}f_1+f_2$. Accordingly
$$f_1=_{\rm a.e.}f-f_2\le_{\rm a.e.}f-f\times\chi(X\setminus A)=f\times\chi A\le_{\rm a.e.}f_1$$
and $f\times\chi A=_{\rm a.e.}f_1$ is integrable.

*235R Proposition Let $(X,\Sigma,\mu)$ be a complete measure space and $(Y,\mathrm{T},\nu)$ a complete $\sigma$-finite measure space. Suppose that $\phi:D_\phi\to Y$, $J:D_J\to[0,\infty[$ are functions defined on conegligible subsets $D_\phi$, $D_J$ of $X$ such that $\int_{\phi^{-1}[F]}J\,d\mu$ exists and is equal to $\nu F$ whenever $F\in\mathrm{T}$ and $\nu F<\infty$.
(a) $J\times g\phi$ is $\Sigma$-measurable for every $\mathrm{T}$-measurable real-valued function $g$ defined on a subset of $Y$.
(b) If $g$ is a $\mathrm{T}$-measurable real-valued function defined almost everywhere in $Y$, then $\int J\times g\phi\,d\mu=\int g\,d\nu$ whenever either integral is defined in $[-\infty,\infty]$, interpreting $(J\times g\phi)(x)$ as $0$ when $J(x)=0$ and $g(\phi(x))$ is undefined.
proof The point is that the hypotheses of 235E are satisfied. To see this, let us write $\Sigma_C=\{E\cap C:E\in\Sigma\}$ and $\mu_C=\mu^*\restriction\Sigma_C$ for the subspace measure on $C$, for each $C\subseteq X$. Let $\langle Y_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of sets with union $Y$ and with $\nu Y_n<\infty$ for every $n\in\mathbb{N}$, starting from $Y_0=\emptyset$.
(i) Take any $F\in\mathrm{T}$ with $\nu F<\infty$, and set $F_n=F\cup Y_n$ for each $n\in\mathbb{N}$; write $C_n=\phi^{-1}[F_n]$.
Fix $n$ for the moment. Then our hypothesis implies that
$$\int_{C_0}J\,d\mu+\int_{C_n\setminus C_0}J\,d\mu=\nu F+\nu(F_n\setminus F)=\nu F_n=\int_{C_n}J\,d\mu.$$
If we regard the subspace measures on $C_0$ and $C_n\setminus C_0$ as derived from the measure $\mu_{C_n}$ of $C_n$ (214Ce), then 235Q tells us that $J\times\chi C_0$ is $\mu_{C_n}$-integrable, and there is a $\mu$-integrable function $h_n$ such that $h_n$ extends $(J\times\chi C_0)\restriction C_n$.
Let $E$ be a $\mu$-conegligible set, included in the domain $D_\phi$ of $\phi$, such that $h_n\restriction E$ is $\Sigma$-measurable for every $n$. Because $\langle C_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence with union $\phi^{-1}[\bigcup_{n\in\mathbb{N}}F_n]=D_\phi$,
$$(J\times\chi C_0)(x)=\lim_{n\to\infty}h_n(x)$$
for every $x\in E$, and $(J\times\chi C_0)\restriction E$ is measurable. At the same time, we know that there is a $\mu$-integrable $h$ extending $J\restriction C_0$, and $0\le_{\rm a.e.}J\times\chi C_0\le_{\rm a.e.}|h|$. Accordingly $J\times\chi C_0$ is integrable, and (using 214F)
$$\int J\times\chi\phi^{-1}[F]\,d\mu=\int J\times\chi C_0\,d\mu=\int_{C_0}J\restriction C_0\,d\mu_{C_0}=\nu F.$$

(ii) This deals with $F$ of finite measure. For general $F\in\mathrm{T}$,
$$\int J\times\chi(\phi^{-1}[F])\,d\mu=\lim_{n\to\infty}\int J\times\chi(\phi^{-1}[F\cap Y_n])\,d\mu=\lim_{n\to\infty}\nu(F\cap Y_n)=\nu F.$$
So the hypotheses of 235E are satisfied, and the result follows at once.

*235S I remarked in 235Bb that a difficulty can arise in 235A, for general measure spaces, if we speak of $\int_{\phi^{-1}[F]}J\,d\mu$ in the hypothesis, in place of $\int J\times\chi(\phi^{-1}[F])\,d\mu$. Here is an example.

Example Set $X=Y=[0,2]$. Write $\Sigma_L$ for the algebra of Lebesgue measurable subsets of $\mathbb{R}$, and $\mu_L$ for Lebesgue measure; write $\mu_c$ for counting measure on $\mathbb{R}$. Set
$$\Sigma=\mathrm{T}=\{E:E\subseteq[0,2],\ E\cap[0,1[\,\in\Sigma_L\};$$
of course this is a $\sigma$-algebra of subsets of $[0,2]$. For $E\in\Sigma=\mathrm{T}$, set
$$\mu E=\nu E=\mu_L(E\cap[0,1[)+\mu_c(E\cap[1,2]);$$
then $\mu$ is a complete measure; in effect, it is the direct sum of Lebesgue measure on $[0,1[$ and counting measure on $[1,2]$ (see 214K). It is easy to see that
$$\mu^*B=\mu_L^*(B\cap[0,1[)+\mu_c(B\cap[1,2])$$
for every $B\subseteq[0,2]$. Let $A\subseteq[0,1[$ be a non-Lebesgue-measurable set such that $\mu_L^*(E\setminus A)=\mu_LE$ for every Lebesgue measurable $E\subseteq[0,1[$ (see 134D). Define $\phi:[0,2]\to[0,2]$ by setting $\phi(x)=x+1$ if $x\in A$, $\phi(x)=x$ if $x\in[0,2]\setminus A$.
If $F\in\Sigma$, then $\mu^*(\phi^{-1}[F])=\mu F$. P (i) If $F\cap[1,2]$ is finite, then $\mu F=\mu_L(F\cap[0,1[)+\#(F\cap[1,2])$. Now
$$\phi^{-1}[F]=(F\cap[0,1[\,\setminus A)\cup(F\cap[1,2])\cup\{x:x\in A,\ x+1\in F\};$$
as the last set is finite, therefore $\mu$-negligible,
$$\mu^*(\phi^{-1}[F])=\mu_L^*(F\cap[0,1[\,\setminus A)+\#(F\cap[1,2])=\mu_L(F\cap[0,1[)+\#(F\cap[1,2])=\mu F.$$
(ii) If $F\cap[1,2]$ is infinite, so is $\phi^{-1}[F]\cap[1,2]$, so
$$\mu^*(\phi^{-1}[F])=\infty=\mu F.\ \text{Q}$$
This means that if we set $J(x)=1$ for every $x\in[0,2]$,
$$\int_{\phi^{-1}[F]}J\,d\mu=\mu_{\phi^{-1}[F]}(\phi^{-1}[F])=\mu^*(\phi^{-1}[F])=\mu F$$
for every $F\in\Sigma$, and $\phi$, $J$ satisfy the amended hypotheses for 235A. But if we set $g=\chi[0,1[$, then $g$ is $\mu$-integrable, with $\int g\,d\mu=1$, while
$$J(x)g(\phi(x))=1\text{ if }x\in[0,1[\,\setminus A,\quad 0\text{ otherwise},$$
so, because $A\notin\Sigma$, $J\times g\phi$ is not measurable, and therefore (since $\mu$ is complete) not $\mu$-integrable.

235T Reversing the burden Throughout the work above, I have been using the formula
$$\int J\times g\phi=\int g,$$
as being the natural extension of the formula
$$\int g=\int g\phi\times\phi'$$
of ordinary advanced calculus. But we can also move the derivative $J$ to the other side of the equation, as follows.
Theorem Let $(X,\Sigma,\mu)$, $(Y,\mathrm{T},\nu)$ be measure spaces and $\phi:X\to Y$, $J:Y\to[0,\infty[$ functions such that $\int_F J\,d\nu$ and $\mu\phi^{-1}[F]$ are defined in $[0,\infty]$ and equal for every $F\in\mathrm{T}$. Then $\int g\phi\,d\mu=\int J\times g\,d\nu$ whenever $g$ is $\nu$-virtually measurable and defined $\nu$-almost everywhere and either integral is defined in $[-\infty,\infty]$.
proof Let $\nu_1$ be the indefinite-integral measure over $\nu$ defined by $J$, and $\hat\mu$ the completion of $\mu$. Then $\phi$ is inverse-measure-preserving for $\hat\mu$ and $\nu_1$. P If $F\in\mathrm{T}$, then $\nu_1F=\int_F J\,d\nu=\mu\phi^{-1}[F]$; that is, $\phi$ is inverse-measure-preserving for $\mu$ and $\nu_1\restriction\mathrm{T}$. Since $\nu_1$ is the completion of $\nu_1\restriction\mathrm{T}$ (234Db), $\phi$ is inverse-measure-preserving for $\hat\mu$ and $\nu_1$ (235Hc). Q
Of course we can also regard $\nu_1$ as being an indefinite-integral measure over the completion $\hat\nu$ of $\nu$ (212Fb). So if $g$ is $\nu$-virtually measurable and defined $\nu$-almost everywhere,
$$\int J\times g\,d\nu=\int J\times g\,d\hat\nu=\int g\,d\nu_1=\int g\phi\,d\hat\mu=\int g\phi\,d\mu$$
if any of the five integrals is defined in $[-\infty,\infty]$, by 235M, 235Ib and 212Fb again.
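
For comparison with the earlier formula, here is a hedged worked instance of the reversed one. It assumes $X=Y=[0,1]$ with Lebesgue measure for both $\mu$ and $\nu$, $\phi(x)=x^2$, and $J(y)=1/(2\sqrt y)$ for $y>0$ with $J(0)=0$; these are illustrative choices, and the hypothesis is checked only on intervals $[0,t]$.

```latex
% Illustrative sketch; X = Y = [0,1], \mu = \nu = Lebesgue measure,
% \phi(x) = x^2, J(y) = 1/(2\sqrt y) for y > 0, J(0) = 0, and g(y) = y
% are assumptions chosen for the example.
% Hypothesis, checked for F = [0,t] with 0 <= t <= 1:
\[
  \mu\phi^{-1}[[0,t]] = \mu[0,\sqrt t\,] = \sqrt t
  = \int_0^t \frac{1}{2\sqrt y}\,dy = \int_{[0,t]} J\,d\nu .
\]
% Conclusion for g(y) = y: both sides equal 1/3.
\[
  \int g\phi\,d\mu = \int_0^1 x^2\,dx = \tfrac13,
  \qquad
  \int J\times g\,d\nu = \int_0^1 \frac{y}{2\sqrt y}\,dy
  = \int_0^1 \tfrac12\sqrt y\,dy = \tfrac13 .
\]
```

Here $J$ lives on $Y$ and is the density of the image measure $\mu\phi^{-1}$ with respect to $\nu$, whereas in 235A and 235E the function $J$ lives on $X$.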

235X Basic exercises (a) Explain what 235A tells us when $X=Y$, $\mathrm{T}=\Sigma$, $\phi$ is the identity function and $\nu E=\alpha\mu E$ for every $E\in\Sigma$.

(b) Let $(X,\Sigma,\mu)$ be a measure space, $J$ an integrable non-negative real-valued function on $X$, and $\phi:D_\phi\to\mathbb{R}$ a measurable function, where $D_\phi$ is a conegligible subset of $X$. Set
$$g(a)=\int_{\{x:\phi(x)\le a\}}J$$
for $a\in\mathbb{R}$, and let $\mu_g$ be the Lebesgue-Stieltjes measure associated with $g$. Show that $\int J\times f\phi\,d\mu=\int f\,d\mu_g$ for every $\mu_g$-integrable real function $f$.

(c) Let $\Sigma$, $\mathrm{T}$ and $\Lambda$ be $\sigma$-algebras of subsets of $X$, $Y$ and $Z$ respectively. Let us say that a function $\phi:A\to Y$, where $A\subseteq X$, is $(\Sigma,\mathrm{T})$-measurable if $\phi^{-1}[F]\in\Sigma_A$, the subspace $\sigma$-algebra of $A$, for every $F\in\mathrm{T}$. Suppose that $A\subseteq X$, $B\subseteq Y$, $\phi:A\to Y$ is $(\Sigma,\mathrm{T})$-measurable and $\psi:B\to Z$ is $(\mathrm{T},\Lambda)$-measurable. Show that $\psi\phi$ is $(\Sigma,\Lambda)$-measurable. Deduce 235C.

(d) Let $(X,\Sigma,\mu)$ be a measure space and $(Y,\mathrm{T},\nu)$ a semi-finite measure space. Let $\phi:D_\phi\to Y$ and $J:D_J\to[0,\infty[$ be functions defined on conegligible subsets $D_\phi$, $D_J$ of $X$ such that $\int J\times\chi(\phi^{-1}[F])\,d\mu$ exists $=\nu F$ whenever $F\in\mathrm{T}$ and $\nu F<\infty$. Let $g$ be a $\mathrm{T}$-measurable real-valued function, defined on a conegligible subset of $Y$. Show that $J\times g\phi$ is $\mu$-integrable iff $g$ is $\nu$-integrable, and the integrals are then equal, provided we interpret $(J\times g\phi)(x)$ as $0$ when $J(x)=0$ and $g(\phi(x))$ is undefined.

> (e) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces and $\phi:X\to Y$ an inverse-measure-preserving function. (i) Show that $\mu$ is totally finite, or a probability measure, iff $\nu$ is. (ii) Show that if $\nu$ is $\sigma$-finite, so is $\mu$. (iii) Show that if $\nu$ is semi-finite and $\mu$ is $\sigma$-finite, then $\nu$ is $\sigma$-finite. (Hint: use 215B(iii).) (iv) Show that if $\mu$ is purely atomic and $\nu$ is semi-finite then $\nu$ is purely atomic. (vi) Show that if $\nu$ is $\sigma$-finite and atomless then $\mu$ is atomless.
(f) Let $(X,\Sigma,\mu)$ be a measure space and $E\in\Sigma$. Define a measure $\mu\llcorner E$ on $X$ by setting $(\mu\llcorner E)(F)=\mu(E\cap F)$ whenever $F\subseteq X$ is such that $F\cap E\in\Sigma$. Show that, for any function $f$ from a subset of $X$ to $[-\infty,\infty]$, $\int f\,d(\mu\llcorner E)=\int_E f\,d\mu$ if either is defined in $[-\infty,\infty]$.
> (g) Let $g:\mathbb{R}\to\mathbb{R}$ be a non-decreasing function which is absolutely continuous on every closed bounded interval, and let $\mu_g$ be the associated Lebesgue-Stieltjes measure (114Xa, 225Xf). Write $\mu$ for Lebesgue measure on $\mathbb{R}$, and let $f:\mathbb{R}\to\mathbb{R}$ be a function. Show that $\int f\times g'\,d\mu=\int f\,d\mu_g$ in the sense that if one of the integrals exists, finite or infinite, so does the other, and they are then equal.
(h) Let $g:\mathbb{R}\to\mathbb{R}$ be a non-decreasing function and $J$ a non-negative real-valued $\mu_g$-integrable function, where $\mu_g$ is the Lebesgue-Stieltjes measure defined from $g$. Set $h(x)=\int_{]-\infty,x]}J\,d\mu_g$ for each $x\in\mathbb{R}$, and let $\mu_h$ be the Lebesgue-Stieltjes measure associated with $h$. Show that, for any $f:\mathbb{R}\to\mathbb{R}$, $\int f\times J\,d\mu_g=\int f\,d\mu_h$, in the sense that if one of the integrals is defined in $[-\infty,\infty]$ so is the other, and they are then equal.
> (i) Let $X$ be a set and $\lambda$, $\mu$, $\nu$ three measures on $X$ such that $\mu$ is an indefinite-integral measure over $\lambda$, with Radon-Nikodým derivative $f$, and $\nu$ is an indefinite-integral measure over $\mu$, with Radon-Nikodým derivative $g$. Show that $\nu$ is an indefinite-integral measure over $\lambda$, and that $f\times g$ is a Radon-Nikodým derivative of $\nu$ with respect to $\lambda$, provided we interpret $(f\times g)(x)$ as $0$ when $f(x)=0$ and $g(x)$ is undefined.
(j) In 235O, if $\nu$ is not semi-finite, show that we can still find a $J$ such that $\int_{\phi^{-1}[F]}J\,d\mu=\nu F$ for every set $F$ of finite measure. (Hint: use the "semi-finite version" of $\nu$, as described in 213Xc.)
(k) Let $(X,\Sigma,\mu)$ be a $\sigma$-finite measure space, and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Let $\nu:\mathrm{T}\to\mathbb{R}$ be a countably additive functional such that $\nu F=0$ whenever $F\in\mathrm{T}$ and $\mu F=0$. Show that there is a $\mu$-integrable function $f$ such that $\int_F f\,d\mu=\nu F$ for every $F\in\mathrm{T}$. (Hint: use the method of 235O, applied to the positive and negative parts of $\nu$.)
(l) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, with completions $(X,\hat\Sigma,\hat\mu)$ and $(Y,\hat{\mathrm{T}},\hat\nu)$. Let $\phi:D_\phi\to Y$, $J:D_J\to[0,\infty[$ be functions defined on conegligible subsets of $X$. Show that if $\int_{\phi^{-1}[F]}J\,d\mu=\nu F$ whenever $F\in\mathrm{T}$ and $\nu F<\infty$, then $\int_{\phi^{-1}[F]}J\,d\hat\mu=\hat\nu F$ whenever $F\in\hat{\mathrm{T}}$ and $\hat\nu F<\infty$. Hence, or otherwise, show that 235Rb is valid for non-complete spaces $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$.

235Y Further exercises (a) Write $\nu$ for Lebesgue measure on $Y=[0,1]$, and $\mathrm{T}$ for its domain. Let $A\subseteq[0,1]$ be a set such that $\nu^*A=\nu^*([0,1]\setminus A)=1$, and set $X=[0,1]\cup\{x+1:x\in A\}\cup\{x+2:x\in[0,1]\setminus A\}$. Let $\mu_{LX}$ be the subspace measure induced on $X$ by Lebesgue measure, and set $\mu E=\frac13\mu_{LX}E$ for $E\in\Sigma=\mathrm{dom}\,\mu_{LX}$. Define $\phi:X\to Y$ by writing $\phi(x)=x$ if $x\in[0,1]$, $\phi(x)=x-1$ if $x\in X\cap\,]1,2]$ and $\phi(x)=x-2$ if $x\in X\cap\,]2,3]$. Show that $\mathrm{T}=\{F:F\subseteq Y,\ \phi^{-1}[F]\in\Sigma\}$, that $\nu F=\mu\phi^{-1}[F]$ for every $F\in\mathrm{T}$, but that $\nu^*A>\mu^*\phi^{-1}[A]$.
(b) Write $\mathrm{T}$ for the algebra of Borel subsets of $Y=[0,1]$, and $\nu$ for the restriction of Lebesgue measure to $\mathrm{T}$. Let $A\subseteq[0,1]$ be a set such that both $A$ and $[0,1]\setminus A$ have Lebesgue outer measure 1, and set $X=A\cup[1,2]$. Let $\Sigma$ be the algebra of relatively Borel subsets of $X$, and set $\mu E=\mu_A(A\cap E)$ for $E\in\Sigma$, where $\mu_A$ is the subspace measure induced on $A$ by Lebesgue measure. Define $\phi:X\to Y$ by setting $\phi(x)=x$ if $x\in A$, $x-1$ if $x\in X\setminus A$. Show that $\mathrm{T}=\{F:F\subseteq Y,\ \phi^{-1}[F]\in\Sigma\}$ and that $\phi$ is inverse-measure-preserving, but that, setting $g=\chi([0,1]\setminus A)$, $g\phi$ is $\mu$-integrable while $g$ is not $\nu$-integrable.
(c) Let $(X,\Sigma,\mu)$ be a probability space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Let $f$ be a non-negative $\mu$-integrable function with $\int f\,d\mu=1$, so that its indefinite-integral measure $\nu$ is a probability measure. Let $g$ be a $\nu$-integrable real-valued function and set $h=f\times g$, interpreting $h(x)$ as $0$ if $f(x)=0$ and $g(x)$ is undefined. Let $f_1$, $h_1$ be conditional expectations of $f$, $h$ on $\mathrm{T}$ with respect to the measure $\mu$, and set $g_1=h_1/f_1$, interpreting $g_1(x)$ as $0$ if $h_1(x)=0$ and $f_1(x)$ is either $0$ or undefined. Show that $g_1$ is a conditional expectation of $g$ on $\mathrm{T}$ with respect to the measure $\nu$.


235 Notes and comments I see that I have taken up a great deal of space in this section with technicalities;
the hypotheses of the theorems vary erratically, with completeness, in particular, being invoked at apparently
arbitrary intervals, and ideas repeat themselves in a haphazard pattern. There is nothing deep, and most
of the work consists in laboriously verifying details. The trouble with this topic is that it is useful. The
results here are abstract expressions of integration-by-substitution; they have applications all over measure
theory. I cannot therefore content myself with theorems which will elegantly express the underlying ideas,
but must seek formulations which I can quote in later arguments.
I hope that the examples in 235Bb, 235J, 235P, 235S, 235Ya and 235Yb will go some way to persuade
you that there are real traps for the unwary, and that the careful verifications written out at such length
are necessary. On the other hand, it is happily the case that in simple contexts, in which the measures $\mu$, $\nu$ are $\sigma$-finite and the transformations $\phi$ are Borel isomorphisms, no insuperable difficulties arise, and in particular the image measure catastrophe does not trouble us. But for further work in this direction I refer you to the applications in §§263, 265 and 271, and to Volume 4.
One of the striking features of measure theory, compared with other comparably abstract branches of pure
mathematics, is the relative unimportance of any notion of morphism. The theory of groups, for instance,
is dominated by the concept of homomorphism, and general topology gives a similar place to continuous
function. In my view, the nearest equivalent in measure theory is the idea of inverse-measure-preserving
function (235G). I mean in Volumes 3 and 4 to explore this concept more thoroughly. In this volume I will
content myself with signalling such functions when they arise, and with the basic facts listed in 235H-235I.

Chapter 24
Function spaces
The extraordinary power of Lebesgue's theory of integration is perhaps best demonstrated by its ability
to provide structures relevant to questions quite different from those to which it was at first addressed.
In this chapter I give the constructions, and elementary properties, of some of the fundamental spaces of
functional analysis.
I do not feel called on here to justify the study of normed spaces; if you have not met them before, I
hope that the introduction here will show at least that they offer a basis for a remarkable fusion of algebra
and analysis. The fragments of the theory of metric spaces, normed spaces and general topology which
we shall need are sketched in §§2A2-2A5. The principal function spaces described in this chapter in fact
combine three structural elements: they are (infinite-dimensional) linear spaces, they are metric spaces, with
associated concepts of continuity and convergence, and they are ordered spaces, with corresponding notions
of supremum and infimum. The interactions between these three types of structure provide an inexhaustible
wealth of ideas. Furthermore, many of these ideas are directly applicable to a wide variety of problems in
more or less applied mathematics, particularly in differential and integral equations, but more generally in
any system with infinitely many degrees of freedom.
I have laid out the chapter with sections on $L^0$ (the space of equivalence classes of all real-valued measurable functions, in which all the other spaces of the chapter are embedded), $L^1$ (equivalence classes of integrable functions), $L^\infty$ (equivalence classes of bounded measurable functions) and $L^p$ (equivalence classes of $p$th-power-integrable functions). While ordinary functional analysis gives much more attention to the Banach spaces $L^p$ for $1\le p\le\infty$ than to $L^0$, from the special point of view of this book the space $L^0$ is at least as important and interesting as any of the others. Following these four sections, I return to a study of the standard topology on $L^0$, the topology of "convergence in measure" (§245), and then to two linked sections on uniform integrability and weak compactness in $L^1$ (§§246-247).
There is a technical point here which must never be lost sight of. While it is customary and natural to call $L^1$, $L^2$ and the others "function spaces", their elements are not in fact functions, but equivalence classes of functions. As you see from the language of the preceding paragraph, my practice is to scrupulously maintain the distinction; I give my reasons in the notes to §241.

241 $\mathcal{L}^0$ and $L^0$
The chief aim of this chapter is to discuss the spaces $L^1$, $L^\infty$ and $L^p$ of the following three sections. However it will be convenient to regard all these as subspaces of a larger space $L^0$ of equivalence classes of (virtually) measurable functions, and I have collected in this section the basic facts concerning the ordered linear space $L^0$.
It is almost the first principle of measure theory that sets of measure zero can often be ignored; the phrase "negligible set" itself asserts this principle. Accordingly, two functions which agree almost everywhere may often (not always!) be treated as identical. A suitable expression of this idea is to form the space of equivalence classes of functions, saying that two functions are equivalent if they agree on a conegligible set. This is the basis of all the constructions of this chapter. It is a remarkable fact that the spaces of equivalence classes so constructed are actually better adapted to certain problems than the spaces of functions from which they are derived, so that once the technique has been mastered it is easier to do one's thinking in the more abstract spaces.

241A The space $\mathcal{L}^0$: Definition It is time to give a name to a set of functions which has already been used more than once. Let $(X,\Sigma,\mu)$ be any measure space. I write $\mathcal{L}^0$, or $\mathcal{L}^0(\mu)$, for the space of real-valued functions $f$ defined on conegligible subsets of $X$ which are virtually measurable, that is, such that $f\restriction E$ is measurable for some conegligible set $E\subseteq X$. Recall that $f$ is $\mu$-virtually measurable iff it is measurable for the completion of $\mu$ (212Fa).

241B Basic properties If $(X,\Sigma,\mu)$ is any measure space, then we have the following facts, corresponding to the fundamental properties of measurable functions listed in §121 of Volume 1. I work through them in order, so that if you have Volume 1 to hand you can see what has to be missed out.
(a) A constant real-valued function defined almost everywhere in $X$ belongs to $\mathcal{L}^0$ (121Ea).
(b) $f+g\in\mathcal{L}^0$ for all $f$, $g\in\mathcal{L}^0$ (for if $f\restriction F$ and $g\restriction G$ are measurable, then $(f+g)\restriction(F\cap G)=(f\restriction F)+(g\restriction G)$ is measurable) (121Eb).
(c) $cf\in\mathcal{L}^0$ for all $f\in\mathcal{L}^0$, $c\in\mathbb{R}$ (121Ec).
(d) $f\times g\in\mathcal{L}^0$ for all $f$, $g\in\mathcal{L}^0$ (121Ed).
(e) If $f\in\mathcal{L}^0$ and $h:\mathbb{R}\to\mathbb{R}$ is Borel measurable, then $hf\in\mathcal{L}^0$ (121Eg).
(f) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$ and $f=\lim_{n\to\infty}f_n$ is defined (as a real-valued function) almost everywhere in $X$, then $f\in\mathcal{L}^0$ (121Fa).
(g) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$ and $f=\sup_{n\in\mathbb{N}}f_n$ is defined (as a real-valued function) almost everywhere in $X$, then $f\in\mathcal{L}^0$ (121Fb).
(h) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$ and $f=\inf_{n\in\mathbb{N}}f_n$ is defined (as a real-valued function) almost everywhere in $X$, then $f\in\mathcal{L}^0$ (121Fc).
(i) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$ and $f=\limsup_{n\to\infty}f_n$ is defined (as a real-valued function) almost everywhere in $X$, then $f\in\mathcal{L}^0$ (121Fd).
(j) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$ and $f=\liminf_{n\to\infty}f_n$ is defined (as a real-valued function) almost everywhere in $X$, then $f\in\mathcal{L}^0$ (121Fe).
(k) $\mathcal{L}^0$ is just the set of real-valued functions, defined on subsets of $X$, which are equal almost everywhere to some $\Sigma$-measurable function from $X$ to $\mathbb{R}$. P (i) If $g:X\to\mathbb{R}$ is $\Sigma$-measurable and $f=_{\rm a.e.}g$, then $F=\{x:x\in\mathrm{dom}\,f,\ f(x)=g(x)\}$ is conegligible and $f\restriction F=g\restriction F$ is measurable (121Eh), so $f\in\mathcal{L}^0$. (ii) If $f\in\mathcal{L}^0$, let $E\subseteq X$ be a conegligible set such that $f\restriction E$ is measurable. Then $D=E\cap\mathrm{dom}\,f$ is conegligible and $f\restriction D$ is measurable, so there is a measurable $h:X\to\mathbb{R}$ agreeing with $f$ on $D$ (121I); and $h$ agrees with $f$ almost everywhere. Q

241C The space $L^0$: Definition Let $(X,\Sigma,\mu)$ be any measure space. Then $=_{\rm a.e.}$ is an equivalence relation on $\mathcal{L}^0$. Write $L^0$, or $L^0(\mu)$, for the set of equivalence classes in $\mathcal{L}^0$ under $=_{\rm a.e.}$. For $f\in\mathcal{L}^0$, write $f^\bullet$ for its equivalence class in $L^0$.
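
The distinction between a function and its equivalence class can be seen in a simple case. The sketch below assumes Lebesgue measure $\mu$ on $X=\mathbb{R}$ and two particular functions; these are illustrative choices only.

```latex
% Illustrative sketch; X = \mathbb{R} with Lebesgue measure \mu, and the
% functions f = \chi\mathbb{Q} and g(x) = 1/x are assumptions chosen for
% the example.
% f is measurable and vanishes off the negligible set \mathbb{Q}, so
\[
  f = \chi\mathbb{Q} \ne 0 \ \text{ as functions}, \qquad
  f =_{\rm a.e.} 0, \qquad f^\bullet = 0^\bullet \ \text{ in } L^0(\mu) .
\]
% g is defined only on the conegligible set \mathbb{R}\setminus\{0\}, but it
% is measurable there, so it belongs to \mathcal{L}^0 and has a class:
\[
  \mathrm{dom}\,g = \mathbb{R}\setminus\{0\}, \qquad
  g \in \mathcal{L}^0(\mu), \qquad g^\bullet \in L^0(\mu) .
\]
```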

241D The linear structure of $L^0$ Let $(X,\Sigma,\mu)$ be any measure space, and set $\mathcal{L}^0=\mathcal{L}^0(\mu)$, $L^0=L^0(\mu)$.

(a) If $f_1$, $f_2$, $g_1$, $g_2\in\mathcal{L}^0$ and $f_1=_{\rm a.e.}f_2$, $g_1=_{\rm a.e.}g_2$ then $f_1+g_1=_{\rm a.e.}f_2+g_2$. Accordingly we may define addition on $L^0$ by setting $f^\bullet+g^\bullet=(f+g)^\bullet$ for all $f$, $g\in\mathcal{L}^0$.

(b) If $f_1$, $f_2\in\mathcal{L}^0$ and $f_1=_{\rm a.e.}f_2$, then $cf_1=_{\rm a.e.}cf_2$ for every $c\in\mathbb{R}$. Accordingly we may define scalar multiplication on $L^0$ by setting $c.f^\bullet=(cf)^\bullet$ for all $f\in\mathcal{L}^0$, $c\in\mathbb{R}$.

(c) Now $L^0$ is a linear space over $\mathbb{R}$, with zero $0^\bullet$, where $0$ is the function with domain $X$ and constant value $0$, and negatives $-f^\bullet=(-f)^\bullet$. P (i)
$f+(g+h)=(f+g)+h$ for all $f$, $g$, $h\in\mathcal{L}^0$,
so
$u+(v+w)=(u+v)+w$ for all $u$, $v$, $w\in L^0$.
(ii)
$f+0=0+f=f$ for every $f\in\mathcal{L}^0$,
so
$u+0^\bullet=0^\bullet+u=u$ for every $u\in L^0$.
(iii)
$f+(-f)=_{\rm a.e.}0$ for every $f\in\mathcal{L}^0$,
so
$f^\bullet+(-f)^\bullet=0^\bullet$ for every $f\in\mathcal{L}^0$.
(iv)
$f+g=g+f$ for all $f$, $g\in\mathcal{L}^0$,
so
$u+v=v+u$ for all $u$, $v\in L^0$.
(v)
$c(f+g)=cf+cg$ for all $f$, $g\in\mathcal{L}^0$ and $c\in\mathbb{R}$,
so
$c(u+v)=cu+cv$ for all $u$, $v\in L^0$ and $c\in\mathbb{R}$.
(vi)
$(a+b)f=af+bf$ for all $f\in\mathcal{L}^0$, $a$, $b\in\mathbb{R}$,
so
$(a+b)u=au+bu$ for all $u\in L^0$, $a$, $b\in\mathbb{R}$.
(vii)
$(ab)f=a(bf)$ for all $f\in\mathcal{L}^0$, $a$, $b\in\mathbb{R}$,
so
$(ab)u=a(bu)$ for all $u\in L^0$, $a$, $b\in\mathbb{R}$.
(viii)
$1f=f$ for all $f\in\mathcal{L}^0$,
so
$1u=u$ for all $u\in L^0$. Q
Q
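
The passage from functions to equivalence classes in (a) above rests on a routine inclusion of exceptional sets. Spelled out as a sketch (the names $N_f$, $N_g$ are introduced here for illustration only, and $\operatorname{dom}(f+g) = \operatorname{dom} f \cap \operatorname{dom} g$ is assumed as usual):

```latex
% N_f, N_g are hypothetical names for the exceptional sets; they do not appear in the text.
\[
  N_f = \{x : x \notin \operatorname{dom} f_1 \cap \operatorname{dom} f_2 \text{ or } f_1(x) \neq f_2(x)\}, \qquad
  N_g = \{x : x \notin \operatorname{dom} g_1 \cap \operatorname{dom} g_2 \text{ or } g_1(x) \neq g_2(x)\}.
\]
\[
  \{x : x \notin \operatorname{dom}(f_1+g_1) \cap \operatorname{dom}(f_2+g_2)
       \text{ or } (f_1+g_1)(x) \neq (f_2+g_2)(x)\} \subseteq N_f \cup N_g .
\]
% N_f and N_g are negligible because f_1 =_{a.e.} f_2 and g_1 =_{a.e.} g_2,
% and the union of two negligible sets is negligible, so f_1 + g_1 =_{a.e.} f_2 + g_2.
```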

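241E The order structure of $L^0$ Let $(X, \Sigma, \mu)$ be any measure space and set $\mathcal{L}^0 = \mathcal{L}^0(\mu)$, $L^0 = L^0(\mu)$.

(a) If $f_1, f_2, g_1, g_2 \in \mathcal{L}^0$, $f_1 =_{\mathrm{a.e.}} f_2$, $g_1 =_{\mathrm{a.e.}} g_2$ and $f_1 \le_{\mathrm{a.e.}} g_1$, then $f_2 \le_{\mathrm{a.e.}} g_2$. Accordingly we may
define a relation $\le$ on $L^0$ by saying that $f^\bullet \le g^\bullet$ iff $f \le_{\mathrm{a.e.}} g$.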

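(b) Now $\le$ is a partial ordering on $L^0$. P (i) If $f, g, h \in \mathcal{L}^0$ and $f \le_{\mathrm{a.e.}} g$ and $g \le_{\mathrm{a.e.}} h$, then $f \le_{\mathrm{a.e.}} h$.
Accordingly $u \le w$ whenever $u, v, w \in L^0$ and $u \le v$, $v \le w$. (ii) If $f \in \mathcal{L}^0$ then $f \le_{\mathrm{a.e.}} f$; so $u \le u$ for
every $u \in L^0$. (iii) If $f, g \in \mathcal{L}^0$ and $f \le_{\mathrm{a.e.}} g$ and $g \le_{\mathrm{a.e.}} f$, then $f =_{\mathrm{a.e.}} g$, so if $u \le v$ and $v \le u$ then $u = v$. Q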

(c) In fact $L^0$, with $\le$, is a partially ordered linear space, that is, a (real) linear space with a partial
ordering $\le$ such that
if $u \le v$ then $u + w \le v + w$ for every $w$,
if $0 \le u$ then $0 \le cu$ for every $c \ge 0$.
P (i) If $f, g, h \in \mathcal{L}^0$ and $f \le_{\mathrm{a.e.}} g$, then $f + h \le_{\mathrm{a.e.}} g + h$. (ii) If $f \in \mathcal{L}^0$ and $f \ge_{\mathrm{a.e.}} 0$, then $cf \ge_{\mathrm{a.e.}} 0$ for
every $c \ge 0$. Q

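(d) More: $L^0$ is a Riesz space or vector lattice, that is, a partially ordered linear space such that
$u \vee v = \sup\{u, v\}$, $u \wedge v = \inf\{u, v\}$ are defined for all $u, v \in L^0$. P Take $f, g \in \mathcal{L}^0$ such that $f^\bullet = u$,
$g^\bullet = v$. Then $f \vee g$, $f \wedge g \in \mathcal{L}^0$, writing

$(f \vee g)(x) = \max(f(x), g(x))$, $(f \wedge g)(x) = \min(f(x), g(x))$

for $x \in \operatorname{dom} f \cap \operatorname{dom} g$. (Compare 241Bg-h.) Now, for any $h \in \mathcal{L}^0$, we have

$f \vee g \le_{\mathrm{a.e.}} h \iff f \le_{\mathrm{a.e.}} h$ and $g \le_{\mathrm{a.e.}} h$,

$h \le_{\mathrm{a.e.}} f \wedge g \iff h \le_{\mathrm{a.e.}} f$ and $h \le_{\mathrm{a.e.}} g$,

so for any $w \in L^0$ we have
$(f \vee g)^\bullet \le w \iff u \le w$ and $v \le w$,

$w \le (f \wedge g)^\bullet \iff w \le u$ and $w \le v$.
Thus we have
$(f \vee g)^\bullet = \sup\{u, v\} = u \vee v$, $(f \wedge g)^\bullet = \inf\{u, v\} = u \wedge v$
in $L^0$. Q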

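(e) In particular, for any $u \in L^0$ we can speak of $|u| = u \vee (-u)$; if $f \in \mathcal{L}^0$ then $|f^\bullet| = |f|^\bullet$.

If $f, g \in \mathcal{L}^0$, $c \in \mathbb{R}$ then
$|cf| = |c||f|$, $f \vee g = \frac{1}{2}(f + g + |f - g|)$,

$f \wedge g = \frac{1}{2}(f + g - |f - g|)$, $|f + g| \le_{\mathrm{a.e.}} |f| + |g|$,
so
$|cu| = |c||u|$, $u \vee v = \frac{1}{2}(u + v + |u - v|)$,

$u \wedge v = \frac{1}{2}(u + v - |u - v|)$, $|u + v| \le |u| + |v|$

for all $u, v \in L^0$.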
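
The formulae for $f \vee g$ and $f \wedge g$ in (e) reduce to an elementary fact about pairs of real numbers. A short verification, offered as a sketch (the letters $a$, $b$ stand for the values $f(x)$, $g(x)$ and are introduced here only for the calculation):

```latex
% Requires amsmath for the cases environment. a = f(x), b = g(x) are illustrative names.
\[
  \tfrac12\bigl(a + b + |a - b|\bigr) =
  \begin{cases}
    \tfrac12(a + b + a - b) = a & \text{if } a \ge b,\\[2pt]
    \tfrac12(a + b + b - a) = b & \text{if } a < b,
  \end{cases}
  \qquad\text{so}\qquad \tfrac12\bigl(a + b + |a - b|\bigr) = \max(a, b).
\]
\[
  \tfrac12\bigl(a + b - |a - b|\bigr) = a + b - \max(a, b) = \min(a, b).
\]
```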

(f) A special notation is often useful. If $f$ is a real-valued function, set $f^+(x) = \max(f(x), 0)$, $f^-(x) =
\max(-f(x), 0)$ for $x \in \operatorname{dom} f$, so that
$f = f^+ - f^-$, $|f| = f^+ + f^- = f^+ \vee f^-$,
all these functions being defined on $\operatorname{dom} f$. In $L^0$, the corresponding operations are $u^+ = u \vee 0$, $u^- = (-u) \vee 0$,
and we have
$u = u^+ - u^-$, $|u| = u^+ + u^- = u^+ \vee u^-$, $u^+ \wedge u^- = 0$.

(g) It is perhaps obvious, but I say it anyway: if $u \ge 0$ in $L^0$, then there is an $f \ge 0$ in $\mathcal{L}^0$ such that
$f^\bullet = u$. P Take any $g \in \mathcal{L}^0$ such that $u = g^\bullet$, and set $f = g \vee 0$. Q

241F Riesz spaces There is an extensive abstract theory of Riesz spaces, which I think it best to leave
aside for the moment; a general account may be found in Luxemburg & Zaanen 71 and Zaanen 83; my
own book Fremlin 74 covers the elementary material, and Chapter 35 in the next volume repeats the most
essential ideas. For our purposes here we need only a few definitions and some simple results which are most
easily proved for the special cases in which we need them, without reference to the general theory.

(a) A Riesz space $U$ is Archimedean if whenever $u \in U$, $u > 0$ (that is, $u \ge 0$ and $u \neq 0$), and $v \in U$,
there is an $n \in \mathbb{N}$ such that $nu \not\le v$.

(b) A Riesz space $U$ is Dedekind $\sigma$-complete (or $\sigma$-order-complete, or $\sigma$-complete) if every
non-empty countable set $A \subseteq U$ which is bounded above has a least upper bound in $U$.

(c) A Riesz space $U$ is Dedekind complete (or order complete, or complete) if every non-empty set
$A \subseteq U$ which is bounded above in $U$ has a least upper bound in $U$.
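
To see that the Archimedean condition in (a) is a genuine restriction, it may help to have a Riesz space in which it fails. The following is a standard example, given as a sketch; the space $\mathbb{R}^2$ with the lexicographic order is an assumption introduced here and is not discussed in the text:

```latex
% Assumption: U = \mathbb{R}^2 with the lexicographic order, chosen here for illustration.
\[
  (a, b) \le (c, d) \iff a < c \ \text{ or } \ (a = c \text{ and } b \le d).
\]
% This order is total and compatible with addition and with multiplication by scalars c \ge 0,
% so U is a Riesz space (suprema and infima of pairs are just maxima and minima).
\[
  u = (0, 1) > 0, \qquad v = (1, 0), \qquad nu = (0, n) \le (1, 0) = v \quad\text{for every } n \in \mathbb{N},
\]
% so there is no n with nu \not\le v, and U is not Archimedean.
```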

241G Now we have the following important properties of $L^0$.

Theorem Let $(X, \Sigma, \mu)$ be a measure space. Set $L^0 = L^0(\mu)$.
(a) $L^0$ is Archimedean and Dedekind $\sigma$-complete.
(b) If $(X, \Sigma, \mu)$ is semi-finite, then $L^0$ is Dedekind complete iff $(X, \Sigma, \mu)$ is localizable.
proof Set $\mathcal{L}^0 = \mathcal{L}^0(\mu)$.
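(a)(i) If $u, v \in L^0$ and $u > 0$, express $u$ as $f^\bullet$ and $v$ as $g^\bullet$ where $f, g \in \mathcal{L}^0$. Then $E = \{x : x \in
\operatorname{dom} f,\ f(x) > 0\}$ is not negligible. So there is an $n \in \mathbb{N}$ such that
$E_n = \{x : x \in \operatorname{dom} f \cap \operatorname{dom} g,\ nf(x) > g(x)\}$
is not negligible, since $E \cap \operatorname{dom} g \subseteq \bigcup_{n\in\mathbb{N}} E_n$. But now $nu \not\le v$. As $u$ and $v$ are arbitrary, $L^0$ is Archimedean.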
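(ii) Now let $A \subseteq L^0$ be a non-empty countable set with an upper bound $w$ in $L^0$. Express $A$ as
$\{f_n^\bullet : n \in \mathbb{N}\}$ where $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$, and $w$ as $h^\bullet$ where $h \in \mathcal{L}^0$; write $u_n = f_n^\bullet$ for each $n$. Set $f = \sup_{n\in\mathbb{N}} f_n$. Then we
have $f(x)$ defined in $\mathbb{R}$ at any point $x \in \operatorname{dom} h \cap \bigcap_{n\in\mathbb{N}} \operatorname{dom} f_n$ such that $f_n(x) \le h(x)$ for every $n \in \mathbb{N}$, that
is, for almost every $x \in X$; so $f \in \mathcal{L}^0$ (241Bg). Set $u = f^\bullet \in L^0$. If $v \in L^0$, say $v = g^\bullet$ where $g \in \mathcal{L}^0$, then

$u_n \le v$ for every $n \in \mathbb{N}$
$\iff$ for every $n \in \mathbb{N}$, $f_n \le_{\mathrm{a.e.}} g$
$\iff$ for almost every $x \in X$, $f_n(x) \le g(x)$ for every $n \in \mathbb{N}$
$\iff f \le_{\mathrm{a.e.}} g \iff u \le v$.
Thus $u = \sup_{n\in\mathbb{N}} u_n$ in $L^0$. As $A$ is arbitrary, $L^0$ is Dedekind $\sigma$-complete.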
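(b)(i) Suppose that $(X, \Sigma, \mu)$ is localizable. Let $A \subseteq L^0$ be any non-empty set with an upper bound
$w_0 \in L^0$. Set
$\mathcal{A} = \{f : f$ is a measurable function from $X$ to $\mathbb{R}$, $f^\bullet \in A\}$,
then every member of $A$ is of the form $f^\bullet$ for some $f \in \mathcal{A}$ (241Bk). For each $q \in \mathbb{Q}$, let $\mathcal{E}_q$ be the family
of subsets of $X$ expressible in the form $\{x : f(x) \ge q\}$ for some $f \in \mathcal{A}$; then $\mathcal{E}_q \subseteq \Sigma$. Because $(X, \Sigma, \mu)$ is
localizable, there is a set $F_q \in \Sigma$ which is an essential supremum for $\mathcal{E}_q$. For $x \in X$, set
$g^*(x) = \sup\{q : q \in \mathbb{Q},\ x \in F_q\}$,
allowing $\infty$ as the supremum of a set which is not bounded above, and $-\infty$ as $\sup\emptyset$. Then
$\{x : g^*(x) > a\} = \bigcup_{q\in\mathbb{Q},\,q>a} F_q \in \Sigma$
for every $a \in \mathbb{R}$.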
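If $f \in \mathcal{A}$, then $f \le_{\mathrm{a.e.}} g^*$. P For each $q \in \mathbb{Q}$, set
$E_q = \{x : f(x) \ge q\} \in \mathcal{E}_q$;
then $E_q \setminus F_q$ is negligible. Set $H = \bigcup_{q\in\mathbb{Q}} (E_q \setminus F_q)$. If $x \in X \setminus H$, then
$f(x) \ge q \Longrightarrow g^*(x) \ge q$,
so $f(x) \le g^*(x)$; thus $f \le_{\mathrm{a.e.}} g^*$. Q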
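If $h : X \to \mathbb{R}$ is measurable and $u \le h^\bullet$ for every $u \in A$, then $g^* \le_{\mathrm{a.e.}} h$. P Set $G_q = \{x : h(x) \ge q\}$
for each $q \in \mathbb{Q}$. If $E \in \mathcal{E}_q$, there is an $f \in \mathcal{A}$ such that $E = \{x : f(x) \ge q\}$; now $f \le_{\mathrm{a.e.}} h$, so
$E \setminus G_q \subseteq \{x : f(x) > h(x)\}$ is negligible. Because $F_q$ is an essential supremum for $\mathcal{E}_q$, $F_q \setminus G_q$ is negligible;
and this is true for every $q \in \mathbb{Q}$. Consequently
$\{x : h(x) < g^*(x)\} \subseteq \bigcup_{q\in\mathbb{Q}} F_q \setminus G_q$
is negligible, and $g^* \le_{\mathrm{a.e.}} h$. Q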
Now recall that we are assuming that $A \neq \emptyset$ and that $A$ has an upper bound $w_0 \in L^0$. Take any $f_0 \in \mathcal{A}$
and a measurable $h_0 : X \to \mathbb{R}$ such that $h_0^\bullet = w_0$; then $f \le_{\mathrm{a.e.}} h_0$ for every $f \in \mathcal{A}$, so $f_0 \le_{\mathrm{a.e.}} g^* \le_{\mathrm{a.e.}} h_0$,
and $g^*$ must be finite a.e. Setting $g(x) = g^*(x)$ when $g^*(x) \in \mathbb{R}$, we have $g \in \mathcal{L}^0$ and $g =_{\mathrm{a.e.}} g^*$, so that
$f \le_{\mathrm{a.e.}} g \le_{\mathrm{a.e.}} h$
whenever $f$, $h$ are measurable functions from $X$ to $\mathbb{R}$, $f^\bullet \in A$ and $h^\bullet$ is an upper bound for $A$; that is,
$u \le g^\bullet \le w$
whenever $u \in A$ and $w$ is an upper bound for $A$. But this means that $g^\bullet$ is a least upper bound for $A$ in $L^0$.
As $A$ is arbitrary, $L^0$ is Dedekind complete.
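(ii) Suppose that $L^0$ is Dedekind complete. We are assuming that $(X, \Sigma, \mu)$ is semi-finite. Let $\mathcal{E}$ be any
subset of $\Sigma$. Set
$A = \{0\} \cup \{(\chi E)^\bullet : E \in \mathcal{E}\} \subseteq L^0$.
Then $A$ is bounded above by $(\chi X)^\bullet$ so has a least upper bound $w \in L^0$. Express $w$ as $h^\bullet$ where $h : X \to \mathbb{R}$
is measurable, and set $F = \{x : h(x) > 0\}$. Then $F$ is an essential supremum for $\mathcal{E}$ in $\Sigma$. P ($\alpha$) If $E \in \mathcal{E}$,
then $(\chi E)^\bullet \le w$ so $\chi E \le_{\mathrm{a.e.}} h$, that is, $h(x) \ge 1$ for almost every $x \in E$, and $E \setminus F \subseteq \{x : x \in E,\ h(x) < 1\}$
is negligible. ($\beta$) If $G \in \Sigma$ and $E \setminus G$ is negligible for every $E \in \mathcal{E}$, then $\chi E \le_{\mathrm{a.e.}} \chi G$ for every $E \in \mathcal{E}$, that is,
$(\chi E)^\bullet \le (\chi G)^\bullet$ for every $E \in \mathcal{E}$; so $w \le (\chi G)^\bullet$, that is, $h \le_{\mathrm{a.e.}} \chi G$. Accordingly $F \setminus G \subseteq \{x : h(x) > (\chi G)(x)\}$
is negligible. Q
As $\mathcal{E}$ is arbitrary, $(X, \Sigma, \mu)$ is localizable.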

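241H The multiplicative structure of $L^0$ Let $(X, \Sigma, \mu)$ be any measure space; write $L^0 = L^0(\mu)$,
$\mathcal{L}^0 = \mathcal{L}^0(\mu)$.

(a) If $f_1, f_2, g_1, g_2 \in \mathcal{L}^0$ and $f_1 =_{\mathrm{a.e.}} f_2$, $g_1 =_{\mathrm{a.e.}} g_2$ then $f_1 \times g_1 =_{\mathrm{a.e.}} f_2 \times g_2$. Accordingly we may define
multiplication on $L^0$ by setting $f^\bullet \times g^\bullet = (f \times g)^\bullet$ for all $f, g \in \mathcal{L}^0$.

(b) It is now easy to check that, for all $u, v, w \in L^0$ and $c \in \mathbb{R}$,

$u \times (v \times w) = (u \times v) \times w$,
$u \times \mathbf{1}^\bullet = \mathbf{1}^\bullet \times u = u$, where $\mathbf{1}$ is the function with constant value $1$,
$c(u \times v) = cu \times v = u \times cv$,
$u \times (v + w) = (u \times v) + (u \times w)$,
$(u + v) \times w = (u \times w) + (v \times w)$,
$u \times v = v \times u$,
$|u \times v| = |u| \times |v|$,
$u \times v = 0$ iff $|u| \wedge |v| = 0$,
$|u| \le |v|$ iff there is a $w$ such that $|w| \le \mathbf{1}^\bullet$ and $u = v \times w$.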
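
Each identity in (b) is checked by passing to representative functions and arguing pointwise. As one instance, here is a sketch of the argument for "$u \times v = 0$ iff $|u| \wedge |v| = 0$" (the letters $a$, $b$ are illustrative names for the values $f(x)$, $g(x)$ of representatives $f$, $g$):

```latex
% a, b stand for the values f(x), g(x) of representatives f, g of u, v.
\[
  ab = 0 \iff a = 0 \text{ or } b = 0 \iff \min(|a|, |b|) = 0 \qquad (a, b \in \mathbb{R}),
\]
\[
  \text{so}\quad \{x \in \operatorname{dom} f \cap \operatorname{dom} g : f(x)g(x) \neq 0\}
  = \{x \in \operatorname{dom} f \cap \operatorname{dom} g : \min(|f(x)|, |g(x)|) \neq 0\},
\]
% and therefore f \times g =_{a.e.} 0 iff |f| \wedge |g| =_{a.e.} 0.
```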

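241I The action of Borel functions on $L^0$ Let $(X, \Sigma, \mu)$ be a measure space and $h : \mathbb{R} \to \mathbb{R}$ a Borel
measurable function. Then $hf \in \mathcal{L}^0 = \mathcal{L}^0(\mu)$ for every $f \in \mathcal{L}^0$ (241Be) and $hf =_{\mathrm{a.e.}} hg$ whenever $f =_{\mathrm{a.e.}} g$.
So we have a function $\bar h : L^0 \to L^0$ defined by setting $\bar h(f^\bullet) = (hf)^\bullet$ for every $f \in \mathcal{L}^0$. For instance, if
$u \in L^0$ and $p \ge 1$, we can consider $|u|^p = \bar h(u)$ where $h(x) = |x|^p$ for $x \in \mathbb{R}$.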
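
The reason $\bar h$ is well defined can be written out in one line. The following sketch uses only the notation of 241I, with $hf$ denoting the composition $x \mapsto h(f(x))$:

```latex
\[
  \{x : x \in \operatorname{dom} f \cap \operatorname{dom} g,\ h(f(x)) \neq h(g(x))\}
  \subseteq \{x : x \in \operatorname{dom} f \cap \operatorname{dom} g,\ f(x) \neq g(x)\},
\]
% so hf =_{a.e.} hg whenever f =_{a.e.} g. Taking h(x) = |x|^p gives, for every f in \mathcal{L}^0,
\[
  |f^{\bullet}|^p = \bar h(f^{\bullet}) = (|f|^p)^{\bullet}.
\]
```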

241J Complex $L^0$ The ideas of this chapter, like those of Chapters 22-23, are often applied to spaces
based on complex-valued functions instead of real-valued functions. Let $(X, \Sigma, \mu)$ be a measure space.

(a) We may write $\mathcal{L}^0_{\mathbb{C}} = \mathcal{L}^0_{\mathbb{C}}(\mu)$ for the space of complex-valued functions $f$ such that $\operatorname{dom} f$ is a conegligible
subset of $X$ and there is a conegligible subset $E \subseteq X$ such that $f\restriction E$ is measurable; that is, such that the
real and imaginary parts of $f$ both belong to $\mathcal{L}^0(\mu)$. Next, $L^0_{\mathbb{C}} = L^0_{\mathbb{C}}(\mu)$ will be the space of equivalence
classes in $\mathcal{L}^0_{\mathbb{C}}$ under the equivalence relation $=_{\mathrm{a.e.}}$.

(b) Using just the same formulae as in 241D, it is easy to describe addition and scalar multiplication
rendering $L^0_{\mathbb{C}}$ a linear space over $\mathbb{C}$. We no longer have quite the same kind of order structure, but we can
identify a real part, being
$\{f^\bullet : f \in \mathcal{L}^0_{\mathbb{C}}$ is real a.e.$\}$,
obviously identifiable with the real linear space $L^0$, and corresponding maps $u \mapsto \operatorname{Re}(u)$, $u \mapsto \operatorname{Im}(u) : L^0_{\mathbb{C}} \to
L^0$ such that $u = \operatorname{Re}(u) + i\operatorname{Im}(u)$ for every $u$. Moreover, we have a notion of modulus, writing

$|f^\bullet| = |f|^\bullet \in L^0$ for every $f \in \mathcal{L}^0_{\mathbb{C}}$,

satisfying the basic relations $|cu| = |c||u|$, $|u + v| \le |u| + |v|$ for $u, v \in L^0_{\mathbb{C}}$ and $c \in \mathbb{C}$, as in 241Ef. We do of
course still have a multiplication on $L^0_{\mathbb{C}}$, for which all the formulae in 241H are still valid.
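(c) The following fact is useful. For any $u \in L^0_{\mathbb{C}}$, $|u|$ is the supremum in $L^0$ of $\{\operatorname{Re}(\zeta u) : \zeta \in \mathbb{C},\ |\zeta| = 1\}$.
P (i) If $|\zeta| = 1$, then $\operatorname{Re}(\zeta u) \le |\zeta u| = |u|$. So $|u|$ is an upper bound of $\{\operatorname{Re}(\zeta u) : |\zeta| = 1\}$. (ii) If $v \in L^0$ and
$\operatorname{Re}(\zeta u) \le v$ whenever $|\zeta| = 1$, then express $u$, $v$ as $f^\bullet$, $g^\bullet$ where $f : X \to \mathbb{C}$ and $g : X \to \mathbb{R}$ are measurable.
For any $q \in \mathbb{Q}$, $x \in X$ set $f_q(x) = \operatorname{Re}(e^{iq} f(x))$. Then $f_q \le_{\mathrm{a.e.}} g$. Accordingly $H = \{x : f_q(x) \le g(x)$ for
every $q \in \mathbb{Q}\}$ is conegligible. But of course $H = \{x : |f(x)| \le g(x)\}$, so $|f| \le_{\mathrm{a.e.}} g$ and $|u| \le v$. As $v$ is
arbitrary, $|u|$ is the least upper bound of $\{\operatorname{Re}(\zeta u) : |\zeta| = 1\}$. Q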
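
The identification of $H$ in part (ii) of this proof depends on a pointwise fact about complex numbers. A sketch of it follows; $z$ stands for a value $f(x)$, and $\zeta_0$ is a name introduced here for illustration:

```latex
% z stands for a value f(x) in \mathbb{C}; \zeta_0 is a name introduced for this sketch.
\[
  \operatorname{Re}(\zeta z) \le |\zeta z| = |z| \quad\text{whenever } |\zeta| = 1, \qquad
  \operatorname{Re}(\zeta_0 z) = |z| \quad\text{for } \zeta_0 = \bar z/|z| \ (z \neq 0),
\]
\[
  \text{so}\quad |z| = \sup_{|\zeta| = 1} \operatorname{Re}(\zeta z) = \sup_{q \in \mathbb{Q}} \operatorname{Re}(e^{iq} z),
\]
% the last equality because \{e^{iq} : q \in \mathbb{Q}\} is dense in the unit circle
% and \zeta \mapsto \operatorname{Re}(\zeta z) is continuous.
```

Working with the countable family indexed by $\mathbb{Q}$ is what keeps the exceptional set negligible.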

241X Basic exercises >(a) Let $X$ be a set, and let $\mu$ be counting measure on $X$ (112Bd). Show that
$L^0(\mu)$ can be identified with $\mathcal{L}^0(\mu) = \mathbb{R}^X$.

>(b) Let $(X, \Sigma, \mu)$ be a measure space and $\hat\mu$ the completion of $\mu$ (212C). Show that $\mathcal{L}^0(\mu) = \mathcal{L}^0(\hat\mu)$,
$L^0(\mu) = L^0(\hat\mu)$.

(c) Let $(X, \Sigma, \mu)$ be a measure space. (i) Show that for every $u \in L^0(\mu)$ we may define an outer measure
$\theta_u : \mathcal{P}\mathbb{R} \to [0, \infty]$ by writing $\theta_u(A) = \mu^* f^{-1}[A]$ whenever $A \subseteq \mathbb{R}$ and $f \in \mathcal{L}^0(\mu)$ is such that $f^\bullet = u$. (ii)
Show that every Borel subset of $\mathbb{R}$ is measurable for the measure defined from $\theta_u$ by Carathéodory's method.
>(d) Let $(X, \Sigma, \mu)$ be a measure space. Suppose that $r \ge 1$ and that $h : \mathbb{R}^r \to \mathbb{R}$ is a Borel measurable
function. Show that there is a function $\bar h : L^0(\mu)^r \to L^0(\mu)$ defined by writing
$\bar h(f_1^\bullet, \ldots, f_r^\bullet) = (h(f_1, \ldots, f_r))^\bullet$
for $f_1, \ldots, f_r \in \mathcal{L}^0(\mu)$.
(e) Let $U$ be a Dedekind $\sigma$-complete Riesz space and $A \subseteq U$ a non-empty countable set which is bounded
below in $U$. Show that $\inf A$ is defined in $U$.
(f) Let $U$ be a Dedekind complete Riesz space and $A \subseteq U$ a non-empty set which is bounded below in
$U$. Show that $\inf A$ is defined in $U$.
(g) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of measure spaces, with direct sum $(X, \Sigma, \mu)$ (214K). (i) Writing
$\phi_i : X_i \to X$ for the canonical maps (in the construction of 214K, $\phi_i(x) = (x, i)$ for $x \in X_i$), show that
$f \mapsto \langle f\phi_i\rangle_{i\in I}$ is a bijection between $\mathcal{L}^0(\mu)$ and $\prod_{i\in I} \mathcal{L}^0(\mu_i)$. (ii) Show that this corresponds to a bijection
between $L^0(\mu)$ and $\prod_{i\in I} L^0(\mu_i)$.
(h) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\phi : X \to Y$ an inverse-measure-preserving function.
(i) Show that we have a map $T : L^0(\nu) \to L^0(\mu)$ defined by setting $Tg^\bullet = (g\phi)^\bullet$ for every $g \in \mathcal{L}^0(\nu)$. (ii)
Show that $T$ is linear, that $T(v \times w) = Tv \times Tw$ for all $v, w \in L^0(\nu)$, and that $T(\sup_{n\in\mathbb{N}} v_n) = \sup_{n\in\mathbb{N}} Tv_n$
whenever $\langle v_n\rangle_{n\in\mathbb{N}}$ is a sequence in $L^0(\nu)$ with an upper bound in $L^0(\nu)$.
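(i) Let $(X, \Sigma, \mu)$ be a measure space and $g$, $h$, $\langle g_n\rangle_{n\in\mathbb{N}}$ Borel measurable functions from $\mathbb{R}$ to itself; write
$\bar g$, $\bar h$, $\bar g_n$ for the corresponding functions from $L^0 = L^0(\mu)$ to itself (241I). (i) Show that
$\bar g(u) + \bar h(u) = \overline{g + h}(u)$, $\bar g(u) \times \bar h(u) = \overline{g \times h}(u)$, $\bar g(\bar h(u)) = \overline{gh}(u)$
for every $u \in L^0$. (ii) Show that if $g(t) \le h(t)$ for every $t \in \mathbb{R}$, then $\bar g(u) \le \bar h(u)$ for every $u \in L^0$. (iii) Show
that if $g$ is non-decreasing, then $\bar g(u) \le \bar g(v)$ whenever $u \le v$ in $L^0$. (iv) Show that if $h(t) = \sup_{n\in\mathbb{N}} g_n(t)$
for every $t \in \mathbb{R}$, then $\bar h(u) = \sup_{n\in\mathbb{N}} \bar g_n(u)$ in $L^0$ for every $u \in L^0$.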

241Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a measure space and $\tilde\mu$ the c.l.d. version of $\mu$ (213E). (i)
Show that $\mathcal{L}^0(\mu) \subseteq \mathcal{L}^0(\tilde\mu)$. (ii) Show that this inclusion defines a linear operator $T : L^0(\mu) \to L^0(\tilde\mu)$ such
that $T(u \times v) = Tu \times Tv$ for all $u, v \in L^0(\mu)$. (iii) Show that whenever $v > 0$ in $L^0(\tilde\mu)$ there is a $u \ge 0$
in $L^0(\mu)$ such that $0 < Tu \le v$. (iv) Show that $T(\sup A) = \sup T[A]$ whenever $A \subseteq L^0(\mu)$ is a non-empty
set with a least upper bound in $L^0(\mu)$. (v) Show that $T$ is injective iff $\mu$ is semi-finite. (vi) Show that if $\mu$
is localizable, then $T$ is an isomorphism for the linear and order structures of $L^0(\mu)$ and $L^0(\tilde\mu)$. (Hint:
213Hb.)

(b) Show that any Dedekind $\sigma$-complete Riesz space is Archimedean.

(c) Let $U$ be any Riesz space. For $u \in U$ write $|u| = u \vee (-u)$, $u^+ = u \vee 0$, $u^- = (-u) \vee 0$. Show that,
for any $u, v \in U$,
$u = u^+ - u^-$, $|u| = u^+ + u^- = u^+ \vee u^-$, $u^+ \wedge u^- = 0$,

$u \vee v = \frac{1}{2}(u + v + |u - v|) = u + (v - u)^+$,

$u \wedge v = \frac{1}{2}(u + v - |u - v|) = u - (u - v)^+$,

$|u + v| \le |u| + |v|$.

(d) A Riesz space $U$ is said to have the countable sup property if for every $A \subseteq U$ with a least upper
bound in $U$, there is a countable $B \subseteq A$ such that $\sup B = \sup A$. Show that if $(X, \Sigma, \mu)$ is a semi-finite
measure space, then it is $\sigma$-finite iff $L^0(\mu)$ has the countable sup property.

(e) Let $(X, \Sigma, \mu)$ be a measure space and $Y$ any subset of $X$; let $\mu_Y$ be the subspace measure on $Y$. (i)
Show that $\mathcal{L}^0(\mu_Y) = \{f\restriction Y : f \in \mathcal{L}^0(\mu)\}$. (ii) Show that there is a canonical surjection $T : L^0(\mu) \to L^0(\mu_Y)$
defined by setting $T(f^\bullet) = (f\restriction Y)^\bullet$ for every $f \in \mathcal{L}^0(\mu)$, which is linear and multiplicative and preserves
finite suprema and infima, so that (in particular) $T(|u|) = |Tu|$ for every $u \in L^0(\mu)$.

(f) Suppose, in 241Ye, that $Y \in \Sigma$. Explain how $L^0(\mu_Y)$ may be identified (as ordered linear space) with
the subspace
$\{u : u \times \chi(X \setminus Y)^\bullet = 0\}$
of $L^0(\mu)$.

(g) Let $U$ be a partially ordered linear space and $N$ a linear subspace of $U$ such that whenever $u, u' \in N$
and $u' \le v \le u$ then $v \in N$. (i) Show that the linear space quotient $U/N$ is a partially ordered linear space
if we say that $u^\bullet \le v^\bullet$ in $U/N$ iff there is a $w \in N$ such that $u \le v + w$ in $U$. (ii) Show that in this case
$U/N$ is a Riesz space if $U$ is a Riesz space and $|u| \in N$ for every $u \in N$.

(h) Let $(X, \Sigma, \mu)$ be a measure space. Write $\mathcal{L}^0_{\mathrm{strict}}$ for the space of all measurable functions from $X$ to $\mathbb{R}$,
and $\mathcal{N}$ for the subspace of $\mathcal{L}^0_{\mathrm{strict}}$ consisting of measurable functions which are zero almost everywhere. (i)
Show that $\mathcal{L}^0_{\mathrm{strict}}$ is a Dedekind $\sigma$-complete Riesz space. (ii) Show that $L^0(\mu)$ can be identified, as ordered
linear space, with the quotient $\mathcal{L}^0_{\mathrm{strict}}/\mathcal{N}$ as defined in 241Yg above.

(i) Let $(X, \Sigma, \mu)$ be a measure space, and $h : \mathbb{R} \to \mathbb{R}$ a non-decreasing function which is continuous on the
left. Show that if $A \subseteq L^0 = L^0(\mu)$ is a non-empty set with a supremum $v \in L^0$, then $\bar h(v) = \sup_{u\in A} \bar h(u)$,
where $\bar h : L^0 \to L^0$ is the function described in 241I.

241 Notes and comments As hinted in 241Yb-241Yc, the elementary properties of the space $L^0$ which
take up most of this section are strongly interdependent; it is not difficult to develop a theory of 'Riesz
algebras' to incorporate the ideas of 241H into the rest. (Indeed, I sketch such a theory in §352 in the next
volume.)
If we write $\mathcal{L}^0_{\mathrm{strict}}$ for the space of measurable functions from $X$ to $\mathbb{R}$, then $\mathcal{L}^0_{\mathrm{strict}}$ is also a Dedekind
$\sigma$-complete Riesz space, and $L^0$ can be identified with the quotient $\mathcal{L}^0_{\mathrm{strict}}/\mathcal{N}$, writing $\mathcal{N}$ for the set of functions
in $\mathcal{L}^0_{\mathrm{strict}}$ which are zero almost everywhere. (To do this properly, we need a theory of quotients of ordered
linear spaces; see 241Yg-241Yh above.) Of course $\mathcal{L}^0$, as I define it, is not quite a linear space. I choose the
slightly more awkward description of $L^0$ as a space of equivalence classes in $\mathcal{L}^0$ rather than in $\mathcal{L}^0_{\mathrm{strict}}$ because
it frequently happens in practice that a member of $L^0$ arises from a member of $\mathcal{L}^0$ which is either not defined
at every point of the underlying space, or not quite measurable; and to adjust such a function so that it
becomes a member of $\mathcal{L}^0_{\mathrm{strict}}$, while trivial, is an arbitrary process which to my mind is liable to distort
the true nature of such a construction. Of course the same argument could be used in favour of a slightly
larger space, the space $\mathcal{L}^0_\infty$ of $\mu$-virtually measurable $[-\infty, \infty]$-valued functions defined and finite almost
everywhere, relying on 135E rather than on 121E-121F. But I maintain that the operation of restricting a
function in $\mathcal{L}^0_\infty$ to the set on which it is finite is not arbitrary, but canonical and entirely natural.
Reading the exposition above (or, for that matter, scanning the rest of this chapter) you are sure to
notice a plethora of $^\bullet$s, adding a distinctive character to the pages which, I expect you will feel, is disagreeable
to the eye and daunting, or at any rate wearisome, to the spirit. Many, perhaps most, authors prefer to
simplify the typography by using the same symbol for a function in $\mathcal{L}^0$ or $\mathcal{L}^0_{\mathrm{strict}}$ and for its equivalence
class in $L^0$; and indeed it is common to use syntax which does not distinguish between them either, so that
an object which has been defined as a member of $L^0$ will suddenly become a function with actual values
at points of the underlying measure space. I prefer to maintain a rigid distinction; you must choose for
yourself whether to follow me. Since I have chosen the more cumbersome form, I suppose the burden of
proof is on me, to justify my decision. (i) Anyone would agree that there is at least a formal difference
between a function and a set of functions. This by itself does not justify insisting on the difference in every
sentence; mathematical exposition would be impossible if we always insisted on consistency in such questions
as whether (for instance) the number 3 belonging to the set N of natural numbers is exactly the same object
as the number 3 belonging to the set C of complex numbers, or the ordinal 3. But the difference between
an object and a set to which it belongs is a sufficient difference in kind to make any confusion extremely
dangerous, and while I agree that you can study this topic without using different symbols for $f$ and $f^\bullet$, I
do not think you can ever safely escape a mental distinction for more than a few lines of argument. (ii) As a
teacher, I have to say that quite a few students, encountering this material for the first time, are misled by
any failure to make the distinction between $f$ and $f^\bullet$ into believing that no distinction need be made; and
as a teacher I always insist on a student convincing me, by correctly writing out the more pedantic forms
of the arguments for a few weeks, that he understands the manipulations necessary, before I allow him to
go his own way. (iii) The reason why it is possible to evade the distinction in certain types of argument is
just that the Dedekind $\sigma$-complete Riesz space $\mathcal{L}^0_{\mathrm{strict}}$ parallels the Dedekind $\sigma$-complete Riesz space $L^0$ so
closely that any proposition involving only countably many members of these spaces is likely to be valid in
one if and only if it is valid in the other. In my view, the implications of this correspondence are at the
very heart of measure theory. I prefer therefore to keep it constantly conspicuous, reminding myself through
symbolism that every theorem has a Siamese twin, and rising to each challenge to express the twin theorem
in an appropriate language. (iv) There are ways in which $\mathcal{L}^0_{\mathrm{strict}}$ and $L^0$ are actually very different, and
many interesting ideas can be expressed only in a language which keeps them clearly separated.
For more than half my life now I have felt that these points between them are sufficient reason for being
consistent in maintaining the formal distinction between $f$ and $f^\bullet$. You may feel that in (iii) and (iv) of
the last paragraph I am trying to have things both ways; I am arguing that both the similarities and the
differences between $\mathcal{L}^0$ and $L^0$ support my case. Indeed that is exactly my position. If they were totally
different, using the same language for both would not give rise to confusion; if they were essentially the
same, it would not matter if we were sometimes unclear which we were talking about.

242 $L^1$
While the space $L^0$ treated in the previous section is of very great intrinsic interest, its chief use in
the elementary theory is as a space in which some of the most important spaces of functional analysis are
embedded. In the next few sections I introduce these one at a time.
The first is the space $L^1$ of equivalence classes of integrable functions. The importance of this space is not
only that it offers a language in which to express those many theorems about integrable functions which do
not depend on the differences between two functions which are equal almost everywhere. It can also appear
as the natural space in which to seek solutions to a wide variety of integral equations, and as the completion
of a space of continuous functions.

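242A The space $L^1$ Let $(X, \Sigma, \mu)$ be any measure space.

(a) Let $\mathcal{L}^1 = \mathcal{L}^1(\mu)$ be the set of real-valued functions, defined on subsets of $X$, which are integrable over
$X$. Then $\mathcal{L}^1 \subseteq \mathcal{L}^0 = \mathcal{L}^0(\mu)$, as defined in §241, and, for $f \in \mathcal{L}^0$, we have $f \in \mathcal{L}^1$ iff there is a $g \in \mathcal{L}^1$ such
that $|f| \le_{\mathrm{a.e.}} g$; if $f \in \mathcal{L}^1$, $g \in \mathcal{L}^0$ and $f =_{\mathrm{a.e.}} g$, then $g \in \mathcal{L}^1$. (See 122P-122R.)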

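(b) Let $L^1 = L^1(\mu) \subseteq L^0 = L^0(\mu)$ be the set of equivalence classes of members of $\mathcal{L}^1$. If $f, g \in \mathcal{L}^1$ and
$f =_{\mathrm{a.e.}} g$ then $\int f = \int g$ (122Rb). Accordingly we may define a functional $\int$ on $L^1$ by writing $\int f^\bullet = \int f$
for every $f \in \mathcal{L}^1$.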
(c) It will be convenient to be able to write $\int_A u$ for $u \in L^1$, $A \subseteq X$; this may be defined by saying that
$\int_A f^\bullet = \int_A f$ for every $f \in \mathcal{L}^1$, where the integral is defined in 214D. P I have only to check that if $f =_{\mathrm{a.e.}} g$
then $\int_A f = \int_A g$; and this is because $f\restriction A = g\restriction A$ almost everywhere on $A$. Q
If $E \in \Sigma$, $u \in L^1$ then $\int_E u = \int u \times (\chi E)^\bullet$; this is because $\int_E f = \int f \times \chi E$ for every integrable function
$f$ (131Fa).

(d) If $u \in L^1$, there is a $\Sigma$-measurable, $\mu$-integrable function $f : X \to \mathbb{R}$ such that $f^\bullet = u$. P As noted in
241Bk, there is a measurable $f : X \to \mathbb{R}$ such that $f^\bullet = u$; but of course $f$ is integrable because it is equal
almost everywhere to some integrable function. Q

242B Theorem Let $(X, \Sigma, \mu)$ be any measure space. Then $L^1(\mu)$ is a linear subspace of $L^0(\mu)$ and
$\int : L^1 \to \mathbb{R}$ is a linear functional.
proof If $u, v \in L^1 = L^1(\mu)$ and $c \in \mathbb{R}$ let $f$, $g$ be integrable functions such that $u = f^\bullet$, $v = g^\bullet$; then $f + g$
and $cf$ are integrable, so $u + v = (f + g)^\bullet$ and $cu = (cf)^\bullet$ belong to $L^1$. Also
$\int u + v = \int f + g = \int f + \int g = \int u + \int v$
and
$\int cu = \int cf = c\int f = c\int u$.

242C The order structure of $L^1$ Let $(X, \Sigma, \mu)$ be any measure space.

(a) $L^1 = L^1(\mu)$ has an order structure derived from that of $L^0 = L^0(\mu)$ (241E); that is, $f^\bullet \le g^\bullet$ iff $f \le g$
a.e. Being a linear subspace of $L^0$, $L^1$ must be a partially ordered linear space; the two conditions of 241Ec
are obviously inherited by linear subspaces.
Note also that if $u, v \in L^1$ and $u \le v$ then $\int u \le \int v$, because if $f$, $g$ are integrable functions and $f \le_{\mathrm{a.e.}} g$
then $\int f \le \int g$ (122Od).

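(b) If $u \in L^0$, $v \in L^1$ and $|u| \le |v|$ then $u \in L^1$. P Let $f \in \mathcal{L}^0 = \mathcal{L}^0(\mu)$, $g \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$ be such that
$u = f^\bullet$, $v = g^\bullet$; then $g$ is integrable and $|f| \le_{\mathrm{a.e.}} |g|$, so $f$ is integrable and $u \in L^1$. Q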

(c) In particular, $|u| \in L^1$ whenever $u \in L^1$, and
$|\int u| = \max(\int u, \int (-u)) \le \int |u|$,
because $u, -u \le |u|$.

(d) Because $|u| \in L^1$ for every $u \in L^1$,
$u \vee v = \frac{1}{2}(u + v + |u - v|)$, $u \wedge v = \frac{1}{2}(u + v - |u - v|)$
belong to $L^1$ for all $u, v \in L^1$. But if $w \in L^1$ we surely have

$w \ge u$ & $w \ge v \iff w \ge u \vee v$,

$w \le u$ & $w \le v \iff w \le u \wedge v$
because these are true for all $w \in L^0$, so $u \vee v = \sup\{u, v\}$, $u \wedge v = \inf\{u, v\}$ in $L^1$. Thus $L^1$ is, in itself, a
Riesz space.
(e) Note that if $u \in L^1$, then $u \ge 0$ iff $\int_E u \ge 0$ for every $E \in \Sigma$; this is because if $f$ is an integrable
function on $X$ and $\int_E f \ge 0$ for every $E \in \Sigma$, then $f \ge_{\mathrm{a.e.}} 0$ (131Fb). More generally, if $u, v \in L^1$ and
$\int_E u \le \int_E v$ for every $E \in \Sigma$, then $u \le v$. It follows at once that if $u, v \in L^1$ and $\int_E u = \int_E v$ for every
$E \in \Sigma$, then $u = v$ (cf. 131Fc).

(f) If $u \ge 0$ in $L^1$, there is a non-negative $f \in \mathcal{L}^1$ such that $f^\bullet = u$ (compare 241Eg).

242D The norm of $L^1$ Let $(X, \Sigma, \mu)$ be any measure space.

(a) For $f \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$ I write $\|f\|_1 = \int |f| \in [0, \infty[$. For $u \in L^1 = L^1(\mu)$ set $\|u\|_1 = \int |u|$, so that
$\|f^\bullet\|_1 = \|f\|_1$ for every $f \in \mathcal{L}^1$. Then $\|\ \|_1$ is a norm on $L^1$. P (i) If $u, v \in L^1$ then $|u + v| \le |u| + |v|$, by
241Ee, so
$\|u + v\|_1 = \int |u + v| \le \int |u| + |v| = \int |u| + \int |v| = \|u\|_1 + \|v\|_1$.
(ii) If $u \in L^1$ and $c \in \mathbb{R}$ then
$\|cu\|_1 = \int |cu| = \int |c||u| = |c| \int |u| = |c|\|u\|_1$.
(iii) If $u \in L^1$ and $\|u\|_1 = 0$, express $u$ as $f^\bullet$, where $f \in \mathcal{L}^1$; then $\int |f| = \int |u| = 0$. Because $|f|$ is
non-negative, it must be zero almost everywhere (122Rc), so $f =_{\mathrm{a.e.}} 0$ and $u = 0$ in $L^1$. Q
(b) Thus $L^1$, with $\|\ \|_1$, is a normed space and $\int : L^1 \to \mathbb{R}$ is a linear operator; observe that $\|\int\| \le 1$,
because
$|\int u| \le \int |u| = \|u\|_1$
for every $u \in L^1$.
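
The inequality $|\int u| \le \|u\|_1$ in (b) is usually strict when $u$ changes sign. A small worked case, given as a sketch (Lebesgue measure on $[0, 1]$ and the function $f(x) = x - \frac12$ are assumptions chosen here for illustration):

```latex
% Assumption: \mu is Lebesgue measure on X = [0,1] and f(x) = x - 1/2; both are chosen for illustration.
\[
  \int f = \int_0^1 \bigl(x - \tfrac12\bigr)\,dx = 0, \qquad
  \|f\|_1 = \int_0^1 \bigl|x - \tfrac12\bigr|\,dx = 2\int_0^{1/2} t\,dt = \tfrac14,
\]
\[
  \text{so, for } u = f^{\bullet}, \qquad \Bigl|\int u\Bigr| = 0 < \tfrac14 = \|u\|_1 .
\]
```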

(c) If $u, v \in L^1$ and $|u| \le |v|$, then
$\|u\|_1 = \int |u| \le \int |v| = \|v\|_1$.
In particular, $\|u\|_1 = \||u|\|_1$ for every $u \in L^1$.

(d) Note the following property of the normed Riesz space $L^1$: if $u, v \in L^1$ and $u, v \ge 0$, then
$\|u + v\|_1 = \int u + v = \int u + \int v = \|u\|_1 + \|v\|_1$.

(e) The set $(L^1)^+ = \{u : u \ge 0\}$ is closed in $L^1$. P If $v \in L^1$, $u \in (L^1)^+$ then $\|u - v\|_1 \ge \|v \wedge 0\|_1$; this is
because if $f, g \in \mathcal{L}^1$ and $f \ge_{\mathrm{a.e.}} 0$, $|f(x) - g(x)| \ge |\min(g(x), 0)|$ whenever $f(x)$ and $g(x)$ are both defined
and $f(x) \ge 0$, which is almost everywhere, so
$\|u - v\|_1 = \int |f - g| \ge \int |g \wedge 0| = \|v \wedge 0\|_1$.
Now this means that if $v \in L^1$ and $v \not\ge 0$, the ball $\{w : \|w - v\|_1 < \delta\}$ does not meet $(L^1)^+$, where
$\delta = \|v \wedge 0\|_1 > 0$ because $v \wedge 0 \neq 0$. Thus $L^1 \setminus (L^1)^+$ is open and $(L^1)^+$ is closed. Q
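
The pointwise inequality used in (e) is a two-case check on real numbers. As a sketch (the letters $a$, $b$ are illustrative names for the values $f(x) \ge 0$ and $g(x)$):

```latex
% a, b are illustrative names for the values f(x) \ge 0 and g(x).
\[
  a \ge 0,\ b \ge 0 \ \Longrightarrow\ |\min(b, 0)| = 0 \le |a - b| ,
\]
\[
  a \ge 0,\ b < 0 \ \Longrightarrow\ |a - b| = a - b \ge -b = |\min(b, 0)| .
\]
```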

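242E For the next result we need a variant of B.Levi's theorem.

Lemma Let $(X, \Sigma, \mu)$ be a measure space and $\langle f_n\rangle_{n\in\mathbb{N}}$ a sequence of $\mu$-integrable real-valued functions such
that $\sum_{n=0}^\infty \int |f_n| < \infty$. Then $f = \sum_{n=0}^\infty f_n$ is integrable and
$\int f = \sum_{n=0}^\infty \int f_n$, $\int |f| \le \sum_{n=0}^\infty \int |f_n|$.
proof (a) Suppose first that every $f_n$ is non-negative. Set $g_n = \sum_{k=0}^n f_k$ for each $n$; then $\langle g_n\rangle_{n\in\mathbb{N}}$ is
increasing a.e. and
$\lim_{n\to\infty} \int g_n = \sum_{k=0}^\infty \int f_k$
is finite, so by B.Levi's theorem (123A) $f = \lim_{n\to\infty} g_n$ is integrable and
$\int f = \lim_{n\to\infty} \int g_n = \sum_{k=0}^\infty \int f_k$.
In this case, of course,
$\int |f| = \int f = \sum_{n=0}^\infty \int f_n = \sum_{n=0}^\infty \int |f_n|$.

(b) For the general case, set $f_n^+ = \frac{1}{2}(|f_n| + f_n)$, $f_n^- = \frac{1}{2}(|f_n| - f_n)$, as in 241Ef; then $f_n^+$ and $f_n^-$ are
non-negative integrable functions, and
$\sum_{n=0}^\infty \int f_n^+ + \sum_{n=0}^\infty \int f_n^- = \sum_{n=0}^\infty \int |f_n| < \infty$.
So $h_1 = \sum_{n=0}^\infty f_n^+$ and $h_2 = \sum_{n=0}^\infty f_n^-$ are both integrable. Now $f =_{\mathrm{a.e.}} h_1 - h_2$, so
$\int f = \int h_1 - \int h_2 = \sum_{n=0}^\infty \int f_n^+ - \sum_{n=0}^\infty \int f_n^- = \sum_{n=0}^\infty \int f_n$.
Finally
$\int |f| \le \int |h_1| + \int |h_2| = \sum_{n=0}^\infty \int f_n^+ + \sum_{n=0}^\infty \int f_n^- = \sum_{n=0}^\infty \int |f_n|$.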
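242F Theorem For any measure space $(X, \Sigma, \mu)$, $L^1(\mu)$ is complete under its norm $\|\ \|_1$.
proof Let $\langle u_n\rangle_{n\in\mathbb{N}}$ be a sequence in $L^1$ such that $\|u_{n+1} - u_n\|_1 \le 4^{-n}$ for every $n \in \mathbb{N}$. Choose integrable
functions $f_n$ such that $f_0^\bullet = u_0$, $f_{n+1}^\bullet = u_{n+1} - u_n$ for each $n \in \mathbb{N}$. Then
$\sum_{n=0}^\infty \int |f_n| = \|u_0\|_1 + \sum_{n=0}^\infty \|u_{n+1} - u_n\|_1 < \infty$.
So $f = \sum_{n=0}^\infty f_n$ is integrable, by 242E, and $u = f^\bullet \in L^1$. Set $g_n = \sum_{j=0}^n f_j$ for each $n$; then $g_n^\bullet = u_n$, so
$\|u - u_n\|_1 = \int |f - g_n| \le \sum_{j=n+1}^\infty \int |f_j| \le \sum_{j=n+1}^\infty 4^{-j+1} = 4^{1-n}/3$
for each $n$, because $\int |f_j| = \|u_j - u_{j-1}\|_1 \le 4^{-(j-1)}$ for $j \ge 1$. Thus $u = \lim_{n\to\infty} u_n$ in $L^1$. As $\langle u_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $L^1$ is complete (2A4E).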
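
The final estimate in this proof is a geometric tail sum. As a reminder, and only as a sketch (the index $m$ is introduced here for illustration), the general formula and the two instances relevant to the bound are:

```latex
% m is an illustrative starting index; the formula is the standard geometric series with ratio 1/4.
\[
  \sum_{j=m}^{\infty} 4^{-j} = \frac{4^{-m}}{1 - \tfrac14} = \frac{4^{1-m}}{3},
  \qquad\text{in particular}\qquad
  \sum_{j=n+1}^{\infty} 4^{-j} = \frac{4^{-n}}{3}, \quad
  \sum_{j=n}^{\infty} 4^{-j} = \frac{4^{1-n}}{3}.
\]
% Either tail tends to 0 as n \to \infty, which is all that the convergence u = \lim u_n requires.
```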

242G Definition It will be convenient, for later reference, to introduce the following phrase. A Banach
lattice is a Riesz space $U$ together with a norm $\|\ \|$ on $U$ such that (i) $\|u\| \le \|v\|$ whenever $u, v \in U$ and
$|u| \le |v|$, writing $|u|$ for $u \vee (-u)$, as in 241Ee, and (ii) $U$ is complete under $\|\ \|$. Thus 242Dc and 242F amount
to saying that the normed Riesz space $(L^1, \|\ \|_1)$ is a Banach lattice.

242H $L^1$ as a Riesz space We can discuss the ordered linear space $L^1$ in the language already used
in 241E-241G for $L^0$.
Theorem Let $(X, \Sigma, \mu)$ be any measure space. Then $L^1 = L^1(\mu)$ is Dedekind complete.
proof (a) Let $A \subseteq L^1$ be any non-empty set which is bounded above in $L^1$. Set
$A' = \{u_0 \vee \ldots \vee u_n : u_0, \ldots, u_n \in A\}$.
Then $A \subseteq A'$, $A'$ has the same upper bounds as $A$ and $u \vee v \in A'$ for all $u, v \in A'$. Taking $w_0$ to be any
upper bound of $A$ and $A'$, we have $\int u \le \int w_0$ for every $u \in A'$, so $\gamma = \sup_{u\in A'} \int u$ is defined in $\mathbb{R}$. For each
$n \in \mathbb{N}$, choose $u_n \in A'$ such that $\int u_n \ge \gamma - 2^{-n}$. Because $L^0 = L^0(\mu)$ is Dedekind $\sigma$-complete (241Ga),
$u^* = \sup_{n\in\mathbb{N}} u_n$ is defined in $L^0$, and $u_0 \le u^* \le w_0$ in $L^0$. Consequently
$0 \le u^* - u_0 \le w_0 - u_0$
in $L^0$. But $w_0 - u_0 \in L^1$, so $u^* - u_0 \in L^1$ (242Cb) and $u^* \in L^1$.
(b) The point is that $u^*$ is an upper bound for $A$. P If $u \in A$, then $u \vee u_n \in A'$ for every $n$, so

$\|u - u \wedge u^*\|_1 = \int u - u \wedge u^* \le \int u - u \wedge u_n$

(because $u \wedge u_n \le u_n \le u^*$, so $u \wedge u_n \le u \wedge u^*$)

$= \int u \vee u_n - u_n$

(because $u \vee u_n + u \wedge u_n = u + u_n$; see the formulae in 242Cd)

$= \int u \vee u_n - \int u_n \le \gamma - (\gamma - 2^{-n}) = 2^{-n}$

for every $n$; so $\|u - u \wedge u^*\|_1 = 0$. But this means that $u = u \wedge u^*$, that is, that $u \le u^*$. As $u$ is arbitrary,
$u^*$ is an upper bound for $A$. Q
(c) On the other hand, any upper bound for $A$ is surely an upper bound for $\{u_n : n \in \mathbb{N}\}$, so is greater
than or equal to $u^*$. Thus $u^* = \sup A$ in $L^1$. As $A$ is arbitrary, $L^1$ is Dedekind complete.
Remark Note that the order-completeness of $L^1$, unlike that of $L^0$, does not depend on any particular
property of the measure space $(X, \Sigma, \mu)$.
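
The lattice identity quoted in part (b) of the proof comes from adding the two formulae of 242Cd. Written out as a sketch, with no notation beyond that of the proof:

```latex
% Adding the two formulae of 242Cd, for u, v in L^1:
\[
  u \vee v + u \wedge v
  = \tfrac12\bigl(u + v + |u - v|\bigr) + \tfrac12\bigl(u + v - |u - v|\bigr)
  = u + v ,
\]
% and with v = u_n this rearranges to the step used in part (b):
\[
  u - u \wedge u_n = u \vee u_n - u_n .
\]
```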

242I The Radon-Nikodým theorem I think it is worth re-writing the Radon-Nikodým theorem
(232E) in the language of this chapter.
Theorem Let $(X, \Sigma, \mu)$ be a measure space. Then there is a canonical bijection between $L^1 = L^1(\mu)$ and
the set of truly continuous additive functionals $\nu : \Sigma \to \mathbb{R}$, given by the formula
$\nu F = \int_F u$ for $F \in \Sigma$, $u \in L^1$.

Remark Recall that if $\mu$ is $\sigma$-finite, then the truly continuous additive functionals are just the absolutely
continuous countably additive functionals; and that if $\mu$ is totally finite, then all absolutely continuous
(finitely) additive functionals are truly continuous (232B).
proof For $u \in L^1$, $F \in \Sigma$ set $\nu_u F = \int_F u$. If $u \in L^1$, there is an integrable function $f$ such that $f^\bullet = u$, in
which case
$F \mapsto \nu_u F = \int_F f : \Sigma \to \mathbb{R}$
is additive and truly continuous, by 232D. If $\nu : \Sigma \to \mathbb{R}$ is additive and truly continuous, then by 232E there
is an integrable function $f$ such that $\nu F = \int_F f$ for every $F \in \Sigma$; setting $u = f^\bullet$ in $L^1$, $\nu = \nu_u$. Finally, if
$u$, $v$ are distinct members of $L^1$, there is an $F \in \Sigma$ such that $\int_F u \neq \int_F v$ (242Ce), so that $\nu_u \neq \nu_v$; thus
$u \mapsto \nu_u$ is injective as well as surjective.

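242J Conditional expectations revisited We now have the machinery necessary for a new
interpretation of some of the ideas of §233.
(a) Let $(X, \Sigma, \mu)$ be a measure space, and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$, as in 233A. Then $(X, \mathrm{T}, \mu\restriction\mathrm{T})$ is a
measure space, and $\mathcal{L}^0(\mu\restriction\mathrm{T}) \subseteq \mathcal{L}^0(\mu)$; moreover, if $f, g \in \mathcal{L}^0(\mu\restriction\mathrm{T})$, then $f = g$ $(\mu\restriction\mathrm{T})$-a.e. iff $f = g$ $\mu$-a.e.
P There are $\mu\restriction\mathrm{T}$-conegligible sets $F, G \in \mathrm{T}$ such that $f\restriction F$ and $g\restriction G$ are $\mathrm{T}$-measurable; set
$E = \{x : x \in F \cap G,\ f(x) \neq g(x)\} \in \mathrm{T}$;
then
$f = g$ $(\mu\restriction\mathrm{T})$-a.e. $\iff (\mu\restriction\mathrm{T})(E) = 0 \iff \mu E = 0 \iff f = g$ $\mu$-a.e. Q
Accordingly we have a canonical map $S : L^0(\mu\restriction\mathrm{T}) \to L^0(\mu)$ defined by saying that if $u \in L^0(\mu\restriction\mathrm{T})$ is the
equivalence class of $f \in \mathcal{L}^0(\mu\restriction\mathrm{T})$, then $Su$ is the equivalence class of $f$ in $L^0(\mu)$. It is easy to check, working
through the operations described in 241D, 241E and 241H, that $S$ is linear, injective and order-preserving,
and that $|Su| = S|u|$, $S(u \vee v) = Su \vee Sv$, $S(u \times v) = Su \times Sv$ for $u, v \in L^0(\mu\restriction\mathrm{T})$.
(b) Next, if $f \in \mathcal{L}^1(\mu\restriction\mathrm{T})$, then $f \in \mathcal{L}^1(\mu)$ and $\int f\,d\mu = \int f\,d(\mu\restriction\mathrm{T})$ (233B); so $Su \in L^1(\mu)$ and $\|Su\|_1 =
\|u\|_1$ for every $u \in L^1(\mu\restriction\mathrm{T})$.
Observe also that every member of $L^1(\mu) \cap S[L^0(\mu\restriction\mathrm{T})]$ is actually in $S[L^1(\mu\restriction\mathrm{T})]$. P Take $u \in L^1(\mu) \cap
S[L^0(\mu\restriction\mathrm{T})]$. Then $u$ is expressible both as $f^\bullet$ where $f \in \mathcal{L}^1(\mu)$, and as $g^\bullet$ where $g \in \mathcal{L}^0(\mu\restriction\mathrm{T})$. So $g =_{\mathrm{a.e.}} f$,
and $g$ is $\mu$-integrable, therefore $(\mu\restriction\mathrm{T})$-integrable. Q
This means that $S : L^1(\mu\restriction\mathrm{T}) \to L^1(\mu) \cap S[L^0(\mu\restriction\mathrm{T})]$ is a bijection.
(c) Now suppose that $\mu X = 1$, so that $(X, \Sigma, \mu)$ is a probability space. Recall that $g$ is a conditional
expectation of $f$ on $\mathrm{T}$ if $g$ is $\mu\restriction\mathrm{T}$-integrable, $f$ is $\mu$-integrable and $\int_F g = \int_F f$ for every $F \in \mathrm{T}$; and that
every $\mu$-integrable function has such a conditional expectation (233D). If $g$ is a conditional expectation of
$f$ and $f_1 = f$ $\mu$-a.e. then $g$ is a conditional expectation of $f_1$, because $\int_F f_1 = \int_F f$ for every $F$; and I have
already remarked in 233Dc that if $g$, $g_1$ are conditional expectations of $f$ on $\mathrm{T}$ then $g = g_1$ $\mu\restriction\mathrm{T}$-a.e.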

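(d) This means that we have an operator $P : L^1(\mu) \to L^1(\mu\restriction\mathrm{T})$ defined by saying that
$P(f^\bullet) = g^\bullet$
whenever $g \in \mathcal{L}^1(\mu\restriction\mathrm{T})$ is a conditional expectation of $f \in \mathcal{L}^1(\mu)$ on $\mathrm{T}$; that is, that $\int_F Pu = \int_F u$ whenever $u \in L^1(\mu)$, $F \in \mathrm{T}$. If we identify $L^1(\mu)$, $L^1(\mu\restriction\mathrm{T})$ with the sets of absolutely continuous additive functionals defined on $\Sigma$ and $\mathrm{T}$, as in 242I, then $P$ corresponds to the operation $\nu \mapsto \nu\restriction\mathrm{T}$.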
(e) Because $Pu$ is uniquely defined in $L^1(\mu\restriction\mathrm{T})$ by the requirement $\int_F Pu = \int_F u$ for every $F \in \mathrm{T}$ (242Ce), we see that $P$ must be linear. P P If $u$, $v \in L^1(\mu)$ and $c \in \mathbb{R}$, then
$\int_F Pu + Pv = \int_F Pu + \int_F Pv = \int_F u + \int_F v = \int_F u + v = \int_F P(u + v)$,
$\int_F P(cu) = \int_F cu = c\int_F u = c\int_F Pu = \int_F cPu$
for every $F \in \mathrm{T}$. Q Q Also, if $u \ge 0$, then $\int_F Pu = \int_F u \ge 0$ for every $F \in \mathrm{T}$, so $Pu \ge 0$ (242Ce again). It follows at once that $P$ is order-preserving, that is, that $Pu \le Pv$ whenever $u \le v$. Consequently
$|Pu| = Pu \vee (-Pu) = Pu \vee P(-u) \le P|u|$
for every $u \in L^1(\mu)$, because $u \le |u|$ and $-u \le |u|$.

(f) We may legitimately regard $Pu \in L^1(\mu\restriction\mathrm{T})$ as 'the' conditional expectation of $u \in L^1(\mu)$ on $\mathrm{T}$; $P$ is the conditional expectation operator.
(g) If $u \in L^1(\mu\restriction\mathrm{T})$, then we have a corresponding $Su \in L^1(\mu)$, as in (b); now $PSu = u$. P P $\int_F PSu = \int_F Su = \int_F u$ for every $F \in \mathrm{T}$. Q Q Consequently $SPSP = SP : L^1(\mu) \to L^1(\mu)$.

(h) The distinction drawn above between $u = f^\bullet \in L^0(\mu\restriction\mathrm{T})$ and $Su = f^\bullet \in L^0(\mu)$ is of course pedantic. I believe it is necessary to be aware of such distinctions, even though for nearly all purposes it is safe as well as convenient to regard $L^0(\mu\restriction\mathrm{T})$ as actually a subset of $L^0(\mu)$. If we do so, then (b) tells us that we can identify $L^1(\mu\restriction\mathrm{T})$ with $L^1(\mu) \cap L^0(\mu\restriction\mathrm{T})$, while (g) becomes $P^2 = P$.
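The properties of $P$ collected in 242J can be checked by hand on a finite probability space. The following is only a minimal sketch, assuming a six-point space with uniform measure and a sub-algebra generated by a partition; the names `partition`, `cond_exp`, `integral` and `norm1` are hypothetical and chosen for this illustration. In this setting a version of $Pu$ is obtained by averaging over each block.

```python
from fractions import Fraction

# Hypothetical finite model: X = {0,...,5} with uniform probability,
# and T the algebra generated by the partition below.
X = range(6)
mu = {x: Fraction(1, 6) for x in X}
partition = [{0, 1}, {2, 3, 4}, {5}]

def cond_exp(u):
    """Average u over each block of the partition (a version of Pu)."""
    out = {}
    for block in partition:
        mass = sum(mu[x] for x in block)
        mean = sum(u[x] * mu[x] for x in block) / mass
        for x in block:
            out[x] = mean
    return out

def integral(u, F=None):
    """Integral of u over F (over all of X if F is None)."""
    F = X if F is None else F
    return sum(u[x] * mu[x] for x in F)

def norm1(u):
    return integral({x: abs(u[x]) for x in X})

u = {0: Fraction(3), 1: Fraction(-1), 2: Fraction(2),
     3: Fraction(0), 4: Fraction(-5), 5: Fraction(4)}
Pu = cond_exp(u)
P_abs_u = cond_exp({x: abs(u[x]) for x in X})

# (d): the integrals of Pu and u agree on the atoms of T, hence on every F in T.
defining_property = all(integral(Pu, F) == integral(u, F) for F in partition)
# (g)/(h): P is idempotent.
idempotent = cond_exp(Pu) == Pu
# (e): |Pu| <= P|u| pointwise, and therefore ||Pu||_1 <= ||u||_1.
modulus_bound = all(abs(Pu[x]) <= P_abs_u[x] for x in X)
contraction = norm1(Pu) <= norm1(u)
print(defining_property, idempotent, modulus_bound, contraction)
```

Exact rational arithmetic is used so that the equalities are tested literally rather than up to rounding.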

242K The language just introduced allows the following re-formulations of 233J-233K.
Theorem Let $(X, \Sigma, \mu)$ be a probability space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Let $\phi : \mathbb{R} \to \mathbb{R}$ be a convex function and $\bar\phi : L^0(\mu) \to L^0(\mu)$ the corresponding operator defined by setting $\bar\phi(f^\bullet) = (\phi f)^\bullet$ (241I). If $P : L^1(\mu) \to L^1(\mu\restriction\mathrm{T})$ is the conditional expectation operator, then $\bar\phi(Pu) \le P(\bar\phi u)$ whenever $u \in L^1(\mu)$ is such that $\bar\phi(u) \in L^1(\mu)$.
proof This is just a restatement of 233J.

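242L Proposition Let $(X, \Sigma, \mu)$ be a probability space, and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Let $P : L^1(\mu) \to L^1(\mu\restriction\mathrm{T})$ be the corresponding conditional expectation operator. If $u \in L^1 = L^1(\mu)$ and $v \in L^0(\mu\restriction\mathrm{T})$, then $u \times v \in L^1$ iff $P|u| \times v \in L^1$, and in this case $P(u \times v) = Pu \times v$; in particular, $\int u \times v = \int Pu \times v$.
proof (I am here using the identification of $L^0(\mu\restriction\mathrm{T})$ as a subspace of $L^0(\mu)$, as suggested in 242Jh.) Express $u$ as $f^\bullet$, $v$ as $h^\bullet$, where $f \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$ and $h \in \mathcal{L}^0(\mu\restriction\mathrm{T})$. Let $g$, $g_0 \in \mathcal{L}^1(\mu\restriction\mathrm{T})$ be conditional expectations of $f$, $|f|$ respectively, so that $Pu = g^\bullet$ and $P|u| = g_0^\bullet$. Then, using 233K,
$u \times v \in L^1 \iff f \times h \in \mathcal{L}^1 \iff g_0 \times h \in \mathcal{L}^1 \iff P|u| \times v \in L^1$,
and in this case $g \times h$ is a conditional expectation of $f \times h$, that is, $Pu \times v = P(u \times v)$.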
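Both 242K and 242L can be seen at work in the same kind of finite model. The sketch below assumes a six-point uniform probability space and a partition-generated sub-algebra; all names (`partition`, `cond_exp`, `phi`) are hypothetical, and $\phi(t) = t^2$ is just one convenient convex function. It checks Jensen's inequality $\bar\phi(Pu) \le P(\bar\phi u)$ and the rule $P(u \times v) = Pu \times v$ for a $v$ which is constant on blocks.

```python
from fractions import Fraction

# Hypothetical finite model, as an illustration only.
X = range(6)
mu = {x: Fraction(1, 6) for x in X}
partition = [{0, 1}, {2, 3, 4}, {5}]

def cond_exp(u):
    """Average u over each block of the partition (a version of Pu)."""
    out = {}
    for block in partition:
        mass = sum(mu[x] for x in block)
        mean = sum(u[x] * mu[x] for x in block) / mass
        for x in block:
            out[x] = mean
    return out

def phi(t):
    """A convex function; t -> t^2 is an assumption made for the example."""
    return t * t

u = {0: Fraction(3), 1: Fraction(-1), 2: Fraction(2),
     3: Fraction(0), 4: Fraction(-5), 5: Fraction(4)}
# v is constant on each block, so it is measurable for the sub-algebra.
v = {0: Fraction(2), 1: Fraction(2), 2: Fraction(-3),
     3: Fraction(-3), 4: Fraction(-3), 5: Fraction(7)}

Pu = cond_exp(u)
P_phi_u = cond_exp({x: phi(u[x]) for x in X})

# 242K: phi(Pu) <= P(phi(u)) pointwise.
jensen = all(phi(Pu[x]) <= P_phi_u[x] for x in X)

# 242L: P(u x v) = Pu x v, and the two integrals agree.
uv = {x: u[x] * v[x] for x in X}
Pu_v = {x: Pu[x] * v[x] for x in X}
pull_out = cond_exp(uv) == Pu_v
integrals_agree = sum(uv[x] * mu[x] for x in X) == sum(Pu_v[x] * mu[x] for x in X)
print(jensen, pull_out, integrals_agree)
```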

242M $L^1$ as a completion I mentioned in the introduction to this section that $L^1$ appears in functional analysis as a completion of some important spaces; put another way, some dense subspaces of $L^1$ are significant. The first is elementary.
Proposition Let $(X, \Sigma, \mu)$ be any measure space, and write $S$ for the space of $\mu$-simple functions on $X$. Then
(a) whenever $f$ is a $\mu$-integrable real-valued function and $\epsilon > 0$, there is an $h \in S$ such that $\int |f - h| \le \epsilon$;
(b) $S^\bullet = \{f^\bullet : f \in S\}$ is a dense linear subspace of $L^1 = L^1(\mu)$.
proof (a)(i) If $f$ is non-negative, then there is a simple function $h$ such that $h \le_{\text{a.e.}} f$ and $\int h \ge \int f - \frac12\epsilon$ (122K), in which case
$\int |f - h| = \int f - h = \int f - \int h \le \frac12\epsilon$.
(ii) In the general case, $f$ is expressible as a difference $f_1 - f_2$ of non-negative integrable functions. Now there are $h_1$, $h_2 \in S$ such that $\int |f_j - h_j| \le \frac12\epsilon$ for both $j$; setting $h = h_1 - h_2 \in S$,
$\int |f - h| \le \int |f_1 - h_1| + \int |f_2 - h_2| \le \epsilon$.
(b) Because $S$ is a linear subspace of $\mathbb{R}^X$ included in $\mathcal{L}^1 = \mathcal{L}^1(\mu)$, $S^\bullet$ is a linear subspace of $L^1$. If $u \in L^1$ and $\epsilon > 0$, there are an $f \in \mathcal{L}^1$ such that $f^\bullet = u$ and an $h \in S$ such that $\int |f - h| \le \epsilon$; now $v = h^\bullet \in S^\bullet$ and
$\|u - v\|_1 = \int |f - h| \le \epsilon$.
As $u$ and $\epsilon$ are arbitrary, $S^\bullet$ is dense in $L^1$.
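To see the approximation of 242Ma in a concrete case, take Lebesgue measure on $[0,1[$ and $f(x) = x$. The sketch below is an illustration under that assumption, not part of the proof. It uses the dyadic simple functions $h_n(x) = 2^{-n}\lfloor 2^n x\rfloor$, which satisfy $h_n \le f$, and computes $\int |f - h_n|$ exactly. The function name `l1_error` is hypothetical.

```python
from fractions import Fraction

def l1_error(n):
    """Integral over [0,1[ of |f - h_n|, where f(x) = x and
    h_n(x) = floor(2^n x) / 2^n is a simple function with h_n <= f."""
    width = Fraction(1, 2 ** n)
    total = Fraction(0)
    for k in range(2 ** n):
        mid = (k + Fraction(1, 2)) * width   # midpoint of the k-th dyadic cell
        h_value = k * width                  # constant value of h_n on that cell
        # f - h_n is linear on the cell, so the midpoint rule is exact here.
        total += (mid - h_value) * width
    return total

errors = [l1_error(n) for n in range(6)]
print(errors)
```

Under these assumptions the error is $2^{-(n+1)}$, so any prescribed $\epsilon > 0$ is met once $2^{-(n+1)} \le \epsilon$.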

242N As always, Lebesgue measure on $\mathbb{R}^r$ and its subsets is by far the most important example; and in this case we have further classes of dense subspace of $L^1$. If you have reached this point without yet troubling to master multi-dimensional Lebesgue measure, just take $r = 1$. If you feel uncomfortable with general subspace measures, take $X$ to be $\mathbb{R}^r$ or $[0,1] \subseteq \mathbb{R}$ or some other particular subset which you find interesting. The following term will be useful.
Definition If $f$ is a real- or complex-valued function defined on a subset of $\mathbb{R}^r$, say that the support of $f$ is $\overline{\{x : x \in \operatorname{dom} f,\ f(x) \ne 0\}}$.

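242O Theorem Let $X$ be any subset of $\mathbb{R}^r$, where $r \ge 1$, and let $\mu$ be Lebesgue measure on $X$, that is, the subspace measure on $X$ induced by Lebesgue measure on $\mathbb{R}^r$. Write $C_k$ for the space of bounded continuous functions $f : \mathbb{R}^r \to \mathbb{R}$ which have bounded support, and $S_0$ for the space of linear combinations of functions of the form $\chi I$ where $I \subseteq \mathbb{R}^r$ is a bounded half-open interval. Then
(a) whenever $f \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$ and $\epsilon > 0$, there are $g \in C_k$, $h \in S_0$ such that $\int_X |f - g| \le \epsilon$, $\int_X |f - h| \le \epsilon$;
(b) $\{(g\restriction X)^\bullet : g \in C_k\}$ and $\{(h\restriction X)^\bullet : h \in S_0\}$ are dense linear subspaces of $L^1 = L^1(\mu)$.
Remark Of course there is a redundant 'bounded' in the description of $C_k$; see 242Xh.
proof (a) I argue in turn that the result is valid for each of an increasing number of members $f$ of $\mathcal{L}^1 = \mathcal{L}^1(\mu)$. Write $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$, so that $\mu$ is the subspace measure $(\mu_r)_X$.
(i) Suppose first that $f = \chi I\restriction X$ where $I \subseteq \mathbb{R}^r$ is a bounded half-open interval. Of course $\chi I$ is already in $S_0$, so I have only to show that it is approximated by members of $C_k$. If $I = \emptyset$ the result is trivial; we can take $g = 0$. Otherwise, express $I$ as $[a - b, a + b[$ where $a = (\alpha_1, \ldots, \alpha_r)$, $b = (\beta_1, \ldots, \beta_r)$ and $\beta_j > 0$ for each $j$. Let $\delta > 0$ be such that
$2^r \prod_{j \le r} (\beta_j + \delta) \le \epsilon + 2^r \prod_{j \le r} \beta_j$.
For $\xi \in \mathbb{R}$ set
$g_j(\xi) = 1$ if $|\xi - \alpha_j| \le \beta_j$,
$\phantom{g_j(\xi)} = (\beta_j + \delta - |\xi - \alpha_j|)/\delta$ if $\beta_j \le |\xi - \alpha_j| \le \beta_j + \delta$,
$\phantom{g_j(\xi)} = 0$ if $|\xi - \alpha_j| \ge \beta_j + \delta$.
[Figure: the function $g_j$]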

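For $x = (\xi_1, \ldots, \xi_r) \in \mathbb{R}^r$ set
$g(x) = \prod_{j \le r} g_j(\xi_j)$.
Then $g \in C_k$ and $\chi I \le g \le \chi J$, where $J = [a - b - \delta\mathbf{1}, a + b + \delta\mathbf{1}]$ (writing $\mathbf{1} = (1, \ldots, 1)$), so that (by the choice of $\delta$) $\mu_r J \le \mu_r I + \epsilon$, and
$\int_X |g - f| \le \int (\chi(J \cap X) - \chi(I \cap X))\,d\mu = \mu((J \setminus I) \cap X) \le \mu_r(J \setminus I) = \mu_r J - \mu_r I \le \epsilon$,
as required.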
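For $r = 1$ and $X = \mathbb{R}$ the construction in (i) is a trapezoid over the interval, and the bound can be computed exactly. The following sketch assumes that special case; the names `g`, `chi_I` and `l1_distance` are hypothetical. It evaluates $\int |g - \chi I|$ for $I = [\alpha - \beta, \alpha + \beta[$ and confirms that it equals $\delta$, which is at most $\epsilon$ as soon as $2(\beta + \delta) \le \epsilon + 2\beta$.

```python
from fractions import Fraction

def g(xi, alpha, beta, delta):
    """The piecewise linear function g_j of the proof, in one variable."""
    d = abs(xi - alpha)
    if d <= beta:
        return Fraction(1)
    if d <= beta + delta:
        return (beta + delta - d) / delta
    return Fraction(0)

def chi_I(xi, alpha, beta):
    """Indicator of the half-open interval [alpha - beta, alpha + beta[."""
    return Fraction(1) if alpha - beta <= xi < alpha + beta else Fraction(0)

def l1_distance(alpha, beta, delta):
    """Integral over R of |g - chi_I|.  The difference vanishes off the two
    ramps (apart from a single point) and is linear on each ramp, so the
    midpoint rule on each ramp is exact."""
    ramps = [(alpha - beta - delta, alpha - beta),
             (alpha + beta, alpha + beta + delta)]
    total = Fraction(0)
    for lo, hi in ramps:
        mid = (lo + hi) / 2
        total += abs(g(mid, alpha, beta, delta) - chi_I(mid, alpha, beta)) * (hi - lo)
    return total

alpha, beta, eps = Fraction(1), Fraction(2), Fraction(1, 10)
delta = eps / 2                      # largest delta with 2(beta+delta) <= eps + 2*beta
dist = l1_distance(alpha, beta, delta)
print(dist, dist <= eps)
```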
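(ii) Now suppose that $f = \chi(X \cap E)$ where $E \subseteq \mathbb{R}^r$ is a set of finite measure. Then there is a disjoint family $I_0, \ldots, I_n$ of half-open intervals such that $\mu_r(E \triangle \bigcup_{j \le n} I_j) \le \frac12\epsilon$. P P There is an open set $G \supseteq E$ such that $\mu_r(G \setminus E) \le \frac14\epsilon$ (134Fa). For each $m \in \mathbb{N}$, let $\mathcal{I}_m$ be the family of half-open intervals in $\mathbb{R}^r$ of the form $[a, b[$ where $a = (2^{-m}k_1, \ldots, 2^{-m}k_r)$, $k_1, \ldots, k_r$ being integers, and $b = a + 2^{-m}\mathbf{1}$; then $\mathcal{I}_m$ is a disjoint family. Set $H_m = \bigcup\{I : I \in \mathcal{I}_m,\ I \subseteq G\}$; then $\langle H_m \rangle_{m \in \mathbb{N}}$ is a non-decreasing family with union $G$, so that there is an $m$ such that $\mu_r(G \setminus H_m) \le \frac14\epsilon$ and $\mu_r(E \triangle H_m) \le \frac12\epsilon$. But now $H_m$ is expressible as a disjoint union $\bigcup_{j \le n} I_j$ where $I_0, \ldots, I_n$ enumerate the members of $\mathcal{I}_m$ included in $H_m$. (The last sentence derails if $H_m$ is empty. But if $H_m = \emptyset$ then we can take $n = 0$, $I_0 = \emptyset$.) Q Q
Accordingly $h = \sum_{j=0}^n \chi I_j \in S_0$ and
$\int_X |f - h| = \mu(X \cap (E \triangle \bigcup_{j \le n} I_j)) \le \frac12\epsilon$.
As for $C_k$, (i) tells us that there is for each $j \le n$ a $g_j \in C_k$ such that $\int_X |g_j - \chi I_j| \le \frac{\epsilon}{2(n+1)}$, so that $g = \sum_{j=0}^n g_j \in C_k$ and
$\int_X |f - g| \le \int_X |f - h| + \int_X |h - g| \le \frac{\epsilon}{2} + \sum_{j=0}^n \int_X |g_j - \chi I_j| \le \epsilon$.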
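(iii) If $f$ is a simple function, express $f$ as $\sum_{k=0}^n a_k \chi E_k$ where each $E_k$ is of finite measure in $X$. Each $E_k$ is expressible as $X \cap F_k$ where $\mu_r F_k = \mu E_k$ (214Ca). By (ii), we can find $g_k \in C_k$, $h_k \in S_0$ such that
$|a_k| \int_X |g_k - \chi F_k| \le \frac{\epsilon}{n+1}$, $|a_k| \int_X |h_k - \chi F_k| \le \frac{\epsilon}{n+1}$
for each $k$. Set $g = \sum_{k=0}^n a_k g_k$, $h = \sum_{k=0}^n a_k h_k$; then $g \in C_k$, $h \in S_0$ and
$\int_X |f - g| \le \int_X \sum_{k=0}^n |a_k| |\chi F_k - g_k| = \sum_{k=0}^n |a_k| \int_X |\chi F_k - g_k| \le \epsilon$,
$\int_X |f - h| \le \sum_{k=0}^n |a_k| \int_X |\chi F_k - h_k| \le \epsilon$,
as required.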

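(iv) If $f$ is any integrable function on $X$, then by 242Ma we can find a simple function $f_0$ such that $\int |f - f_0| \le \frac12\epsilon$, and now by (iii) there are $g \in C_k$, $h \in S_0$ such that $\int_X |f_0 - g| \le \frac12\epsilon$, $\int_X |f_0 - h| \le \frac12\epsilon$; so that
$\int_X |f - g| \le \int_X |f - f_0| + \int_X |f_0 - g| \le \epsilon$,
$\int_X |f - h| \le \int_X |f - f_0| + \int_X |f_0 - h| \le \epsilon$.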

(b)(i) We must check first that if $g \in C_k$ then $g\restriction X$ is actually $\mu$-integrable. The point here is that if $g \in C_k$ and $a \in \mathbb{R}$ then
$\{x : x \in X,\ g(x) > a\}$
is the intersection of $X$ with an open subset of $\mathbb{R}^r$, and is therefore in the domain of $\mu$, because all open sets are Lebesgue measurable (115G). Next, $g$ is bounded and the set $E = \{x : x \in X,\ g(x) \ne 0\}$ is bounded in $\mathbb{R}^r$, therefore of finite outer measure for $\mu_r$ and of finite measure for $\mu$. Thus there is an $M \ge 0$ such that $|g\restriction X| \le M\chi E$, which is $\mu$-integrable. Accordingly $g\restriction X$ is $\mu$-integrable.

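Of course $h\restriction X$ is $\mu$-integrable for every $h \in S_0$ because (by the definition of subspace measure) $\mu(I \cap X)$ is defined and finite for every bounded half-open interval $I$.
(ii) Now the rest follows by just the same arguments as in 242Mb. Because $\{g\restriction X : g \in C_k\}$ and $\{h\restriction X : h \in S_0\}$ are linear subspaces of $\mathbb{R}^X$ included in $\mathcal{L}^1(\mu)$, their images $C_k^\#$ and $S_0^\#$ are linear subspaces of $L^1$. If $u \in L^1$ and $\epsilon > 0$, there are an $f \in \mathcal{L}^1$ such that $f^\bullet = u$, and $g \in C_k$, $h \in S_0$ such that $\int_X |f - g|$, $\int_X |f - h| \le \epsilon$; now $v = (g\restriction X)^\bullet \in C_k^\#$ and $w = (h\restriction X)^\bullet \in S_0^\#$ and
$\|u - v\|_1 = \int_X |f - g| \le \epsilon$, $\|u - w\|_1 = \int_X |f - h| \le \epsilon$.
As $u$ and $\epsilon$ are arbitrary, $C_k^\#$ and $S_0^\#$ are dense in $L^1$.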

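242P Complex $L^1$ As you would, I hope, expect, we can repeat the work above with $\mathcal{L}^1_{\mathbb{C}}$, the space of complex-valued integrable functions, in place of $\mathcal{L}^1$, to construct a complex Banach space $L^1_{\mathbb{C}}$. The required changes, based on the ideas of 241J, are minor.

(a) In 242Aa, it is perhaps helpful to remark that, for $f \in \mathcal{L}^0_{\mathbb{C}}$,
$f \in \mathcal{L}^1_{\mathbb{C}} \iff |f| \in \mathcal{L}^1 \iff \operatorname{Re}(f), \operatorname{Im}(f) \in \mathcal{L}^1$.
Consequently, for $u \in L^0_{\mathbb{C}}$,
$u \in L^1_{\mathbb{C}} \iff |u| \in L^1 \iff \operatorname{Re}(u), \operatorname{Im}(u) \in L^1$.

(b) To prove a complex version of 242E, observe that if $\langle f_n \rangle_{n \in \mathbb{N}}$ is a sequence in $\mathcal{L}^1_{\mathbb{C}}$ such that $\sum_{n=0}^\infty \int |f_n| < \infty$, then $\sum_{n=0}^\infty \int |\operatorname{Re}(f_n)|$ and $\sum_{n=0}^\infty \int |\operatorname{Im}(f_n)|$ are both finite, so we may apply 242E twice and see that
$\int (\sum_{n=0}^\infty f_n) = \int (\sum_{n=0}^\infty \operatorname{Re}(f_n)) + i\int (\sum_{n=0}^\infty \operatorname{Im}(f_n)) = \sum_{n=0}^\infty \int f_n$.
Accordingly we can prove that $L^1_{\mathbb{C}}$ is complete under $\|\ \|_1$ by the argument of 242F.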

(c) Similarly, little change is needed to adapt 242J to give a description of a conditional expectation operator $P : L^1_{\mathbb{C}}(\mu) \to L^1_{\mathbb{C}}(\mu\restriction\mathrm{T})$ when $(X, \Sigma, \mu)$ is a probability space and $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$. In the formula
$|Pu| \le P|u|$
of 242Je, we need to know that
$|Pu| = \sup_{|\zeta| = 1} \operatorname{Re}(\zeta Pu)$
in $L^0(\mu\restriction\mathrm{T})$ (241Jc), while
$\operatorname{Re}(\zeta Pu) = \operatorname{Re}(P(\zeta u)) = P(\operatorname{Re}(\zeta u)) \le P|u|$
whenever $|\zeta| = 1$.

(d) In 242M, we need to replace $S$ by $S_{\mathbb{C}}$, the space of 'complex-valued simple functions' of the form $\sum_{k=0}^n a_k \chi E_k$ where each $a_k$ is a complex number and each $E_k$ is a measurable set of finite measure; then we get a dense linear subspace $S_{\mathbb{C}}^\bullet = \{f^\bullet : f \in S_{\mathbb{C}}\}$ of $L^1_{\mathbb{C}}$. In 242O, we must replace $C_k$ by $C_k(\mathbb{R}^r; \mathbb{C})$, the space of bounded continuous complex-valued functions of bounded support, and $S_0$ by the linear span over $\mathbb{C}$ of $\{\chi(I \cap X) : I$ is a bounded half-open interval$\}$.

242X Basic exercises >(a) Let $X$ be a set, and let $\mu$ be counting measure on $X$. Show that $L^1(\mu)$ can be identified with the space $\ell^1(X)$ of absolutely summable real-valued functions on $X$ (see 226A). In particular, the space $\ell^1 = \ell^1(\mathbb{N})$ of absolutely summable real-valued sequences is an $L^1$ space. Write out proofs of 242F adapted to these special cases.

>(b) Let $(X, \Sigma, \mu)$ be any measure space, and $\hat\mu$ the completion of $\mu$ (212C, 241Xb). Show that $\mathcal{L}^1(\hat\mu) = \mathcal{L}^1(\mu)$ and $L^1(\hat\mu) = L^1(\mu)$.

(c) Show that any Banach lattice must be an Archimedean Riesz space (241Fa).

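(d) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be any family of measure spaces, and $(X, \Sigma, \mu)$ their direct sum. Show that the isomorphism between $L^0(\mu)$ and $\prod_{i \in I} L^0(\mu_i)$ (241Xg) induces an identification between $L^1(\mu)$ and
$\{u : u \in \prod_{i \in I} L^1(\mu_i),\ \|u\| = \sum_{i \in I} \|u(i)\|_1 < \infty\} \subseteq \prod_{i \in I} L^1(\mu_i)$.

(e) Let $(X, \Sigma, \mu)$ be a probability space, and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$, $\Upsilon$ a $\sigma$-subalgebra of $\mathrm{T}$. Let $P_1 : L^1(\mu) \to L^1(\mu\restriction\mathrm{T})$, $P_2 : L^1(\mu\restriction\mathrm{T}) \to L^1(\mu\restriction\Upsilon)$ and $P : L^1(\mu) \to L^1(\mu\restriction\Upsilon)$ be the corresponding conditional expectation operators. Show that $P = P_2 P_1$.

(f) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\phi : X \to Y$ an inverse-measure-preserving function. Show that $g \mapsto g\phi : \mathcal{L}^1(\nu) \to \mathcal{L}^1(\mu)$ (235I) induces a linear operator $T : L^1(\nu) \to L^1(\mu)$ such that $\|Tv\|_1 = \|v\|_1$ for every $v \in L^1(\nu)$.

(g) Let $U$ be a Riesz space (definition: 241E). A Riesz norm on $U$ is a norm $\|\ \|$ such that $\|u\| \le \|v\|$ whenever $|u| \le |v|$. Show that if $U$ is given its norm topology (2A4Bb) for such a norm, then (i) $u \mapsto |u| : U \to U$, $(u, v) \mapsto u \vee v : U \times U \to U$ are continuous (ii) $\{u : u \ge 0\}$ is closed.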

(h) Show that if $g : \mathbb{R}^r \to \mathbb{R}$ is continuous and has bounded support it is bounded. (Hint: 2A2F-2A2G.)

(i) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. (i) Show that if $\phi_\delta(x) = \exp(-\frac{1}{\delta^2 - x^2})$ for $|x| < \delta$, $0$ for $|x| \ge \delta$ then $\phi_\delta$ is smooth, that is, differentiable arbitrarily often. (ii) Show that if $F_\delta(x) = \int_{-\delta}^x \phi_\delta\,d\mu$ for $x \in \mathbb{R}$ then $F_\delta$ is smooth. (iii) Show that if $a < b < c < d$ in $\mathbb{R}$ there is a smooth function $h$ such that $\chi[b, c] \le h \le \chi[a, d]$. (iv) Write $D$ for the space of smooth functions $h : \mathbb{R} \to \mathbb{R}$ such that $\{x : h(x) \ne 0\}$ is bounded. Show that $\{h^\bullet : h \in D\}$ is dense in $L^1(\mu)$. (v) Let $f$ be a real-valued function which is integrable over every bounded subset of $\mathbb{R}$. Show that $f \times h$ is integrable for every $h \in D$, and that if $\int f \times h = 0$ for every $h \in D$ then $f =_{\text{a.e.}} 0$. (Hint: 222D.)

242Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a measure space. Let $A \subseteq L^1 = L^1(\mu)$ be a non-empty downwards-directed set, and suppose that $\inf A = 0$ in $L^1$. (i) Show that $\inf_{u \in A} \|u\|_1 = 0$. (Hint: set $\gamma = \inf_{u \in A} \|u\|_1$; find a non-increasing sequence $\langle u_n \rangle_{n \in \mathbb{N}}$ in $A$ such that $\lim_{n \to \infty} \|u_n\|_1 = \gamma$; set $v = \inf_{n \in \mathbb{N}} u_n$ and show that $u \wedge v = v$ for every $u \in A$, so that $v = 0$.) (ii) Show that if $U$ is any open set containing $0$, there is a $u \in A$ such that $v \in U$ whenever $0 \le v \le u$.

(b) Let $(X, \Sigma, \mu)$ be a measure space, and $A \subseteq L^1 = L^1(\mu)$ a non-empty upwards-directed set. Suppose that $A$ is bounded for the norm $\|\ \|_1$. (i) Show that there is a non-decreasing sequence $\langle u_n \rangle_{n \in \mathbb{N}}$ in $A$ such that $\lim_{n \to \infty} \int u_n = \sup_{u \in A} \int u$, and that $\langle u_n \rangle_{n \in \mathbb{N}}$ is Cauchy. (ii) Show that $w = \sup A$ is defined in $L^1$ and belongs to the norm-closure of $A$ in $L^1$, so that, in particular, $\|w\|_1 \le \sup_{u \in A} \|u\|_1$.

(c) The norm on a Banach lattice $U$ is order-continuous if $\inf_{u \in A} \|u\| = 0$ whenever $A \subseteq U$ is a non-empty downwards-directed set with infimum $0$. (Thus 242Ya tells us that the norms $\|\ \|_1$ are all order-continuous.) Show that in this case (i) any non-decreasing sequence in $U$ which has an upper bound in $U$ must be Cauchy (ii) $U$ is Dedekind complete. (Hint for (i): if $\langle u_n \rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence with an upper bound in $U$, let $B$ be the set of upper bounds of $\{u_n : n \in \mathbb{N}\}$ and show that $A = \{v - u_n : v \in B,\ n \in \mathbb{N}\}$ has infimum $0$ because $U$ is Archimedean.)

(d) Let $(X, \Sigma, \mu)$ be any measure space. Show that $L^1(\mu)$ has the countable sup property (241Yd).

(e) More generally, show that any Banach lattice with an order-continuous norm has the countable sup
property.

(f) Let $(X, \Sigma, \mu)$ be a measure space and $Y$ any subset of $X$; let $\mu_Y$ be the subspace measure on $Y$ and $T : L^0(\mu) \to L^0(\mu_Y)$ the canonical map described in 241Ye. (i) Show that $Tu \in L^1(\mu_Y)$ and $\|Tu\|_1 \le \|u\|_1$ for every $u \in L^1(\mu)$. (ii) Show that if $u \in L^1(\mu)$ then $\|Tu\|_1 = \|u\|_1$ iff $\int_E u = \int_{Y \cap E} Tu$ for every $E \in \Sigma$. (iii) Show that $T$ is surjective and that $\|v\|_1 = \min\{\|u\|_1 : u \in L^1(\mu),\ Tu = v\}$ for every $v \in L^1(\mu_Y)$. (Hint: 214Eb.) (See also 244Yc below.)

(g) Let $(X, \Sigma, \mu)$ be a measure space. Write $\mathcal{L}^1_{\text{strict}}$ for the space of all integrable $\Sigma$-measurable functions from $X$ to $\mathbb{R}$, and $\mathbf{N}$ for the subspace of $\mathcal{L}^1_{\text{strict}}$ consisting of measurable functions which are zero almost everywhere. (i) Show that $\mathcal{L}^1_{\text{strict}}$ is a Dedekind $\sigma$-complete Riesz space. (ii) Show that $L^1(\mu)$ can be identified, as ordered linear space, with the quotient $\mathcal{L}^1_{\text{strict}}/\mathbf{N}$ as defined in 241Yg. (iii) Show that $\|\ \|_1$ is a seminorm on $\mathcal{L}^1_{\text{strict}}$ (definition: 2A5D). (iv) Show that $f \mapsto |f| : \mathcal{L}^1_{\text{strict}} \to \mathcal{L}^1_{\text{strict}}$ is continuous if $\mathcal{L}^1_{\text{strict}}$ is given the topology defined from $\|\ \|_1$ (2A5B). (v) Show that $\{f : f \ge_{\text{a.e.}} 0\}$ is closed in $\mathcal{L}^1_{\text{strict}}$, but that $\{f : f \ge 0\}$ need not be.

(h) Let $(X, \Sigma, \mu)$ be a measure space, and $\tilde\mu$ the c.l.d. version of $\mu$ (213E). Show that the inclusion $\mathcal{L}^1(\mu) \subseteq \mathcal{L}^1(\tilde\mu)$ induces an isomorphism, as ordered normed linear spaces, between $L^1(\tilde\mu)$ and $L^1(\mu)$.

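(i) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces and $U \subseteq L^0(\mu)$ a linear subspace. Let $T : U \to L^0(\nu)$ be a linear operator such that $Tu \ge 0$ in $L^0(\nu)$ whenever $u \in U$ and $u \ge 0$ in $L^0(\mu)$. Suppose that $w \in U$ is such that $w \ge 0$ and $Tw = (\chi Y)^\bullet$. Show that whenever $\phi : \mathbb{R} \to \mathbb{R}$ is a convex function and $u \in L^0(\mu)$ is such that $w \times u$ and $w \times \bar\phi(u) \in U$, defining $\bar\phi : L^0(\mu) \to L^0(\mu)$ as in 241I, then $\bar\phi T(w \times u) \le T(w \times \bar\phi u)$. Explain how this result may be regarded as a common generalization of Jensen's inequality, as stated in 233I, and 242K above. See also 244M below.

(j) (i) A function $\phi : \mathbb{C} \to \mathbb{R}$ is convex if $\phi(ab + (1 - a)c) \le a\phi(b) + (1 - a)\phi(c)$ for all $b$, $c \in \mathbb{C}$ and $a \in [0, 1]$. (ii) Show that such a function must be bounded on any bounded subset of $\mathbb{C}$. (iii) If $\phi : \mathbb{C} \to \mathbb{R}$ is convex and $c \in \mathbb{C}$, show that there is a $b \in \mathbb{C}$ such that $\phi(x) \ge \phi(c) + \operatorname{Re}(b(x - c))$ for every $x \in \mathbb{C}$. (iv) If $\langle b_c \rangle_{c \in \mathbb{C}}$ is such that $\phi(x) \ge \phi_c(x) = \phi(c) + \operatorname{Re}(b_c(x - c))$ for all $x$, $c \in \mathbb{C}$, show that $\{b_c : c \in I\}$ is bounded for any bounded $I \subseteq \mathbb{C}$. (v) Show that if $D \subseteq \mathbb{C}$ is any dense set, $\phi(x) = \sup_{c \in D} \phi_c(x)$ for every $x \in \mathbb{C}$.

(k) Let $(X, \Sigma, \mu)$ be a probability space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Let $P : L^1_{\mathbb{C}}(\mu) \to L^1_{\mathbb{C}}(\mu\restriction\mathrm{T})$ be the conditional expectation operator. Show that if $\phi : \mathbb{C} \to \mathbb{R}$ is any convex function, and we define $\bar\phi(f^\bullet) = (\phi f)^\bullet$ for every $f \in \mathcal{L}^0_{\mathbb{C}}(\mu)$, then $\bar\phi(Pu) \le P(\bar\phi(u))$ whenever $u \in L^1_{\mathbb{C}}(\mu)$ is such that $\bar\phi(u) \in L^1(\mu)$.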

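(l) Let $(X, \Sigma, \mu)$ be a measure space and $u_0, \ldots, u_n \in L^1(\mu)$. (i) Suppose $k_0, \ldots, k_n \in \mathbb{Z}$ are such that $\sum_{i=0}^n k_i = 1$. Show that $\sum_{i=0}^n \sum_{j=0}^n k_i k_j \|u_i - u_j\|_1 \le 0$. (Hint: $\sum_{i=0}^n \sum_{j=0}^n k_i k_j |\alpha_i - \alpha_j| \le 0$ for all $\alpha_0, \ldots, \alpha_n \in \mathbb{R}$.) (ii) Suppose $\gamma_0, \ldots, \gamma_n \in \mathbb{R}$ are such that $\sum_{i=0}^n \gamma_i = 0$. Show that $\sum_{i=0}^n \sum_{j=0}^n \gamma_i \gamma_j \|u_i - u_j\|_1 \le 0$.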

242 Notes and comments Of course $L^1$-spaces compose one of the most important classes of Riesz space, and accordingly their properties have great prominence in the general theory; 242Xc, 242Xg and 242Ya-242Ye outline some of the interrelations between these properties. I will return to these questions in Chapter 35 in the next volume. I have mentioned in passing (242Dd) the additivity of the norm of $L^1$ on the positive elements. This elementary fact actually characterizes $L^1$ spaces among Banach lattices (Kakutani 41); see 369E in the next volume.
Just as $L^0(\mu)$ can be regarded as a quotient of a linear space $\mathcal{L}^0_{\text{strict}}$, so can $L^1(\mu)$ be regarded as a quotient of a linear space $\mathcal{L}^1_{\text{strict}}$ (242Yg). I have discussed this question in the notes to §241; all I try to do here is to be consistent.
We now have a language in which we can speak of 'the' conditional expectation of a function $f$, the equivalence class in $L^1(\mu\restriction\mathrm{T})$ consisting precisely of all the conditional expectations of $f$ on $\mathrm{T}$. If we think of $L^1(\mu\restriction\mathrm{T})$ as identified with its image in $L^1(\mu)$, then the conditional expectation operator $P : L^1(\mu) \to L^1(\mu\restriction\mathrm{T})$ becomes a projection (242Jh). We therefore have re-statements of 233J-233K, as in 242K, 242L and 242Yi.
I give 242O in a fairly general form; but its importance already appears if we take $X$ to be $[0, 1]$ with one-dimensional Lebesgue measure. In this case, we have a natural norm on $C([0, 1])$, the space of all continuous real-valued functions on $[0, 1]$, given by setting
$\|f\|_1 = \int_0^1 |f(x)|\,dx$
for every $f \in C([0, 1])$. The integral here can, of course, be taken to be the Riemann integral; we do not need the Lebesgue theory to show that $\|\ \|_1$ is a norm on $C([0, 1])$. It is easy to check that $C([0, 1])$ is not complete for this norm (if we set $f_n(x) = \min(1, 2^n x^n)$ for $x \in [0, 1]$, then $\langle f_n \rangle_{n \in \mathbb{N}}$ is a $\|\ \|_1$-Cauchy sequence with no $\|\ \|_1$-limit in $C([0, 1])$). We can use the abstract theory of normed spaces to construct a completion of $C([0, 1])$; but it is much more satisfactory if this completion can be given a relatively concrete form, and this is what the identification of $L^1$ with the completion of $C([0, 1])$ can do. (Note that the remark that $\|\ \|_1$ is a norm on $C([0, 1])$, that is, that $\|f\|_1 \ne 0$ for every non-zero $f \in C([0, 1])$, means just that the map $f \mapsto f^\bullet : C([0, 1]) \to L^1$ is injective, so that $C([0, 1])$ can be identified, as ordered normed space, with its image in $L^1$.) It would be even better if we could find a realization of the completion of $C([0, 1])$ as a space of functions on some set $Z$, rather than as a space of equivalence classes of functions on $[0, 1]$. Unfortunately this is not practical; such realizations do exist, but necessarily involve either a thoroughly unfamiliar base set $Z$, or an intolerably arbitrary embedding map from $C([0, 1])$ into $\mathbb{R}^Z$.
You can get an idea of the obstacle to realizing the completion of $C([0, 1])$ as a space of functions on $[0, 1]$ itself by considering $f_n(x) = \frac1n x^n$ for $n \ge 1$. An easy calculation shows that $\sum_{n=1}^\infty \|f_n\|_1 < \infty$, so that $\sum_{n=1}^\infty f_n$ must exist in the completion of $C([0, 1])$; but there is no natural value to assign to it at the point $1$. Adaptations of this idea can give rise to indefinitely complicated phenomena; indeed, 242O shows that every integrable function is associated with some appropriate sequence from $C([0, 1])$. In §245 I shall have more to say about what $\|\ \|_1$-convergent sequences look like.
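Both examples in these notes can be explored numerically. The sketch below is an illustration only: it uses a midpoint rule with a fixed number of cells (an assumption of the example, so the floating-point values are approximate), and the names `f`, `norm1`, `dist`, `dist_to_step`, `norm_sum` and `value_at_one` are hypothetical. It compares the computed $\|\ \|_1$-distances for $f_n(x) = \min(1, 2^n x^n)$ with the closed form $\int_0^{1/2} (2x)^n\,dx = \frac{1}{2(n+1)}$, and then shows that the norms of $\frac1n x^n$ are summable while the values at $1$ form the harmonic series.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = min(1, 2^n x^n) on [0, 1]."""
    return min(1.0, (2.0 * x) ** n)

def norm1(func, cells=20000):
    """Approximate integral of |func| over [0, 1] by the midpoint rule."""
    width = 1.0 / cells
    return sum(abs(func((k + 0.5) * width)) for k in range(cells)) * width

def dist(n, m):
    """Approximate ||f_n - f_m||_1."""
    return norm1(lambda x: f(n, x) - f(m, x))

def dist_to_step(n):
    """Approximate distance from f_n to the discontinuous function chi[1/2, 1]."""
    return norm1(lambda x: f(n, x) - (1.0 if x >= 0.5 else 0.0))

# Closed forms: ||f_n - chi[1/2,1]||_1 = 1/(2(n+1)), and for n <= m
# ||f_n - f_m||_1 = 1/(2(n+1)) - 1/(2(m+1)), which is small for large n, m.
print(dist_to_step(10), 1 / 22)
print(dist(5, 20), 1 / 12 - 1 / 42)

def norm_sum(N):
    """Exact sum of ||x^n / n||_1 = 1/(n(n+1)) for n = 1..N."""
    return sum(Fraction(1, n * (n + 1)) for n in range(1, N + 1))

def value_at_one(N):
    """Partial sum of the series at the point x = 1 (a harmonic number)."""
    return sum(Fraction(1, n) for n in range(1, N + 1))

print(norm_sum(100), float(value_at_one(100)))
```

The norms telescope to $1 - \frac{1}{N+1}$, so the series converges in the completion, while the partial sums at the point $1$ grow without bound.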
From the point of view of measure theory, narrowly conceived, most of the interesting ideas appear most clearly with real functions and real linear spaces. But some of the most important applications of measure theory (important not only as mathematics in general, but also for the measure-theoretic questions they inspire) deal with complex functions and complex linear spaces. I therefore continue to offer sketches of the complex theory, as in 242P. I note that at irregular intervals we need ideas not already spelt out in the real theory, as in 242Pb and 242Yk.

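243 $L^\infty$
The second of the classical Banach spaces of measure theory which I treat is the space $L^\infty$. As will appear below, $L^\infty$ is the polar companion of $L^1$, the linked opposite; for 'ordinary' measure spaces it is actually the dual of $L^1$ (243F-243G).

243A Definitions Let $(X, \Sigma, \mu)$ be any measure space. Let $\mathcal{L}^\infty = \mathcal{L}^\infty(\mu)$ be the set of functions $f \in \mathcal{L}^0 = \mathcal{L}^0(\mu)$ which are essentially bounded, that is, such that there is some $M \ge 0$ such that $\{x : x \in \operatorname{dom} f,\ |f(x)| \le M\}$ is conegligible, and write
$L^\infty = L^\infty(\mu) = \{f^\bullet : f \in \mathcal{L}^\infty(\mu)\} \subseteq L^0(\mu)$.
Note that if $f \in \mathcal{L}^\infty$, $g \in \mathcal{L}^0$ and $g = f$ a.e., then $g \in \mathcal{L}^\infty$; thus $\mathcal{L}^\infty = \{f : f \in \mathcal{L}^0,\ f^\bullet \in L^\infty\}$.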

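243B Theorem Let $(X, \Sigma, \mu)$ be any measure space. Then
(a) $L^\infty = L^\infty(\mu)$ is a linear subspace of $L^0 = L^0(\mu)$.
(b) If $u \in L^\infty$, $v \in L^0$ and $|v| \le |u|$ then $v \in L^\infty$. Consequently $|u|$, $u \vee v$, $u \wedge v$, $u^+ = u \vee 0$ and $u^- = (-u) \vee 0$ belong to $L^\infty$ for all $u$, $v \in L^\infty$.
(c) Writing $e = \chi X^\bullet$, the equivalence class in $L^0$ of the constant function with value $1$, then an element $u$ of $L^0$ belongs to $L^\infty$ iff there is an $M \ge 0$ such that $|u| \le Me$.
(d) If $u$, $v \in L^\infty$ then $u \times v \in L^\infty$.
(e) If $u \in L^\infty$, $v \in L^1 = L^1(\mu)$ then $u \times v \in L^1$.
proof (a) If $f$, $g \in \mathcal{L}^\infty = \mathcal{L}^\infty(\mu)$ and $c \in \mathbb{R}$, then $f + g$, $cf \in \mathcal{L}^\infty$. P P We have $M_1$, $M_2 \ge 0$ such that $|f| \le M_1$ a.e. and $|g| \le M_2$ a.e. Now
$|f + g| \le |f| + |g| \le M_1 + M_2$ a.e., $|cf| \le |c| M_1$ a.e.,
so $f + g$, $cf \in \mathcal{L}^\infty$. Q Q It follows at once that $u + v$, $cu \in L^\infty$ whenever $u$, $v \in L^\infty$ and $c \in \mathbb{R}$.
(b)(i) Take $f \in \mathcal{L}^\infty$, $g \in \mathcal{L}^0 = \mathcal{L}^0(\mu)$ such that $u = f^\bullet$, $v = g^\bullet$. Then $|g| \le |f|$ a.e. Let $M \ge 0$ be such that $|f| \le M$ a.e.; then $|g| \le M$ a.e., so $g \in \mathcal{L}^\infty$ and $v \in L^\infty$.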

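(ii) Now $|\,|u|\,| = |u|$ so $|u| \in L^\infty$ whenever $u \in L^\infty$. Also $u \vee v = \frac12(u + v + |u - v|)$, $u \wedge v = \frac12(u + v - |u - v|)$ belong to $L^\infty$ for all $u$, $v \in L^\infty$.
(c)(i) If $u \in L^\infty$, take $f \in \mathcal{L}^\infty$ such that $f^\bullet = u$. Then there is an $M \ge 0$ such that $|f| \le M$ a.e., so that $|f| \le M\chi X$ a.e. and $|u| \le Me$. (ii) Of course $\chi X \in \mathcal{L}^\infty$, so $e \in L^\infty$, and if $u \in L^0$ and $|u| \le Me$ then $u \in L^\infty$ by (b).
(d) $f \times g \in \mathcal{L}^\infty$ whenever $f$, $g \in \mathcal{L}^\infty$. P P If $|f| \le M_1$ a.e. and $|g| \le M_2$ a.e., then
$|f \times g| = |f| \times |g| \le M_1 M_2$ a.e. Q Q
So $u \times v \in L^\infty$ for all $u$, $v \in L^\infty$.
(e) If $f \in \mathcal{L}^\infty$ and $g \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$, then there is an $M \ge 0$ such that $|f| \le M$ a.e., so $|f \times g| \le M|g|$ almost everywhere; because $M|g|$ is integrable and $f \times g$ is virtually measurable, $f \times g$ is integrable and $u \times v \in L^1$.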

243C The order structure of $L^\infty$ Let $(X, \Sigma, \mu)$ be any measure space. Then $L^\infty = L^\infty(\mu)$, being a linear subspace of $L^0 = L^0(\mu)$, inherits a partial order which renders it a partially ordered linear space (compare 242Ca). Because $|u| \in L^\infty$ whenever $u \in L^\infty$ (243Bb), $u \wedge v$ and $u \vee v$ belong to $L^\infty$ whenever $u$, $v \in L^\infty$, and $L^\infty$ is a Riesz space (compare 242Cd).
The behaviour of $L^\infty$ as a Riesz space is dominated by the fact that it has an order unit $e = \chi X^\bullet$ with the property that
for every $u \in L^\infty$ there is an $M \ge 0$ such that $|u| \le Me$
(243Bc).
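243D The norm of $L^\infty$ Let $(X, \Sigma, \mu)$ be any measure space.

(a) For $f \in \mathcal{L}^\infty = \mathcal{L}^\infty(\mu)$, say that the essential supremum of $|f|$ is
$\operatorname{ess\,sup} |f| = \inf\{M : M \ge 0,\ \{x : x \in \operatorname{dom} f,\ |f(x)| \le M\}$ is conegligible$\}$.
Then $|f| \le \operatorname{ess\,sup} |f|$ a.e. P P Set $M = \operatorname{ess\,sup} |f|$. For each $n \in \mathbb{N}$, there is an $M_n \le M + 2^{-n}$ such that $|f| \le M_n$ a.e. Now
$\{x : |f(x)| \le M\} = \bigcap_{n \in \mathbb{N}} \{x : |f(x)| \le M_n\}$
is conegligible, so $|f| \le M$ a.e. Q Q

(b) If $f$, $g \in \mathcal{L}^\infty$ and $f = g$ a.e., then $\operatorname{ess\,sup} |f| = \operatorname{ess\,sup} |g|$. Accordingly we may define a functional $\|\ \|_\infty$ on $L^\infty = L^\infty(\mu)$ by setting $\|u\|_\infty = \operatorname{ess\,sup} |f|$ whenever $u = f^\bullet$.

(c) From (a), we see that, for any $u \in L^\infty$, $\|u\|_\infty = \min\{\gamma : |u| \le \gamma e\}$, where, as before, $e = \chi X^\bullet \in L^\infty$. Consequently $\|\ \|_\infty$ is a norm on $L^\infty$. P P (i) If $u$, $v \in L^\infty$ then
$|u + v| \le |u| + |v| \le (\|u\|_\infty + \|v\|_\infty)e$
so $\|u + v\|_\infty \le \|u\|_\infty + \|v\|_\infty$. (ii) If $u \in L^\infty$ and $c \in \mathbb{R}$ then
$|cu| = |c||u| \le |c|\|u\|_\infty e$,
so $\|cu\|_\infty \le |c|\|u\|_\infty$. (iii) If $\|u\|_\infty = 0$, there is an $f \in \mathcal{L}^\infty$ such that $f^\bullet = u$ and $|f| \le \|u\|_\infty$ a.e.; now $f = 0$ a.e. so $u = 0$. Q Q

(d) Note also that if $u \in L^0$, $v \in L^\infty$ and $|u| \le |v|$ then $|u| \le \|v\|_\infty e$ so $u \in L^\infty$ and $\|u\|_\infty \le \|v\|_\infty$; similarly,
$\|u \times v\|_\infty \le \|u\|_\infty \|v\|_\infty$, $\|u \vee v\|_\infty \le \max(\|u\|_\infty, \|v\|_\infty)$
for all $u$, $v \in L^\infty$. Thus $L^\infty$ is a commutative Banach algebra (2A4J).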

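(e) Moreover,
$|\int u \times v| \le \int |u \times v| = \|u \times v\|_1 \le \|u\|_1 \|v\|_\infty$
whenever $u \in L^1$ and $v \in L^\infty$, because
$|u \times v| = |u| \times |v| \le |u| \times \|v\|_\infty e = \|v\|_\infty |u|$.

(f) Observe that if $u$, $v$ are non-negative members of $L^\infty$ then
$\|u \vee v\|_\infty = \max(\|u\|_\infty, \|v\|_\infty)$;
this is because, for any $\gamma \ge 0$,
$u \vee v \le \gamma e \iff u \le \gamma e$ and $v \le \gamma e$.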
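The facts in 243D are easy to watch on a measure space with finitely many points, one of which is negligible. The sketch below assumes such a three-point space; the names `mu`, `ess_sup`, `norm1` and `integral` are hypothetical. The point `"c"` carries measure zero, so large values there do not affect $\|\ \|_\infty$, and the inequalities of (c), (e) and (f) can be checked directly.

```python
# Hypothetical three-point measure space; the point "c" is negligible.
mu = {"a": 1.0, "b": 2.0, "c": 0.0}

def ess_sup(f):
    """Essential supremum of |f|: values on negligible points are ignored."""
    return max((abs(f[x]) for x in mu if mu[x] > 0), default=0.0)

def integral(f):
    return sum(f[x] * mu[x] for x in mu)

def norm1(f):
    return integral({x: abs(f[x]) for x in mu})

u = {"a": 1.5, "b": -2.0, "c": 100.0}
v = {"a": -0.5, "b": 3.0, "c": -7.0}

u_plus_v = {x: u[x] + v[x] for x in mu}
u_times_v = {x: u[x] * v[x] for x in mu}
abs_u = {x: abs(u[x]) for x in mu}
abs_v = {x: abs(v[x]) for x in mu}
sup_of_abs = {x: max(abs_u[x], abs_v[x]) for x in mu}

# (c): the triangle inequality for the essential supremum norm.
triangle = ess_sup(u_plus_v) <= ess_sup(u) + ess_sup(v)
# (e): |integral of u x v| <= ||u x v||_1 <= ||u||_1 ||v||_inf.
pairing_bound = abs(integral(u_times_v)) <= norm1(u_times_v) <= norm1(u) * ess_sup(v)
# (f): for non-negative elements the norm of the supremum is the larger norm.
max_rule = ess_sup(sup_of_abs) == max(ess_sup(abs_u), ess_sup(abs_v))
print(ess_sup(u), ess_sup(v), triangle, pairing_bound, max_rule)
```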

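243E Theorem For any measure space $(X, \Sigma, \mu)$, $L^\infty = L^\infty(\mu)$ is a Banach lattice under $\|\ \|_\infty$.
proof (a) We already know that $\|u\|_\infty \le \|v\|_\infty$ whenever $|u| \le |v|$ (243Dd); so we have just to check that $L^\infty$ is complete under $\|\ \|_\infty$. Let $\langle u_n \rangle_{n \in \mathbb{N}}$ be a Cauchy sequence in $L^\infty$. For each $n \in \mathbb{N}$ choose $f_n \in \mathcal{L}^\infty = \mathcal{L}^\infty(\mu)$ such that $f_n^\bullet = u_n$ in $L^\infty$. For all $m$, $n \in \mathbb{N}$, $(f_m - f_n)^\bullet = u_m - u_n$. Consequently
$E_{mn} = \{x : |f_m(x) - f_n(x)| > \|u_m - u_n\|_\infty\}$
is negligible, by 243Da. This means that
$E = \bigcap_{n \in \mathbb{N}} \{x : x \in \operatorname{dom} f_n,\ |f_n(x)| \le \|u_n\|_\infty\} \setminus \bigcup_{m, n \in \mathbb{N}} E_{mn}$
is conegligible. But for every $x \in E$, $|f_m(x) - f_n(x)| \le \|u_m - u_n\|_\infty$ for all $m$, $n \in \mathbb{N}$, so that $\langle f_n(x) \rangle_{n \in \mathbb{N}}$ is a Cauchy sequence, with a limit in $\mathbb{R}$. Thus $f = \lim_{n \to \infty} f_n$ is defined almost everywhere. Also, at least for $x \in E$,
$|f(x)| \le \sup_{n \in \mathbb{N}} \|u_n\|_\infty < \infty$,
so $f \in \mathcal{L}^\infty$ and $u = f^\bullet \in L^\infty$. If $m \in \mathbb{N}$, then, for every $x \in E$,
$|f(x) - f_m(x)| \le \sup_{n \ge m} |f_n(x) - f_m(x)| \le \sup_{n \ge m} \|u_n - u_m\|_\infty$,
so
$\|u - u_m\|_\infty \le \sup_{n \ge m} \|u_n - u_m\|_\infty \to 0$
as $m \to \infty$, and $u = \lim_{m \to \infty} u_m$ in $L^\infty$. As $\langle u_n \rangle_{n \in \mathbb{N}}$ is arbitrary, $L^\infty$ is complete.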

243F The duality between $L^\infty$ and $L^1$ Let $(X, \Sigma, \mu)$ be any measure space.

(a) I have already remarked that if $u \in L^1 = L^1(\mu)$ and $v \in L^\infty = L^\infty(\mu)$, then $u \times v \in L^1$ and $|\int u \times v| \le \|u\|_1 \|v\|_\infty$ (243Be, 243De).

(b) Consequently we have a bounded linear operator $T$ from $L^\infty$ to the normed space dual $(L^1)^*$ of $L^1$, given by writing
$(Tv)(u) = \int u \times v$
for all $u \in L^1$, $v \in L^\infty$. P P (i) By (a), $(Tv)(u)$ is well-defined for $u \in L^1$, $v \in L^\infty$. (ii) If $v \in L^\infty$, $u$, $u_1$, $u_2 \in L^1$ and $c \in \mathbb{R}$, then
$(Tv)(u_1 + u_2) = \int (u_1 + u_2) \times v = \int (u_1 \times v) + (u_2 \times v) = \int u_1 \times v + \int u_2 \times v = (Tv)(u_1) + (Tv)(u_2)$,
$(Tv)(cu) = \int cu \times v = \int c(u \times v) = c\int u \times v = c(Tv)(u)$.
This shows that $Tv : L^1 \to \mathbb{R}$ is a linear functional for each $v \in L^\infty$. (iii) Next, for any $u \in L^1$ and $v \in L^\infty$,
$|(Tv)(u)| = |\int u \times v| \le \|u \times v\|_1 \le \|u\|_1 \|v\|_\infty$,
as remarked in (a). This means that $Tv \in (L^1)^*$ and $\|Tv\| \le \|v\|_\infty$ for every $v \in L^\infty$. (iv) If $v$, $v_1$, $v_2 \in L^\infty$, $u \in L^1$ and $c \in \mathbb{R}$, then
$T(v_1 + v_2)(u) = \int (v_1 + v_2) \times u = \int (v_1 \times u) + (v_2 \times u) = \int v_1 \times u + \int v_2 \times u = (Tv_1)(u) + (Tv_2)(u) = (Tv_1 + Tv_2)(u)$,
$T(cv)(u) = \int cv \times u = c\int v \times u = c(Tv)(u) = (cTv)(u)$.
As $u$ is arbitrary, $T(v_1 + v_2) = Tv_1 + Tv_2$ and $T(cv) = c(Tv)$; thus $T : L^\infty \to (L^1)^*$ is linear. (v) Recalling from (iii) that $\|Tv\| \le \|v\|_\infty$ for every $v \in L^\infty$, we see that $\|T\| \le 1$. Q Q

(c) Exactly the same arguments show that we have a linear operator $T' : L^1 \to (L^\infty)^*$, given by writing
$(T'u)(v) = \int u \times v$ for all $u \in L^1$, $v \in L^\infty$,
and that $\|T'\|$ is also at most $1$.

243G Theorem Let $(X, \Sigma, \mu)$ be a measure space, and $T : L^\infty(\mu) \to (L^1(\mu))^*$ the canonical map described in 243F. Then
(a) $T$ is injective iff $(X, \Sigma, \mu)$ is semi-finite, and in this case is norm-preserving;
(b) $T$ is bijective iff $(X, \Sigma, \mu)$ is localizable, and in this case is a normed space isomorphism.
proof (a)(i) Suppose that $T$ is injective, and that $E \in \Sigma$ has $\mu E = \infty$. Then $\chi E$ is not equal a.e. to $0$, so $(\chi E)^\bullet \ne 0$ in $L^\infty$, and $T(\chi E)^\bullet \ne 0$; let $u \in L^1$ be such that $T(\chi E)^\bullet(u) \ne 0$, that is, $\int u \times (\chi E)^\bullet \ne 0$. Express $u$ as $f^\bullet$ where $f$ is integrable; then $\int_E f \ne 0$ so $\int_E |f| \ne 0$. Let $g$ be a simple function such that $0 \le g \le_{\text{a.e.}} |f|$ and $\int g > \int |f| - \int_E |f|$; then $\int_E g \ne 0$. Express $g$ as $\sum_{i=0}^n a_i \chi E_i$ where $\mu E_i < \infty$ for each $i$; then $0 \ne \sum_{i=0}^n a_i \mu(E_i \cap E)$, so there is an $i \le n$ such that $\mu(E \cap E_i) \ne 0$, and now $E \cap E_i$ is a measurable subset of $E$ of non-zero finite measure.
As $E$ is arbitrary, this shows that $(X, \Sigma, \mu)$ must be semi-finite if $T$ is injective.
(ii) Now suppose that $(X, \Sigma, \mu)$ is semi-finite, and that $v \in L^\infty$ is non-zero. Express $v$ as $g^\bullet$ where $g : X \to \mathbb{R}$ is measurable; then $g \in \mathcal{L}^\infty$. Take any $a \in \,]0, \|v\|_\infty[$; then $E = \{x : |g(x)| \ge a\}$ has non-zero measure. Let $F \subseteq E$ be a measurable set of non-zero finite measure, and set $f(x) = |g(x)|/g(x)$ if $x \in F$, $0$ otherwise; then $f \in \mathcal{L}^1$ and $(f \times g)(x) \ge a$ for $x \in F$, so, setting $u = f^\bullet \in L^1$, we have
$(Tv)(u) = \int u \times v = \int f \times g \ge a\mu F = a\int |f| = a\|u\|_1 > 0$.
This shows that $\|Tv\| \ge a$; as $a$ is arbitrary, $\|Tv\| \ge \|v\|_\infty$. We know already from 243F that $\|Tv\| \le \|v\|_\infty$, so $\|Tv\| = \|v\|_\infty$ for every non-zero $v \in L^\infty$; the same is surely true for $v = 0$, so $T$ is norm-preserving and injective.
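Part (a) can be illustrated on a two-point space in which one point may carry infinite measure. The sketch below is only a toy model under that assumption; the names `norm_inf` and `dual_norm` are hypothetical. On such a space an integrable function must vanish on any atom of infinite measure, and the supremum defining $\|Tv\|$ is attained at normalised indicators of single atoms of finite non-zero measure.

```python
import math

def norm_inf(v, mu):
    """Essential supremum of |v|: atoms of measure zero are ignored."""
    return max((abs(v[x]) for x in mu if mu[x] > 0), default=0.0)

def dual_norm(v, mu):
    """sup{|integral of u x v| : ||u||_1 <= 1} on a space with finitely many
    atoms.  Atoms of infinite measure are skipped because every integrable u
    vanishes there; atoms of measure zero contribute nothing."""
    best = 0.0
    for x, mass in mu.items():
        if 0 < mass < math.inf:
            u_at_x = 1.0 / mass              # u = chi{x} / mu{x}, so ||u||_1 = 1
            best = max(best, abs(u_at_x * v[x]) * mass)
    return best

semi_finite = {"a": 1.0, "b": 2.0}
not_semi_finite = {"a": 1.0, "b": math.inf}   # {"b"} has no subset of finite non-zero measure
v = {"a": 0.0, "b": 1.0}                      # the indicator of {"b"}

print(norm_inf(v, semi_finite), dual_norm(v, semi_finite))
print(norm_inf(v, not_semi_finite), dual_norm(v, not_semi_finite))
```

In the semi-finite case the two norms agree, as (a)(ii) predicts; in the other case $v$ is non-zero in $L^\infty$ but $Tv = 0$, which is the failure of injectivity described in (a)(i).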
(b)(i) Using (a) and the definition of 'localizable', we see that under either of the conditions proposed $(X, \Sigma, \mu)$ is semi-finite and $T$ is injective and norm-preserving. I therefore have to show just that it is surjective iff $(X, \Sigma, \mu)$ is localizable.
(ii) Suppose that $T$ is surjective and that $\mathcal{E} \subseteq \Sigma$. Let $\mathcal{F}$ be the family of finite unions of members of $\mathcal{E}$, counting $\emptyset$ as the union of no members of $\mathcal{E}$, so that $\mathcal{F}$ is closed under finite unions and, for any $G \in \Sigma$, $E \setminus G$ is negligible for every $E \in \mathcal{E}$ iff $E \setminus G$ is negligible for every $E \in \mathcal{F}$.
If $u \in L^1$, then $h(u) = \lim_{E \in \mathcal{F}, E \uparrow} \int_E u$ exists in $\mathbb{R}$. P P If $u$ is non-negative, then
$h(u) = \sup\{\int_E u : E \in \mathcal{F}\} \le \int u < \infty$.
For other $u$, we can express $u$ as $u_1 - u_2$, where $u_1$ and $u_2$ are non-negative, and now $h(u) = h(u_1) - h(u_2)$. Q Q
Evidently $h : L^1 \to \mathbb{R}$ is linear, being a limit of the linear functionals $u \mapsto \int_E u$, and also
$|h(u)| \le \sup_{E \in \mathcal{F}} |\int_E u| \le \int |u|$
for every $u$, so $h \in (L^1)^*$. Since we are supposing that $T$ is surjective, there is a $v \in L^\infty$ such that $Tv = h$. Express $v$ as $g^\bullet$ where $g : X \to \mathbb{R}$ is measurable and essentially bounded. Set $G = \{x : g(x) > 0\} \in \Sigma$.
If $F \in \Sigma$ and $\mu F < \infty$, then
$\int_F g = \int (\chi F)^\bullet \times g^\bullet = (Tv)(\chi F)^\bullet = h(\chi F)^\bullet = \sup_{E \in \mathcal{F}} \mu(E \cap F)$.
?? If $E \in \mathcal{E}$ and $E \setminus G$ is not negligible, then there is a set $F \subseteq E \setminus G$ such that $0 < \mu F < \infty$; now
$\mu F = \mu(E \cap F) \le \int_F g \le 0$,
as $g(x) \le 0$ for $x \in F$. X X Thus $E \setminus G$ is negligible for every $E \in \mathcal{E}$.
Let $H \in \Sigma$ be such that $E \setminus H$ is negligible for every $E \in \mathcal{E}$. ?? If $G \setminus H$ is not negligible, there is a set $F \subseteq G \setminus H$ of non-zero finite measure. Now
$\mu(E \cap F) \le \mu(H \cap F) = 0$
for every $E \in \mathcal{E}$, so $\mu(E \cap F) = 0$ for every $E \in \mathcal{F}$, and $\int_F g = 0$; but $g(x) > 0$ for every $x \in F$, so $\mu F = 0$, which is impossible. X X Thus $G \setminus H$ is negligible.
Accordingly $G$ is an essential supremum of $\mathcal{E}$ in $\Sigma$. As $\mathcal{E}$ is arbitrary, $(X, \Sigma, \mu)$ is localizable.
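(iii) For the rest of this proof, I will suppose that $(X, \Sigma, \mu)$ is localizable and seek to show that $T$ is surjective.
Take $h \in (L^1)^*$ such that $\|h\| = 1$. Write $\Sigma^f = \{F : F \in \Sigma,\ \mu F < \infty\}$, and for $F \in \Sigma^f$ define $\nu_F : \Sigma \to \mathbb{R}$ by setting
$\nu_F E = h(\chi(E \cap F)^\bullet)$
for every $E \in \Sigma$. Then $\nu_F \emptyset = h(0) = 0$, and if $\langle E_n \rangle_{n \in \mathbb{N}}$ is a disjoint sequence in $\Sigma$ with union $E$,
$\chi(E \cap F)^\bullet = \sum_{n=0}^\infty \chi(E_n \cap F)^\bullet$
in $L^1$. P P
$\|\chi(E \cap F)^\bullet - \sum_{k=0}^n \chi(E_k \cap F)^\bullet\|_1 = \mu(F \cap E \setminus \bigcup_{k \le n} E_k) \to 0$
as $n \to \infty$. Q Q So
$\nu_F E = h(\chi(E \cap F)^\bullet) = \sum_{n=0}^\infty h(\chi(E_n \cap F)^\bullet) = \sum_{n=0}^\infty \nu_F E_n$.
Thus $\nu_F$ is countably additive. Also
$|\nu_F E| \le \|\chi(E \cap F)^\bullet\|_1 = \mu(E \cap F)$
for every $E \in \Sigma$, so $\nu_F$ is truly continuous in the sense of 232Ab. By the Radon-Nikodým theorem (232E), there is an integrable function $g_F$ such that $\int_E g_F = \nu_F E$ for every $E \in \Sigma$; we may take it that $g_F$ is measurable and has domain $X$ (232He).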
(iv) It is worth noting that $|g_F|\le_{\text{a.e.}}1$. P If $G=\{x:g_F(x)>1\}$, then
$$\int_G g_F=\nu_F G\le\mu(F\cap G)\le\mu G;$$
but this is possible only if $\mu G=0$. Similarly, if $G'=\{x:g_F(x)<-1\}$, then
$$\int_{G'}g_F=\nu_F G'\ge-\mu G',$$
so again $\mu G'=0$. Q
(v) If $F$, $F'\in\Sigma^f$, then $g_F=g_{F'}$ almost everywhere on $F\cap F'$. P If $E\in\Sigma$ and $E\subseteq F\cap F'$, then
$$\int_E g_F=h(\chi(E\cap F)^\bullet)=h(\chi(E\cap F')^\bullet)=\int_E g_{F'}.$$
So 131H gives the result. Q 213N (applied to $\{g_F\upharpoonright F:F\in\Sigma^f\}$) now tells us that, because $\mu$ is localizable, there is a measurable function $g:X\to\mathbb{R}$ such that $g=g_F$ almost everywhere on $F$, for every $F\in\Sigma^f$.
(vi) For any $F\in\Sigma^f$, the set
$$\{x:x\in F,\ |g(x)|>1\}\subseteq\{x:|g_F(x)|>1\}\cup\{x:x\in F,\ g(x)\ne g_F(x)\}$$
is negligible; because $\mu$ is semi-finite, $\{x:|g(x)|>1\}$ is negligible, and $g\in\mathcal{L}^\infty$, with $\operatorname{ess\,sup}|g|\le 1$. Accordingly $v=g^\bullet\in L^\infty$, and we may speak of $Tv\in(L^1)^*$.
(vii) If $F\in\Sigma^f$, then
$$\int_F g=\int_F g_F=\nu_F X=h(\chi F^\bullet).$$
It follows at once that
$$(Tv)(f^\bullet)=\int f\times g=h(f^\bullet)$$
for every simple function $f:X\to\mathbb{R}$. Consequently $Tv=h$, because both $Tv$ and $h$ are continuous and the equivalence classes of simple functions form a dense subset of $L^1$ (242Mb, 2A3Uc). Thus $h=Tv$ is a value of $T$.
(viii) The argument as written above has assumed that $\|h\|=1$. But of course any non-zero member of $(L^1)^*$ is a scalar multiple of an element of norm 1, so is a value of $T$. So $T:L^\infty\to(L^1)^*$ is indeed surjective, and is therefore an isometric isomorphism, as claimed.

243H Recall that $L^0$ is always Dedekind $\sigma$-complete and sometimes Dedekind complete (241G), while $L^1$ is always Dedekind complete (242H). In this respect $L^\infty$ follows $L^0$.
Theorem Let $(X,\Sigma,\mu)$ be a measure space.
(a) $L^\infty(\mu)$ is Dedekind $\sigma$-complete.
(b) If $\mu$ is localizable, $L^\infty(\mu)$ is Dedekind complete.
proof These are both consequences of 241G. If $A\subseteq L^\infty=L^\infty(\mu)$ is bounded above in $L^\infty$, fix $u_0\in A$ and an upper bound $w_0$ of $A$ in $L^\infty$. If $B$ is the set of upper bounds for $A$ in $L^0=L^0(\mu)$, then $B\cap L^\infty$ is the set of upper bounds for $A$ in $L^\infty$. Moreover, if $B$ has a least member $v_0$, then we must have $u_0\le v_0\le w_0$, so that
$$0\le v_0-u_0\le w_0-u_0\in L^\infty$$
and $v_0-u_0$, $v_0$ belong to $L^\infty$. (Compare part (a) of the proof of 242H.) Thus $v_0=\sup A$ in $L^\infty$.
Now we know that $L^0$ is Dedekind $\sigma$-complete; if $A\subseteq L^\infty$ is a non-empty countable set which is bounded above in $L^\infty$, it is surely bounded above in $L^0$, so has a supremum in $L^0$ which is also its supremum in $L^\infty$. As $A$ is arbitrary, $L^\infty$ is Dedekind $\sigma$-complete. While if $\mu$ is localizable, we can argue in the same way with arbitrary non-empty subsets of $L^\infty$ to see that $L^\infty$ is Dedekind complete because $L^0$ is.

243I A dense subspace of $L^\infty$ In 242M-242O I described a couple of important dense linear subspaces of $L^1$ spaces. The position concerning $L^\infty$ is a little different. However I can describe one important dense subspace.
Proposition Let $(X,\Sigma,\mu)$ be a measure space.
(a) Write $\mathcal{S}$ for the space of '$\Sigma$-simple' functions on $X$, that is, the space of functions from $X$ to $\mathbb{R}$ expressible as $\sum_{k=0}^n a_k\chi E_k$ where $a_k\in\mathbb{R}$ and $E_k\in\Sigma$ for every $k\le n$. Then for every $f\in\mathcal{L}^\infty=\mathcal{L}^\infty(\mu)$ and every $\epsilon>0$, there is a $g\in\mathcal{S}$ such that $\operatorname{ess\,sup}|f-g|\le\epsilon$.
(b) $S=\{f^\bullet:f\in\mathcal{S}\}$ is a $\|\ \|_\infty$-dense linear subspace of $L^\infty=L^\infty(\mu)$.
(c) If $(X,\Sigma,\mu)$ is totally finite, then $\mathcal{S}$ is the space of $\mu$-simple functions, so $S$ becomes just the space of equivalence classes of simple functions, as in 242Mb.
proof (a) Let $\tilde f:X\to\mathbb{R}$ be a bounded measurable function such that $f=_{\text{a.e.}}\tilde f$. Let $n\in\mathbb{N}$ be such that $|\tilde f(x)|\le n\epsilon$ for every $x\in X$. For $-n\le k\le n$ set
$$E_k=\{x:k\epsilon\le\tilde f(x)<(k+1)\epsilon\}.$$
Set
$$g=\sum_{k=-n}^n k\epsilon\,\chi E_k\in\mathcal{S};$$
then $0\le\tilde f(x)-g(x)\le\epsilon$ for every $x\in X$, so
$$\operatorname{ess\,sup}|f-g|=\operatorname{ess\,sup}|\tilde f-g|\le\epsilon.$$
(b) This follows immediately, as in 242Mb.
(c) is also elementary.
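The construction in part (a) of the proof amounts to rounding each value of a bounded function down to the nearest multiple of $\epsilon$. The following is a minimal Python sketch of that step, not part of the original text; the function name `simple_approximation` and the sample values are illustrative assumptions, and a finite list of values stands in for the function $\tilde f$.

```python
import math

def simple_approximation(values, eps):
    """Map each value t to k*eps with k = floor(t/eps), so that k*eps <= t < (k+1)*eps."""
    return [math.floor(t / eps) * eps for t in values]

# Hypothetical sample: values taken by a bounded function at a few points.
values = [-1.0, -0.3, 0.0, 0.26, 0.99]
eps = 0.25

approx = simple_approximation(values, eps)
errors = [t - s for t, s in zip(values, approx)]

print(approx)
# Every error should lie in [0, eps], mirroring 0 <= f(x) - g(x) <= eps.
print(all(0.0 <= e <= eps for e in errors))
```

The approximating function takes only finitely many values (multiples of `eps` between the bounds of the data), which is what makes it simple.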

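243J Conditional expectations Conditional expectations are so important that it is worth considering their interaction with every new concept.

(a) If $(X,\Sigma,\mu)$ is any measure space, and $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$, then the canonical embedding $S:L^0(\mu\upharpoonright\mathrm{T})\to L^0(\mu)$ (242Ja) embeds $L^\infty(\mu\upharpoonright\mathrm{T})$ as a subspace of $L^\infty(\mu)$, and $\|Su\|_\infty=\|u\|_\infty$ for every $u\in L^\infty(\mu\upharpoonright\mathrm{T})$. As in 242Jb, we can identify $L^\infty(\mu\upharpoonright\mathrm{T})$ with its image in $L^\infty(\mu)$.

(b) Now suppose that $\mu X=1$, and let $P:L^1(\mu)\to L^1(\mu\upharpoonright\mathrm{T})$ be the conditional expectation operator (242Jd). Then $L^\infty(\mu)$ is actually a linear subspace of $L^1(\mu)$. Setting $e=\chi X^\bullet\in L^\infty(\mu)$, we see that $\int_F e=(\mu\upharpoonright\mathrm{T})(F)$ for every $F\in\mathrm{T}$, so
$$Pe=\chi X^\bullet\in L^\infty(\mu\upharpoonright\mathrm{T}).$$
If $u\in L^\infty(\mu)$, then setting $M=\|u\|_\infty$ we have $-Me\le u\le Me$, so $-MPe\le Pu\le MPe$, because $P$ is order-preserving (242Je); accordingly $\|Pu\|_\infty\le M=\|u\|_\infty$. Thus $P\upharpoonright L^\infty(\mu):L^\infty(\mu)\to L^\infty(\mu\upharpoonright\mathrm{T})$ is an operator of norm 1.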

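243K Complex $L^\infty$ All the ideas needed to adapt the work above to complex $L^\infty$ spaces have already appeared in 241J and 242P. Let $\mathcal{L}^\infty_{\mathbb{C}}$ be
$$\{f:f\in\mathcal{L}^0_{\mathbb{C}},\ \operatorname{ess\,sup}|f|<\infty\}=\{f:\operatorname{Re}(f)\in\mathcal{L}^\infty,\ \operatorname{Im}(f)\in\mathcal{L}^\infty\}.$$
Then
$$L^\infty_{\mathbb{C}}=\{f^\bullet:f\in\mathcal{L}^\infty_{\mathbb{C}}\}=\{u:u\in L^0_{\mathbb{C}},\ \operatorname{Re}(u)\in L^\infty,\ \operatorname{Im}(u)\in L^\infty\}.$$
Setting
$$\|u\|_\infty=\||u|\|_\infty=\operatorname{ess\,sup}|f|\text{ whenever }f^\bullet=u,$$
we have a norm on $L^\infty_{\mathbb{C}}$ rendering it a Banach space. We still have $u\times v\in L^\infty_{\mathbb{C}}$ and $\|u\times v\|_\infty\le\|u\|_\infty\|v\|_\infty$ for all $u$, $v\in L^\infty_{\mathbb{C}}$.
We now have a duality between $L^1_{\mathbb{C}}$ and $L^\infty_{\mathbb{C}}$ giving rise to a linear operator $T:L^\infty_{\mathbb{C}}\to(L^1_{\mathbb{C}})^*$ of norm at most 1, defined by the formula
$$(Tv)(u)=\int u\times v\text{ for every }u\in L^1_{\mathbb{C}},\ v\in L^\infty_{\mathbb{C}}.$$
$T$ is injective iff the underlying measure space is semi-finite, and is a bijection iff the underlying measure space is localizable. (This can of course be proved by re-working the arguments of 243G; but it is perhaps easier to note that $T(\operatorname{Re}(v))=\operatorname{Re}(Tv)$, $T(\operatorname{Im}(v))=\operatorname{Im}(Tv)$ for every $v$, so that the result for complex spaces can be deduced from the result for real spaces.) To check that $T$ is norm-preserving when it is injective, the quickest route seems to be to imitate the argument of (a-ii) of the proof of 243G.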

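243X Basic exercises (a) Let $(X,\Sigma,\mu)$ be any measure space, and $\hat\mu$ the completion of $\mu$ (212C, 241Xb). Show that $\mathcal{L}^\infty(\hat\mu)=\mathcal{L}^\infty(\mu)$, $L^\infty(\hat\mu)=L^\infty(\mu)$.

> (b) Let $(X,\Sigma,\mu)$ be a non-empty measure space. Write $\mathcal{L}^\infty_{\text{strict}}$ for the space of bounded $\Sigma$-measurable real-valued functions with domain $X$. (i) Show that $L^\infty(\mu)=\{f^\bullet:f\in\mathcal{L}^\infty_{\text{strict}}\}\subseteq L^0=L^0(\mu)$. (ii) Show that $\mathcal{L}^\infty_{\text{strict}}$ is a Dedekind $\sigma$-complete Banach lattice if we give it the norm
$$\|f\|_\infty=\sup_{x\in X}|f(x)|\text{ for every }f\in\mathcal{L}^\infty_{\text{strict}}.$$
(iii) Show that for every $u\in L^\infty=L^\infty(\mu)$, $\|u\|_\infty=\min\{\|f\|_\infty:f\in\mathcal{L}^\infty_{\text{strict}},\ f^\bullet=u\}$.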

> (c) Let $(X,\Sigma,\mu)$ be any measure space, and $A$ a subset of $L^\infty(\mu)$. Show that $A$ is bounded for the norm $\|\ \|_\infty$ iff it is bounded above and below for the ordering of $L^\infty$.

(d) Let $(X,\Sigma,\mu)$ be any measure space, and $A\subseteq L^\infty(\mu)$ a non-empty set with a least upper bound $w$ in $L^\infty(\mu)$. Show that $\|w\|_\infty\le\sup_{u\in A}\|u\|_\infty$.

(e) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of measure spaces, and $(X,\Sigma,\mu)$ their direct sum (214K). Show that the canonical isomorphism between $L^0(\mu)$ and $\prod_{i\in I}L^0(\mu_i)$ (241Xg) induces an isomorphism between $L^\infty(\mu)$ and the subspace
$$\Bigl\{u:u\in\prod_{i\in I}L^\infty(\mu_i),\ \|u\|=\sup_{i\in I}\|u(i)\|_\infty<\infty\Bigr\}$$
of $\prod_{i\in I}L^\infty(\mu_i)$.

(f) Let $(X,\Sigma,\mu)$ be any measure space, and $u\in L^1(\mu)$. Show that there is a $v\in L^\infty(\mu)$ such that $\|v\|_\infty\le 1$ and $\int u\times v=\|u\|_1$.

(g) Let $(X,\Sigma,\mu)$ be a semi-finite measure space and $v\in L^\infty(\mu)$. Show that
$$\|v\|_\infty=\sup\{\textstyle\int u\times v:u\in L^1,\ \|u\|_1\le 1\}=\sup\{\|u\times v\|_1:u\in L^1,\ \|u\|_1\le 1\}.$$

(h) Give an example of a probability space $(X,\Sigma,\mu)$ and a $v\in L^\infty(\mu)$ such that $\|u\times v\|_1<\|v\|_\infty$ whenever $u\in L^1(\mu)$ and $\|u\|_1\le 1$.

(i) Write out proofs of 243G adapted to the special cases (i) $\mu X=1$ (ii) $(X,\Sigma,\mu)$ is $\sigma$-finite.

(j) Let $(X,\Sigma,\mu)$ be any measure space. Show that $L^0(\mu)$ is Dedekind complete iff $L^\infty(\mu)$ is Dedekind complete.

(k) Let $(X,\Sigma,\mu)$ be a totally finite measure space and $\nu:\Sigma\to\mathbb{R}$ a functional. Show that the following are equiveridical: (i) there is a continuous linear functional $h:L^1(\mu)\to\mathbb{R}$ such that $h((\chi E)^\bullet)=\nu E$ for every $E\in\Sigma$ (ii) $\nu$ is additive and there is an $M\ge 0$ such that $|\nu E|\le M\mu E$ for every $E\in\Sigma$.

> (l) Let $X$ be any set, and let $\mu$ be counting measure on $X$. In this case it is customary to write $\ell^\infty(X)$ for $\mathcal{L}^\infty(\mu)$, and to identify it with $L^\infty(\mu)$. Write out statements and proofs of the results of this chapter adapted to this special case (if you like, with $X=\mathbb{N}$). In particular, write out a direct proof that $(\ell^1)^*$ can be identified with $\ell^\infty$. What happens when $X$ has just two members? or three?

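(m) Show that if $(X,\Sigma,\mu)$ is any measure space and $u\in L^\infty_{\mathbb{C}}(\mu)$, then
$$\|u\|_\infty=\sup\{\|\operatorname{Re}(\zeta u)\|_\infty:\zeta\in\mathbb{C},\ |\zeta|=1\}.$$

(n) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\phi:X\to Y$ an inverse-measure-preserving function. Show that $g\phi\in\mathcal{L}^\infty(\mu)$ for every $g\in\mathcal{L}^\infty(\nu)$, and that the map $g\mapsto g\phi$ induces a linear operator $T:L^\infty(\nu)\to L^\infty(\mu)$ defined by setting $T(g^\bullet)=(g\phi)^\bullet$ for every $g\in\mathcal{L}^\infty(\nu)$. (Compare 241Xh.) Show that $\|Tv\|_\infty=\|v\|_\infty$ for every $v\in L^\infty(\nu)$.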

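(o) On $C=C([0,1])$, the space of continuous real-valued functions on the unit interval $[0,1]$, say
$$f\le g\text{ iff }f(x)\le g(x)\text{ for every }x\in[0,1],$$
$$\|f\|_\infty=\sup_{x\in[0,1]}|f(x)|.$$
Show that $C$ is a Banach lattice, and that moreover
$$\|f\vee g\|_\infty=\max(\|f\|_\infty,\|g\|_\infty)\text{ whenever }f,\ g\ge 0,$$
$$\|f\times g\|_\infty\le\|f\|_\infty\|g\|_\infty\text{ for all }f,\ g\in C,$$
$$\|f\|_\infty=\min\{\gamma:|f|\le\gamma\mathbf{1}\}\text{ for every }f\in C,$$
where $\mathbf{1}$ denotes the constant function with value 1.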

243Y Further exercises (a) Let $(X,\Sigma,\mu)$ be a measure space, and $Y$ a subset of $X$; write $\mu_Y$ for the subspace measure on $Y$. Show that the canonical map from $L^0(\mu)$ onto $L^0(\mu_Y)$ (241Ye) induces a canonical map from $L^\infty(\mu)$ onto $L^\infty(\mu_Y)$, which is norm-preserving iff it is injective.

243 Notes and comments I mention the formula
$$\|u\vee v\|_\infty=\max(\|u\|_\infty,\|v\|_\infty)\text{ for }u,\ v\ge 0$$
(243Df) because while it does not characterize $L^\infty$ spaces among Banach lattices (see 243Xo), it is in a sense dual to the characteristic property
$$\|u+v\|_1=\|u\|_1+\|v\|_1\text{ for }u,\ v\ge 0$$
of the norm of $L^1$. (I will return to this in Chapter 35 in the next volume.)
The particular set $\mathcal{L}^\infty$ I have chosen (243A) is somewhat arbitrary. The space $L^\infty$ can very well be described entirely as a subspace of $L^0$, without going back to functions at all; see 243Bc, 243Dc. Just as with $\mathcal{L}^0$ and $\mathcal{L}^1$, there are occasions when it would be simpler to work with the linear space of essentially bounded measurable functions from $X$ to $\mathbb{R}$; and we now have a third obvious candidate, the linear space $\mathcal{L}^\infty_{\text{strict}}$ of measurable functions from $X$ to $\mathbb{R}$ which are literally, rather than essentially, bounded, which is itself a Banach lattice (243Xb).
I suppose the most important theorem of this section is 243G, identifying $L^\infty$ with $(L^1)^*$. This identification is the chief reason for setting 'localizable' measure spaces apart. The proof of 243Gb is long because it depends on two separate ideas. The Radon-Nikodým theorem deals, in effect, with the totally finite case, and then in parts (b-v) and (b-vi) of the proof localizability is used to link the partial solutions $g_F$ together. Exercise 243Xi is supposed to help you to distinguish the two operations. The map $T':L^1\to(L^\infty)^*$ (243Fc) is also very interesting in its way, but I shall leave it for Chapter 36.
243G gives another way of looking at conditional expectation operators. If $(X,\Sigma,\mu)$ is a probability space and $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$, of course both $\mu$ and $\mu\upharpoonright\mathrm{T}$ are localizable, so $L^\infty(\mu)$ can be identified with $(L^1(\mu))^*$ and $L^\infty(\mu\upharpoonright\mathrm{T})$ can be identified with $(L^1(\mu\upharpoonright\mathrm{T}))^*$. Now we have the canonical embedding $S:L^1(\mu\upharpoonright\mathrm{T})\to L^1(\mu)$ (242Jb) which is a norm-preserving linear operator, so gives rise to an adjoint operator $S':L^1(\mu)^*\to L^1(\mu\upharpoonright\mathrm{T})^*$ defined by the formula
$$(S'h)(v)=h(Sv)\text{ for all }v\in L^1(\mu\upharpoonright\mathrm{T}),\ h\in L^1(\mu)^*.$$
Writing $T_\mu:L^\infty(\mu)\to L^1(\mu)^*$ and $T_{\mu\upharpoonright\mathrm{T}}:L^\infty(\mu\upharpoonright\mathrm{T})\to L^1(\mu\upharpoonright\mathrm{T})^*$ for the canonical maps, we get a map $Q=T_{\mu\upharpoonright\mathrm{T}}^{-1}S'T_\mu:L^\infty(\mu)\to L^\infty(\mu\upharpoonright\mathrm{T})$, defined by saying that
$$\int Qu\times v=(T_{\mu\upharpoonright\mathrm{T}}Qu)(v)=(S'T_\mu u)(v)=(T_\mu u)(Sv)=\int Sv\times u=\int u\times v$$
whenever $v\in L^1(\mu\upharpoonright\mathrm{T})$ and $u\in L^\infty(\mu)$. But this agrees with the formula of 242L: we have
$$\int Qu\times v=\int u\times v=\int P(u\times v)=\int Pu\times v.$$
Because $v$ is arbitrary, we must have $Qu=Pu$ for every $u\in L^\infty(\mu)$. Thus a conditional expectation operator is, in a sense, the adjoint of the appropriate embedding operator.
The discussion in the last paragraph applies, of course, only to the restriction $P\upharpoonright L^\infty(\mu)$ of the conditional expectation operator to the $L^\infty$ space. Because $\mu$ is totally finite, $L^\infty(\mu)$ is a subspace of $L^1(\mu)$, and the real qualities of the operator $P$ are related to its behaviour on the whole space $L^1$. $P:L^1(\mu)\to L^1(\mu\upharpoonright\mathrm{T})$ can also be expressed as an adjoint operator, but the expression needs more of the theory of Riesz spaces than I have space for here. I will return to this topic in Chapter 36.

244 $L^p$
Continuing with our tour of the classical Banach spaces, we come to the $L^p$ spaces for $1<p<\infty$. The case $p=2$ is more important than all the others put together, and it would be reasonable, perhaps even advisable, to read this section first with this case alone in mind. But the other spaces provide instructive examples and remain a basic part of the education of any functional analyst.

244A Definitions Let $(X,\Sigma,\mu)$ be any measure space, and $p\in{]1,\infty[}$. Write $\mathcal{L}^p=\mathcal{L}^p(\mu)$ for the set of functions $f\in\mathcal{L}^0=\mathcal{L}^0(\mu)$ such that $|f|^p$ is integrable, and $L^p(\mu)$ for $\{f^\bullet:f\in\mathcal{L}^p(\mu)\}\subseteq L^0=L^0(\mu)$. Note that if $f\in\mathcal{L}^p$, $g\in\mathcal{L}^0$ and $f=_{\text{a.e.}}g$, then $|f|^p=_{\text{a.e.}}|g|^p$ so $|g|^p$ is integrable and $g\in\mathcal{L}^p$; thus $\mathcal{L}^p=\{f:f\in\mathcal{L}^0,\ f^\bullet\in L^p\}$.
Alternatively, we can define $u^p$ whenever $u\in L^0$, $u\ge 0$ by writing $(f^\bullet)^p=(f^p)^\bullet$ for every $f\in\mathcal{L}^0$ such that $f(x)\ge 0$ for every $x\in\operatorname{dom}f$ (compare 241I), and say that $L^p=\{u:u\in L^0,\ |u|^p\in L^1(\mu)\}$.

244B Theorem Let $(X,\Sigma,\mu)$ be any measure space, and $p\in[1,\infty]$.
(a) $L^p=L^p(\mu)$ is a linear subspace of $L^0=L^0(\mu)$.
(b) If $u\in L^p$, $v\in L^0$ and $|v|\le|u|$, then $v\in L^p$. Consequently $|u|$, $u\vee v$ and $u\wedge v$ belong to $L^p$ for all $u$, $v\in L^p$.
proof The cases $p=1$, $p=\infty$ are covered by 242B, 242C and 243B; so I suppose that $1<p<\infty$.
(a)(i) Suppose that $f$, $g\in\mathcal{L}^p=\mathcal{L}^p(\mu)$. If $a$, $b\in\mathbb{R}$ then $|a+b|^p\le 2^p\max(|a|^p,|b|^p)$, so $|f+g|^p\le_{\text{a.e.}}2^p(|f|^p\vee|g|^p)$; now $|f+g|^p\in\mathcal{L}^0$ and $2^p(|f|^p\vee|g|^p)\in\mathcal{L}^1$ so $|f+g|^p\in\mathcal{L}^1$. Thus $f+g\in\mathcal{L}^p$ for all $f$, $g\in\mathcal{L}^p$; it follows at once that $u+v\in L^p$ for all $u$, $v\in L^p$.
(ii) If $f\in\mathcal{L}^p$ and $c\in\mathbb{R}$ then $|cf|^p=|c|^p|f|^p\in\mathcal{L}^1$, so $cf\in\mathcal{L}^p$. Accordingly $cu\in L^p$ whenever $u\in L^p$ and $c\in\mathbb{R}$.
(b)(i) Express $u$ as $f^\bullet$ and $v$ as $g^\bullet$, where $f\in\mathcal{L}^p$ and $g\in\mathcal{L}^0$. Then $|g|\le_{\text{a.e.}}|f|$, so $|g|^p\le_{\text{a.e.}}|f|^p$ and $|g|^p$ is integrable; accordingly $g\in\mathcal{L}^p$ and $v\in L^p$.
(ii) Now $|\,|u|\,|=|u|$ so $|u|\in L^p$ whenever $u\in L^p$. Finally $u\vee v=\frac12(u+v+|u-v|)$, $u\wedge v=\frac12(u+v-|u-v|)$ belong to $L^p$ for all $u$, $v\in L^p$.
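The elementary inequality $|a+b|^p\le 2^p\max(|a|^p,|b|^p)$ used in part (a)(i) of the proof can be spot-checked numerically. What follows is a minimal Python sketch and not part of the original text; the function name `gap`, the sample points and the rounding tolerance are illustrative assumptions.

```python
def gap(a: float, b: float, p: float) -> float:
    """Return 2^p * max(|a|^p, |b|^p) - |a + b|^p, which should be non-negative."""
    return 2.0 ** p * max(abs(a) ** p, abs(b) ** p) - abs(a + b) ** p

# Illustrative sample points and exponents, chosen arbitrarily.
points = [-3.0, -1.0, -0.5, 0.0, 0.25, 1.0, 2.0]
exponents = [1.0, 1.5, 2.0, 4.0]

smallest_gap = min(gap(a, b, p) for a in points for b in points for p in exponents)

# A small negative tolerance allows for floating-point rounding when a == b,
# where the inequality holds with equality.
print(smallest_gap >= -1e-9)
```

Equality occurs exactly when $a=b$, which is why the constant $2^p$ cannot be improved in this form of the estimate.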

244C The order structure of $L^p$ Let $(X,\Sigma,\mu)$ be any measure space, and $p\in[1,\infty]$. Then 244B is enough to ensure that the partial ordering inherited from $L^0(\mu)$ makes $L^p(\mu)$ a Riesz space (compare 242C, 243C).

244D The norm of $L^p$ Let $(X,\Sigma,\mu)$ be a measure space, $p\in{]1,\infty[}$.

(a) For $f\in\mathcal{L}^p=\mathcal{L}^p(\mu)$, set $\|f\|_p=(\int|f|^p)^{1/p}$. If $f$, $g\in\mathcal{L}^p$ and $f=_{\text{a.e.}}g$ then $|f|^p=_{\text{a.e.}}|g|^p$ so $\|f\|_p=\|g\|_p$. Accordingly we may define $\|\ \|_p:L^p(\mu)\to[0,\infty[$ by writing $\|f^\bullet\|_p=\|f\|_p$ for every $f\in\mathcal{L}^p$.
Alternatively, we can say just that $\|u\|_p=(\int|u|^p)^{1/p}$ for every $u\in L^p=L^p(\mu)$.

(b) The notation $\|\ \|_p$ carries a promise that it is a norm on $L^p$; this is indeed so, but I hold the proof over to 244F below. For the moment, however, let us note just that $\|cu\|_p=|c|\|u\|_p$ for all $u\in L^p$, $c\in\mathbb{R}$, and that if $\|u\|_p=0$ then $\int|u|^p=0$ so $|u|^p=0$ and $u=0$.

(c) If $|u|\le|v|$ in $L^p$ then $\|u\|_p\le\|v\|_p$; this is because $|u|^p\le|v|^p$.

244E I now work through the lemmas required to show that $\|\ \|_p$ is a norm on $L^p$ and, eventually, that the normed space dual of $L^p$ may be identified with a suitable $L^q$.
Lemma Suppose $(X,\Sigma,\mu)$ is a measure space, and that $p$, $q\in{]1,\infty[}$ are such that $\frac1p+\frac1q=1$. Then
(a) $ab\le\frac1p a^p+\frac1q b^q$ for all real $a$, $b\ge 0$.
(b)(i) $f\times g$ is integrable and
$$\Bigl|\int f\times g\Bigr|\le\int|f\times g|\le\|f\|_p\|g\|_q$$
for all $f\in\mathcal{L}^p=\mathcal{L}^p(\mu)$, $g\in\mathcal{L}^q=\mathcal{L}^q(\mu)$;
(ii) $u\times v\in L^1=L^1(\mu)$ and
$$\Bigl|\int u\times v\Bigr|\le\|u\times v\|_1\le\|u\|_p\|v\|_q$$
for all $u\in L^p=L^p(\mu)$, $v\in L^q=L^q(\mu)$.
proof (a) If either $a$ or $b$ is 0, this is trivial. If both are non-zero, we may argue as follows. The function $x\mapsto x^{1/p}:[0,\infty[\to\mathbb{R}$ is concave, with second derivative strictly less than 0, so lies entirely below any of its tangents; in particular, below its tangent at the point $(1,1)$, which has equation $y=1+\frac1p(x-1)$. Thus we have
$$x^{1/p}\le\frac1p x+1-\frac1p=\frac1p x+\frac1q$$
for every $x\in[0,\infty[$. So if $c$, $d>0$, then
$$\Bigl(\frac cd\Bigr)^{1/p}\le\frac1p\,\frac cd+\frac1q;$$
multiplying both sides by $d$,
$$c^{1/p}d^{1/q}\le\frac1p c+\frac1q d;$$
setting $c=a^p$, $d=b^q$, we get
$$ab\le\frac1p a^p+\frac1q b^q,$$
as claimed.
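The inequality of part (a), $ab\le\frac1p a^p+\frac1q b^q$, can be spot-checked numerically. The following is a minimal Python sketch, not part of the original text; the function name `young_gap`, the sample grid and the rounding tolerance are illustrative assumptions.

```python
def young_gap(a: float, b: float, p: float) -> float:
    """Return a^p/p + b^q/q - a*b, where q = p/(p-1) is the conjugate exponent."""
    q = p / (p - 1.0)
    return a ** p / p + b ** q / q - a * b

# Illustrative grid of non-negative a, b and a few exponents p > 1.
samples = [0.0, 0.1, 0.5, 1.0, 2.0, 7.5]
exponents = [1.5, 2.0, 3.0, 10.0]

worst = min(young_gap(a, b, p) for a in samples for b in samples for p in exponents)

# The gap vanishes when a^p == b^q, so allow a tiny rounding tolerance.
print(worst >= -1e-12)
print(young_gap(2.0, 3.0, 2.0))
```

For $p=q=2$ the gap is $\frac12(a-b)^2$, which makes the equality case $a=b$ visible directly.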
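(b)(i)($\alpha$) Suppose first that $\|f\|_p=\|g\|_q=1$. For every $x\in\operatorname{dom}f\cap\operatorname{dom}g$ we have
$$|f(x)g(x)|\le\frac1p|f(x)|^p+\frac1q|g(x)|^q$$
by (a). So
$$|f\times g|\le\frac1p|f|^p+\frac1q|g|^q\in\mathcal{L}^1(\mu)$$
and $f\times g$ is integrable; also
$$\int|f\times g|\le\frac1p\int|f|^p+\frac1q\int|g|^q=\frac1p\|f\|_p^p+\frac1q\|g\|_q^q=\frac1p+\frac1q=1.$$
($\beta$) If $\|f\|_p=0$, then $\int|f|^p=0$ so $|f|^p=_{\text{a.e.}}0$, $f=_{\text{a.e.}}0$, $f\times g=_{\text{a.e.}}0$ and
$$\int|f\times g|=0=\|f\|_p\|g\|_q.$$
Similarly, if $\|g\|_q=0$, then $g=_{\text{a.e.}}0$ and again
$$\int|f\times g|=0=\|f\|_p\|g\|_q.$$
($\gamma$) Finally, for general $f\in\mathcal{L}^p$, $g\in\mathcal{L}^q$ such that $c=\|f\|_p$ and $d=\|g\|_q$ are both non-zero, we have $\|\frac1c f\|_p=\|\frac1d g\|_q=1$ so
$$f\times g=cd\Bigl(\frac1c f\times\frac1d g\Bigr)$$
is integrable, and
$$\int|f\times g|=cd\int\Bigl|\frac1c f\times\frac1d g\Bigr|\le cd,$$
as required.
(ii) Now if $u\in L^p$, $v\in L^q$ take $f\in\mathcal{L}^p$, $g\in\mathcal{L}^q$ such that $u=f^\bullet$, $v=g^\bullet$; $f\times g$ is integrable, so $u\times v\in L^1$, and
$$\Bigl|\int u\times v\Bigr|\le\|u\times v\|_1=\int|f\times g|\le\|f\|_p\|g\|_q=\|u\|_p\|v\|_q.$$

Remark Part (b) is 'Hölder's inequality'. In the case $p=q=2$ it is 'Cauchy's inequality'.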
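Hölder's inequality can be illustrated on a measure space with finitely many points, where integrals are weighted sums. The following is a minimal Python sketch, not part of the original text; the names `lp_norm` and `integral`, the point masses `weights` and the sample functions are all illustrative assumptions.

```python
def lp_norm(f, weights, p):
    """(sum_i weights[i] * |f[i]|^p)^(1/p): the L^p norm for point masses `weights`."""
    return sum(w * abs(x) ** p for x, w in zip(f, weights)) ** (1.0 / p)

def integral(f, weights):
    """Integral of f with respect to the measure with point masses `weights`."""
    return sum(w * x for x, w in zip(f, weights))

# Hypothetical four-point measure space and two functions on it.
weights = [0.5, 1.0, 2.0, 0.25]
f = [1.0, -2.0, 0.5, 4.0]
g = [3.0, 1.0, -1.0, 0.5]

p = 3.0
q = p / (p - 1.0)  # conjugate exponent, so 1/p + 1/q == 1

holder_lhs = integral([abs(x * y) for x, y in zip(f, g)], weights)
holder_rhs = lp_norm(f, weights, p) * lp_norm(g, weights, q)
cauchy_rhs = lp_norm(f, weights, 2.0) * lp_norm(g, weights, 2.0)

print(holder_lhs <= holder_rhs)  # Hoelder's inequality with p = 3, q = 3/2
print(holder_lhs <= cauchy_rhs)  # Cauchy's inequality, the case p = q = 2
```

The same left-hand side is bounded by both products, since the inequality holds for every conjugate pair $p$, $q$.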

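244F Proposition Let $(X,\Sigma,\mu)$ be a measure space and $p\in{]1,\infty[}$. Set $q=p/(p-1)$, so that $\frac1p+\frac1q=1$.
(a) For every $u\in L^p=L^p(\mu)$, $\|u\|_p=\max\{\int u\times v:v\in L^q(\mu),\ \|v\|_q\le 1\}$.
(b) $\|\ \|_p$ is a norm on $L^p$.
proof (a) For $u\in L^p$, set
$$\tau(u)=\sup\{\textstyle\int u\times v:v\in L^q(\mu),\ \|v\|_q\le 1\}.$$
(i) If $u\in L^p$, then $\|u\|_p\ge\tau(u)$, by 244E. If $\|u\|_p=0$ then surely
$$0=\|u\|_p=\tau(u)=\max\{\textstyle\int u\times v:v\in L^q(\mu),\ \|v\|_q\le 1\}.$$
If $\|u\|_p=c>0$, consider
$$v=c^{-p/q}\operatorname{sgn}u\times|u|^{p/q},$$
where for $a\in\mathbb{R}$ I write $\operatorname{sgn}a=|a|/a$ if $a\ne 0$, $0$ if $a=0$, so that $\operatorname{sgn}:\mathbb{R}\to\mathbb{R}$ is a Borel measurable function; for $f\in\mathcal{L}^0$ I write $(\operatorname{sgn}f)(x)=\operatorname{sgn}(f(x))$ for $x\in\operatorname{dom}f$, so that $\operatorname{sgn}f\in\mathcal{L}^0$; and for $f\in\mathcal{L}^0$ I write $\operatorname{sgn}(f^\bullet)=(\operatorname{sgn}f)^\bullet$ to define a function $\operatorname{sgn}:L^0\to L^0$ (cf. 241I). Then $v\in L^q$ and
$$\|v\|_q=\Bigl(\int|v|^q\Bigr)^{1/q}=c^{-p/q}\Bigl(\int|u|^p\Bigr)^{1/q}=c^{-p/q}c^{p/q}=1.$$
So
$$\tau(u)\ge\int u\times v=c^{-p/q}\int\operatorname{sgn}u\times|u|\times\operatorname{sgn}u\times|u|^{p/q}=c^{-p/q}\int|u|^{1+\frac pq}=c^{-p/q}\int|u|^p=c^{p-\frac pq}=c,$$
recalling that $1+\frac pq=p$, $p-\frac pq=1$. Thus $\tau(u)\ge\|u\|_p$ and
$$\tau(u)=\|u\|_p=\int u\times v.$$
(b) In view of the remarks in 244Db, I have only to check that $\|u+v\|_p\le\|u\|_p+\|v\|_p$ for all $u$, $v\in L^p$. But given $u$ and $v$, let $w\in L^q$ be such that $\|w\|_q=1$ and $\int(u+v)\times w=\|u+v\|_p$. Then
$$\|u+v\|_p=\int(u+v)\times w=\int u\times w+\int v\times w\le\|u\|_p+\|v\|_p,$$
as required.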
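The norming element $v=c^{-p/q}\operatorname{sgn}u\times|u|^{p/q}$ from part (a), and the triangle inequality of part (b), can be checked on a finite measure space. The following is a minimal Python sketch, not part of the original text; the names `lp_norm`, `sgn`, `norming_element` and the sample data are illustrative assumptions.

```python
def lp_norm(f, weights, p):
    """L^p norm of f for the measure with point masses `weights`."""
    return sum(w * abs(x) ** p for x, w in zip(f, weights)) ** (1.0 / p)

def sgn(a):
    """Sign of a real number: 1, -1 or 0."""
    return (a > 0) - (a < 0)

def norming_element(u, weights, p):
    """v = c^(-p/q) * sgn(u) * |u|^(p/q), with c = ||u||_p and q = p/(p-1); needs c > 0."""
    q = p / (p - 1.0)
    c = lp_norm(u, weights, p)
    if c == 0.0:
        raise ValueError("u must have non-zero norm")
    return [c ** (-p / q) * sgn(x) * abs(x) ** (p / q) for x in u]

# Hypothetical three-point measure space.
weights = [0.5, 1.0, 2.0]
u = [2.0, -1.0, 0.5]
w2 = [1.0, 4.0, -2.0]
p = 3.0
q = p / (p - 1.0)

v = norming_element(u, weights, p)
pairing = sum(m * x * y for m, x, y in zip(weights, u, v))

print(abs(lp_norm(v, weights, q) - 1.0) < 1e-9)      # ||v||_q = 1
print(abs(pairing - lp_norm(u, weights, p)) < 1e-9)  # integral of u*v equals ||u||_p

# Triangle inequality (Minkowski) for the same norm.
total = [x + y for x, y in zip(u, w2)]
print(lp_norm(total, weights, p) <= lp_norm(u, weights, p) + lp_norm(w2, weights, p))
```

The first two checks show that the supremum defining $\tau(u)$ is attained, which is exactly what the proof of (b) uses.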

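244G Theorem Let $(X,\Sigma,\mu)$ be any measure space, and $p\in[1,\infty]$. Then $L^p=L^p(\mu)$ is a Banach lattice under its norm $\|\ \|_p$.
proof The cases $p=1$, $p=\infty$ are covered by 242F-242G and 243E, so let us suppose that $1<p<\infty$. We know already that $\|u\|_p\le\|v\|_p$ whenever $|u|\le|v|$, so that it remains only to show that $L^p$ is complete.
Let $\langle u_n\rangle_{n\in\mathbb{N}}$ be a sequence in $L^p$ such that $\|u_{n+1}-u_n\|_p\le 4^{-n}$ for every $n\in\mathbb{N}$. Note that
$$\|u_n\|_p\le\|u_0\|_p+\sum_{k=0}^{n-1}\|u_{k+1}-u_k\|_p\le\|u_0\|_p+\sum_{k=0}^\infty 4^{-k}\le\|u_0\|_p+2$$
for every $n$. For each $n\in\mathbb{N}$, choose $f_n\in\mathcal{L}^p$ such that $f_0^\bullet=u_0$, $f_n^\bullet=u_n-u_{n-1}$ for $n\ge 1$; do this in such a way that $\operatorname{dom}f_k=X$ and $f_k$ is $\Sigma$-measurable (241Bk). Then $\|f_n\|_p\le 4^{-n+1}$ for $n\ge 1$.
For $m$, $n\in\mathbb{N}$, set
$$E_{mn}=\{x:|f_m(x)|\ge 2^{-n}\}\in\Sigma.$$
Then $|f_m(x)|^p\ge 2^{-np}$ for $x\in E_{mn}$, so
$$2^{-np}\mu E_{mn}\le\int|f_m|^p<\infty$$
and $\mu E_{mn}<\infty$. So $\chi E_{mn}\in\mathcal{L}^q=\mathcal{L}^q(\mu)$ and
$$\int_{E_{mn}}|f_k|=\int|f_k|\times\chi E_{mn}\le\|f_k\|_p\|\chi E_{mn}\|_q$$
for each $k$, by 244E(b-i). Accordingly
$$\sum_{k=0}^\infty\int_{E_{mn}}|f_k|\le\|\chi E_{mn}\|_q\sum_{k=0}^\infty\|f_k\|_p<\infty,$$
and $\sum_{k=0}^\infty f_k(x)$ exists for almost every $x\in E_{mn}$, by 242E. This is true for all $m$, $n\in\mathbb{N}$. But if $x\in X\setminus\bigcup_{m,n\in\mathbb{N}}E_{mn}$, $f_n(x)=0$ for every $n$, so $\sum_{k=0}^\infty f_k(x)$ certainly exists. Thus $g(x)=\sum_{k=0}^\infty f_k(x)$ is defined in $\mathbb{R}$ for almost every $x\in X$.
Set $g_n=\sum_{k=0}^n f_k$; then $g_n^\bullet=u_n\in L^p$ for each $n$, and $g(x)=\lim_{n\to\infty}g_n(x)$ is defined a.e. in $X$. Now consider $|g|^p=_{\text{a.e.}}\lim_{n\to\infty}|g_n|^p$. We know that
$$\liminf_{n\to\infty}\int|g_n|^p=\liminf_{n\to\infty}\|u_n\|_p^p\le(2+\|u_0\|_p)^p<\infty,$$
so by Fatou's Lemma
$$\int|g|^p\le\liminf_{k\to\infty}\int|g_k|^p<\infty.$$
Thus $u=g^\bullet\in L^p$. Moreover, for any $m\in\mathbb{N}$,
$$\int|g-g_m|^p\le\liminf_{n\to\infty}\int|g_n-g_m|^p=\liminf_{n\to\infty}\|u_n-u_m\|_p^p\le\liminf_{n\to\infty}\Bigl(\sum_{k=m}^{n-1}4^{-k}\Bigr)^p=\Bigl(\sum_{k=m}^\infty 4^{-k}\Bigr)^p=\Bigl(\frac{4^{-m}}{1-4^{-1}}\Bigr)^p,$$
using Fatou's Lemma for the first inequality and the triangle inequality for $\|\ \|_p$ (244Fb) for the second. So
$$\|u-u_m\|_p=\Bigl(\int|g-g_m|^p\Bigr)^{1/p}\le\frac{4^{-m}}{1-4^{-1}}\to 0$$
as $m\to\infty$. Thus $u=\lim_{m\to\infty}u_m$ in $L^p$. As $\langle u_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $L^p$ is complete.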

244H Following 242M-242O, I note that $L^p$ behaves like $L^1$ in respect of certain dense subspaces.
Proposition (a) Let $(X,\Sigma,\mu)$ be any measure space, and $p\in[1,\infty[$. Then the space $S$ of equivalence classes of $\mu$-simple functions is a dense linear subspace of $L^p=L^p(\mu)$.
(b) Let $X$ be any subset of $\mathbb{R}^r$, where $r\ge 1$, and let $\mu$ be the subspace measure on $X$ induced by Lebesgue measure on $\mathbb{R}^r$. Write $C_k$ for the set of (bounded) continuous functions $g:\mathbb{R}^r\to\mathbb{R}$ such that $\{x:g(x)\ne 0\}$ is bounded, and $S_0$ for the space of linear combinations of functions of the form $\chi I$, where $I\subseteq\mathbb{R}^r$ is a bounded half-open interval. Then $\{(g\upharpoonright X)^\bullet:g\in C_k\}$ and $\{(h\upharpoonright X)^\bullet:h\in S_0\}$ are dense in $L^p(\mu)$.
proof (a) I repeat the argument of 242M with a tiny modification.
(i) Suppose that $u\in L^p(\mu)$, $u\ge 0$ and $\epsilon>0$. Express $u$ as $f^\bullet$ where $f:X\to[0,\infty[$ is a measurable function. Let $g:X\to\mathbb{R}$ be a simple function such that $0\le g\le f^p$ and $\int g\ge\int f^p-\epsilon^p$. Set $h=g^{1/p}$. Then $h$ is a simple function and $h\le f$. Because $p>1$, $(f-h)^p+h^p\le f^p$ and
$$\int(f-h)^p\le\int f^p-g\le\epsilon^p,$$
so
$$\|u-h^\bullet\|_p=\Bigl(\int|f-h|^p\Bigr)^{1/p}\le\epsilon,$$
while $h^\bullet\in S$.
(ii) For general $u\in L^p$, $\epsilon>0$, $u$ can be expressed as $u^+-u^-$ where $u^+=u\vee 0$, $u^-=(-u)\vee 0$ belong to $L^p$ and are non-negative. By (i), we can find $v_1$, $v_2\in S$ such that $\|u^+-v_1\|_p\le\frac12\epsilon$, $\|u^--v_2\|_p\le\frac12\epsilon$, so that $v=v_1-v_2\in S$ and $\|u-v\|_p\le\epsilon$. As $u$ and $\epsilon$ are arbitrary, $S$ is dense.
(b) Again, all the ideas are to be found in 242O; the changes needed are in the formulae, not in the method. They will go more easily if I note at the outset that whenever $f_1$, $f_2\in\mathcal{L}^p(\mu)$ and $\int|f_1|^p\le\epsilon^p$, $\int|f_2|^p\le\delta^p$ (where $\epsilon$, $\delta\ge 0$), then $\int|f_1+f_2|^p\le(\epsilon+\delta)^p$; this is just the triangle inequality for $\|\ \|_p$ (244Fb). Also I will regularly express the target relationships in the form '$\int_X|f-g|^p\le\epsilon^p$', '$\int_X|f-h|^p\le\epsilon^p$'. Now let me run through the argument of 242Oa, rather more briskly than before.
(i) Suppose first that $f=\chi I\upharpoonright X$ where $I\subseteq\mathbb{R}^r$ is a bounded half-open interval. As before, we can set $h=\chi I$ and get $\int_X|f-h|^p=0$. This time, use the same construction to find an interval $J$ and a function $g\in C_k$ such that $\chi I\le g\le\chi J$ and $\mu_r(J\setminus I)\le\epsilon^p$; this will ensure that $\int_X|f-g|^p\le\epsilon^p$.
(ii) Now suppose that $f=\chi(X\cap E)$ where $E\subseteq\mathbb{R}^r$ is a set of finite measure. Then, for the same reasons as before, there is a disjoint family $I_0,\ldots,I_n$ of half-open intervals such that $\mu_r(E\triangle\bigcup_{j\le n}I_j)\le(\frac12\epsilon)^p$. Accordingly $h=\sum_{j=0}^n\chi I_j\in S_0$ and $\int_X|f-h|^p\le(\frac12\epsilon)^p$. And (i) tells us that there is for each $j\le n$ a $g_j\in C_k$ such that $\int_X|g_j-\chi I_j|^p\le(\epsilon/2(n+1))^p$, so that $g=\sum_{j=0}^n g_j\in C_k$ and $\int_X|f-g|^p\le\epsilon^p$.
(iii) The move to simple functions, and thence to arbitrary members of $\mathcal{L}^p(\mu)$, is just as before, but using $\|f\|_p$ in place of $\int_X|f|$. Finally, the translation from $\mathcal{L}^p$ to $L^p$ is again direct, remembering, as before, to check that $g\upharpoonright X$, $h\upharpoonright X$ belong to $\mathcal{L}^p(\mu)$ for every $g\in C_k$, $h\in S_0$.

*244I Corollary In the context of 244Hb, $L^p(\mu)$ is separable.
proof Let $A$ be the set
$$\Bigl\{\Bigl(\sum_{j=0}^n q_j\chi([a_j,b_j[\,\cap X)\Bigr)^\bullet:n\in\mathbb{N},\ q_0,\ldots,q_n\in\mathbb{Q},\ a_0,\ldots,a_n,b_0,\ldots,b_n\in\mathbb{Q}^r\Bigr\}.$$
Then $A$ is a countable subset of $L^p(\mu)$, and its closure must contain $(\sum_{j=0}^n c_j\chi([a_j,b_j[\,\cap X))^\bullet$ whenever $c_0,\ldots,c_n\in\mathbb{R}$ and $a_0,\ldots,a_n,b_0,\ldots,b_n\in\mathbb{R}^r$; that is, $\overline A$ is a closed set including $\{(h\upharpoonright X)^\bullet:h\in S_0\}$, and is the whole of $L^p(\mu)$, by 244Hb.

244J Duality in $L^p$ spaces Let $(X,\Sigma,\mu)$ be any measure space, and $p\in{]1,\infty[}$. Set $q=p/(p-1)$; note that $\frac1p+\frac1q=1$ and that $p=q/(q-1)$; the relation between $p$ and $q$ is symmetric. Now $u\times v\in L^1(\mu)$ and $\|u\times v\|_1\le\|u\|_p\|v\|_q$ whenever $u\in L^p=L^p(\mu)$ and $v\in L^q=L^q(\mu)$ (244E). Consequently we have a bounded linear operator $T$ from $L^q$ to the normed space dual $(L^p)^*$ of $L^p$, given by writing
$$(Tv)(u)=\int u\times v$$
for all $u\in L^p$, $v\in L^q$, exactly as in 243F.

244K Theorem Let $(X,\Sigma,\mu)$ be a measure space, and $p\in{]1,\infty[}$; set $q=p/(p-1)$. Then the canonical map $T:L^q(\mu)\to L^p(\mu)^*$, described in 244J, is a normed space isomorphism.
Remark I should perhaps remind anyone who is reading this chapter to learn about $L^2$ that the general theory of Hilbert spaces yields this theorem in the case $p=q=2$ without any need for the more generally applicable argument given below (see 244N, 244Yj).
proof We know that $T$ is a bounded linear operator of norm at most 1; I need to show (i) that $T$ is actually an isometry (that is, that $\|Tv\|=\|v\|_q$ for every $v\in L^q$), which will show incidentally that $T$ is injective (ii) that $T$ is surjective, which is the really substantial part of the theorem.
(a) If $v\in L^q$, then by 244Fa (recalling that $p=q/(q-1)$) there is a $u\in L^p$ such that $\|u\|_p\le 1$ and $\int u\times v=\|v\|_q$; thus $\|Tv\|\ge(Tv)(u)=\|v\|_q$. As we know already that $\|Tv\|\le\|v\|_q$, we have $\|Tv\|=\|v\|_q$ for every $v$, and $T$ is an isometry.
(b) The rest of the proof, therefore, will be devoted to showing that $T:L^q\to(L^p)^*$ is surjective. Fix $h\in(L^p)^*$ with $\|h\|=1$.
I need to show that $h$ 'lives on' a countable union of sets of finite measure in $X$, in the following sense: there is a non-decreasing sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure such that $h(f^\bullet)=0$ whenever $f\in\mathcal{L}^p$ and $f(x)=0$ for $x\in\bigcup_{n\in\mathbb{N}}E_n$. P Choose a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^p$ such that $\|u_n\|_p\le 1$ for every $n$ and $\lim_{n\to\infty}h(u_n)=\|h\|=1$. For each $n$, express $u_n$ as $f_n^\bullet$, where $f_n:X\to\mathbb{R}$ is a measurable function. Set
$$E_n=\Bigl\{x:\sum_{k=0}^n|f_k(x)|^p\ge 2^{-n}\Bigr\}$$
for $n\in\mathbb{N}$; because $|f_k|^p$ is measurable and integrable and has domain $X$ for every $k$, $E_n\in\Sigma$ and $\mu E_n<\infty$ for each $n$.
Now suppose that $f\in\mathcal{L}^p(\mu)$ and that $f(x)=0$ for $x\in\bigcup_{n\in\mathbb{N}}E_n$; set $u=f^\bullet\in L^p$. ?? Suppose, if possible, that $h(u)\ne 0$, and consider $h(cu)$, where
$$\operatorname{sgn}c=\operatorname{sgn}h(u),\quad 0<|c|<\bigl(p\,|h(u)|\,\|u\|_p^{-p}\bigr)^{1/(p-1)}.$$
(Of course $\|u\|_p\ne 0$ if $h(u)\ne 0$.) For each $n$, we have
$$\{x:f_n(x)\ne 0\}\subseteq\bigcup_{m\in\mathbb{N}}E_m\subseteq\{x:f(x)=0\},$$
so $|f_n+cf|^p=|f_n|^p+|cf|^p$ and
$$h(u_n+cu)\le\|u_n+cu\|_p=(\|u_n\|_p^p+\|cu\|_p^p)^{1/p}\le(1+|c|^p\|u\|_p^p)^{1/p}.$$
Letting $n\to\infty$,
$$1+ch(u)\le(1+|c|^p\|u\|_p^p)^{1/p}.$$
Because $\operatorname{sgn}c=\operatorname{sgn}h(u)$, $ch(u)=|c||h(u)|$ and we have
$$1+p|c||h(u)|\le(1+ch(u))^p\le 1+|c|^p\|u\|_p^p,$$
so that
$$p|h(u)|\le|c|^{p-1}\|u\|_p^p<p|h(u)|$$
by the choice of $c$; which is impossible. X
This means that $h(f^\bullet)=0$ whenever $f:X\to\mathbb{R}$ is measurable, belongs to $\mathcal{L}^p$, and is zero on $\bigcup_{n\in\mathbb{N}}E_n$. Q
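(c) Set $H_n=E_n\setminus\bigcup_{k<n}E_k$ for each $n\in\mathbb{N}$; then $\langle H_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence of sets of finite measure. Now $h(u)=\sum_{n=0}^\infty h(u\times(\chi H_n)^\bullet)$ for every $u\in L^p$. P Express $u$ as $f^\bullet$, where $f:X\to\mathbb{R}$ is measurable. Set $f_n=f\times\chi H_n$ for each $n$, $g=f\times\chi(X\setminus\bigcup_{n\in\mathbb{N}}H_n)$; then $h(g^\bullet)=0$, by (b), because $\bigcup_{n\in\mathbb{N}}H_n=\bigcup_{n\in\mathbb{N}}E_n$. Consider
$$g_n=g+\sum_{k=0}^n f_k\in\mathcal{L}^p$$
for each $n$. Then $\lim_{n\to\infty}f-g_n=0$, and
$$|f-g_n|^p\le|f|^p\in\mathcal{L}^1$$
for every $n$, so by either Fatou's Lemma or Lebesgue's Dominated Convergence Theorem
$$\lim_{n\to\infty}\int|f-g_n|^p=0,$$
and
$$\lim_{n\to\infty}\Bigl\|u-g^\bullet-\sum_{k=0}^n u\times(\chi H_k)^\bullet\Bigr\|_p=\lim_{n\to\infty}\|u-g_n^\bullet\|_p=\lim_{n\to\infty}\Bigl(\int|f-g_n|^p\Bigr)^{1/p}=0,$$
that is,
$$u=g^\bullet+\sum_{k=0}^\infty u\times\chi H_k^\bullet$$
in $L^p$. Because $h:L^p\to\mathbb{R}$ is linear and continuous, it follows that
$$h(u)=h(g^\bullet)+\sum_{k=0}^\infty h(u\times\chi H_k^\bullet)=\sum_{k=0}^\infty h(u\times\chi H_k^\bullet),$$
as claimed. Q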
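(d) For each $n\in\mathbb{N}$, define $\nu_n:\Sigma\to\mathbb{R}$ by setting
$$\nu_n E=h(\chi(E\cap H_n)^\bullet)$$
for every $E\in\Sigma$. (Note that $\nu_n E$ is always defined because $\mu(E\cap H_n)<\infty$, so that
$$\|\chi(E\cap H_n)\|_p=\mu(E\cap H_n)^{1/p}<\infty.)$$
Then $\nu_n\emptyset=h(0)=0$, and if $\langle F_k\rangle_{k\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$,
$$\Bigl\|\chi\Bigl(\bigcup_{k\in\mathbb{N}}H_n\cap F_k\Bigr)^\bullet-\sum_{k=0}^m\chi(H_n\cap F_k)^\bullet\Bigr\|_p=\mu\Bigl(H_n\cap\bigcup_{k=m+1}^\infty F_k\Bigr)^{1/p}\to 0$$
as $m\to\infty$, so
$$\nu_n\Bigl(\bigcup_{k\in\mathbb{N}}F_k\Bigr)=\sum_{k=0}^\infty\nu_n F_k.$$
So $\nu_n$ is countably additive. Further, $|\nu_n F|\le\mu(H_n\cap F)^{1/p}$, so $\nu_n$ is truly continuous in the sense of 232Ab. There is therefore an integrable function $g_n$ such that $\nu_n F=\int_F g_n$ for every $F\in\Sigma$; let us suppose that $g_n$ is measurable and defined on the whole of $X$. Set $g(x)=g_n(x)$ whenever $n\in\mathbb{N}$ and $x\in H_n$, $g(x)=0$ for $x\in X\setminus\bigcup_{n\in\mathbb{N}}H_n$.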
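(e) $g=\sum_{n=0}^\infty g_n\times\chi H_n$ is measurable and has the property that $\int_F g=h(\chi F^\bullet)$ whenever $n\in\mathbb{N}$ and $F$ is a measurable subset of $H_n$; consequently $\int_F g=h(\chi F^\bullet)$ whenever $n\in\mathbb{N}$ and $F$ is a measurable subset of $E_n=\bigcup_{k\le n}H_k$. Set $G=\{x:g(x)>0\}\subseteq\bigcup_{n\in\mathbb{N}}E_n$. If $F\subseteq G$ and $\mu F<\infty$, then
$$\lim_{n\to\infty}\int g\times\chi(F\cap E_n)\le\sup_{n\in\mathbb{N}}h(\chi(F\cap E_n)^\bullet)\le\sup_{n\in\mathbb{N}}\|\chi(F\cap E_n)^\bullet\|_p=(\mu F)^{1/p},$$
so by B. Levi's theorem
$$\int_F g=\int g\times\chi F=\lim_{n\to\infty}\int g\times\chi(F\cap E_n)$$
exists. Similarly, $\int_F g$ exists if $F\subseteq\{x:g(x)<0\}$ has finite measure; while obviously $\int_F g$ exists if $F\subseteq\{x:g(x)=0\}$. Accordingly $\int_F g$ exists for every set $F$ of finite measure. Moreover, by Lebesgue's Dominated Convergence Theorem,
$$\int_F g=\lim_{n\to\infty}\int_{F\cap E_n}g=\lim_{n\to\infty}h(\chi(F\cap E_n)^\bullet)=\sum_{n=0}^\infty h(\chi(F\cap H_n)^\bullet)=h(\chi F^\bullet)$$
for such $F$, by (c) above. It follows at once that
$$\int g\times f=h(f^\bullet)$$
for every simple function $f:X\to\mathbb{R}$.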
(f) Now $g\in\mathcal{L}^q$. P (i) We already know that $|g|^q:X\to\mathbb{R}$ is measurable, because $g$ is measurable and $a\mapsto|a|^q$ is continuous. (ii) Suppose that $f$ is a non-negative simple function and $f\le_{\text{a.e.}}|g|^q$. Then $f^{1/p}$ is a simple function, and $\operatorname{sgn}g$ is measurable and takes only the values 0, 1 and $-1$, so $f_1=f^{1/p}\times\operatorname{sgn}g$ is simple. We see that $\int|f_1|^p=\int f$, so $\|f_1\|_p=(\int f)^{1/p}$. Accordingly
$$\Bigl(\int f\Bigr)^{1/p}\ge h(f_1^\bullet)=\int g\times f_1=\int|g\times f^{1/p}|\ge\int f^{1/q}\times f^{1/p}$$
(because $0\le f^{1/q}\le_{\text{a.e.}}|g|$)
$$=\int f,$$
and we must have $\int f\le 1$. (iii) Thus
$$\sup\{\textstyle\int f:f\text{ is a non-negative simple function},\ f\le_{\text{a.e.}}|g|^q\}\le 1<\infty.$$
But now observe that if $\epsilon>0$ then
$$\{x:|g(x)|^q\ge\epsilon\}=\bigcup_{n\in\mathbb{N}}\{x:x\in E_n,\ |g(x)|^q\ge\epsilon\},$$
and for each $n\in\mathbb{N}$
$$\mu\{x:x\in E_n,\ |g(x)|^q\ge\epsilon\}\le\frac1\epsilon,$$
because $f=\epsilon\chi\{x:x\in E_n,\ |g(x)|^q\ge\epsilon\}$ is a simple function less than or equal to $|g|^q$, so has integral at most 1. Accordingly
$$\mu\{x:|g(x)|^q\ge\epsilon\}=\sup_{n\in\mathbb{N}}\mu\{x:x\in E_n,\ |g(x)|^q\ge\epsilon\}\le\frac1\epsilon<\infty.$$
Thus $|g|^q$ is integrable, by the criterion in 122Ja. Q
(g) We may therefore speak of $h_1=T(g^\bullet)\in(L^p)^*$, and we know that it agrees with $h$ on members of $L^p$ of the form $f^\bullet$ where $f$ is a simple function. But these form a dense subset of $L^p$, by 244Ha, and both $h$ and $h_1$ are continuous, so $h=h_1$ is a value of $T$, by 2A3Uc. The argument as written so far has assumed that $\|h\|=1$. But of course any non-zero member of $(L^p)^*$ is a scalar multiple of an element of norm 1, so is a value of $T$. So $T:L^q\to(L^p)^*$ is indeed surjective, and is therefore an isometric isomorphism, as claimed.

244L Continuing with the same topics as in §§242 and 243, I turn to the order-completeness of $L^p$.
Theorem Let $(X,\Sigma,\mu)$ be any measure space, and $p\in[1,\infty[$. Then $L^p=L^p(\mu)$ is Dedekind complete.
proof I use 242H. Let $A\subseteq L^p$ be any non-empty set which is bounded above in $L^p$. Fix $u_0\in A$ and set
$$A'=\{u_0\vee u:u\in A\},$$
so that $A'$ has the same upper bounds as $A$ and is bounded below by $u_0$. Fixing an upper bound $w_0$ of $A$ in $L^p$, then $u_0\le u\le w_0$ for every $u\in A'$. Set
$$B=\{(u-u_0)^p:u\in A'\}.$$
Then
$$0\le v\le(w_0-u_0)^p\in L^1=L^1(\mu)$$
for every $v\in B$, so $B$ is a non-empty subset of $L^1$ which is bounded above in $L^1$, and therefore has a least upper bound $v_1$ in $L^1$. Now $v_1^{1/p}\in L^p$; consider $w_1=u_0+v_1^{1/p}$. If $u\in A'$ then $(u-u_0)^p\le v_1$ so $u-u_0\le v_1^{1/p}$ and $u\le w_1$; thus $w_1$ is an upper bound for $A'$. If $w\in L^p$ is an upper bound for $A'$, then $u-u_0\le w-u_0$ and $(u-u_0)^p\le(w-u_0)^p$ for every $u\in A'$, so $(w-u_0)^p$ is an upper bound for $B$ and $v_1\le(w-u_0)^p$, $v_1^{1/p}\le w-u_0$ and $w_1\le w$. Thus $w_1=\sup A'=\sup A$ in $L^p$. As $A$ is arbitrary, $L^p$ is Dedekind complete.

244M As in the last two sections, the theory of conditional expectations is worth revisiting.
Theorem Let $(X,\Sigma,\mu)$ be a probability space, and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Take $p\in[1,\infty]$. Regard $L^0(\mu\upharpoonright\mathrm{T})$ as a subspace of $L^0(\mu)$, so that $L^p(\mu\upharpoonright\mathrm{T})$ becomes a subspace of $L^p(\mu)$ (cf. 242Jb). Let $P:L^1(\mu)\to L^1(\mu\upharpoonright\mathrm{T})$ be the conditional expectation operator, as described in 242Jd. Then whenever $u\in L^p(\mu)$, $|Pu|^p\le P(|u|^p)$, so $Pu\in L^p(\mu\upharpoonright\mathrm{T})$ and $\|Pu\|_p\le\|u\|_p$.
proof For $p=\infty$, this is 243Jb, so I assume henceforth that $p<\infty$. Set $\phi(t)=|t|^p$ for $t\in\mathbb{R}$; then $\phi$ is a convex function (because it is absolutely continuous on any bounded interval, and its derivative is non-decreasing), and $|u|^p=\bar\phi(u)$ for every $u\in L^0=L^0(\mu)$, where $\bar\phi$ is defined as in 241I. Now if $u\in L^p=L^p(\mu)$, we surely have $u\in L^1$ (because $|u|\le|u|^p\vee(\chi X)^\bullet$, or otherwise); so 242K tells us that $|Pu|^p\le P|u|^p$. But this means that $Pu\in L^p\cap L^1(\mu\upharpoonright\mathrm{T})=L^p(\mu\upharpoonright\mathrm{T})$, and
$$\|Pu\|_p=\Bigl(\int|Pu|^p\Bigr)^{1/p}\le\Bigl(\int P|u|^p\Bigr)^{1/p}=\Bigl(\int|u|^p\Bigr)^{1/p}=\|u\|_p,$$
as claimed.
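On a finite probability space with a sub-σ-algebra generated by a partition, the conditional expectation is just a block-wise weighted average, and the inequalities of the theorem can be checked directly. The following is a minimal Python sketch, not part of the original text; the names `cond_exp` and `lp_norm`, the partition `blocks` and the sample data are illustrative assumptions.

```python
def cond_exp(u, prob, blocks):
    """Conditional expectation of u given the partition `blocks`: block-wise weighted averages."""
    out = [0.0] * len(u)
    for block in blocks:
        mass = sum(prob[i] for i in block)
        average = sum(prob[i] * u[i] for i in block) / mass
        for i in block:
            out[i] = average
    return out

def lp_norm(u, prob, p):
    """L^p norm of u for the probability measure with point masses `prob`."""
    return sum(w * abs(x) ** p for x, w in zip(u, prob)) ** (1.0 / p)

# Hypothetical four-point probability space; the sub-sigma-algebra is generated by two blocks.
prob = [0.25, 0.25, 0.25, 0.25]
blocks = [[0, 1], [2, 3]]
u = [4.0, -2.0, 1.0, 3.0]
p = 3.0

Pu = cond_exp(u, prob, blocks)
P_abs_p = cond_exp([abs(x) ** p for x in u], prob, blocks)

jensen_ok = all(abs(a) ** p <= b for a, b in zip(Pu, P_abs_p))  # |Pu|^p <= P(|u|^p)
norm_ok = lp_norm(Pu, prob, p) <= lp_norm(u, prob, p)           # ||Pu||_p <= ||u||_p
sup_ok = max(abs(x) for x in Pu) <= max(abs(x) for x in u)      # the case p = infinity

print(Pu)
print(jensen_ok, norm_ok, sup_ok)
```

Averaging over blocks can only reduce the spread of the values, which is the intuition behind the contraction property for every $p$.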

244N The space $L^2$ (a) As I have already remarked, the really important function spaces are $L^0$, $L^1$, $L^2$ and $L^\infty$. $L^2$ has the special property of being an inner product space; if $(X,\Sigma,\mu)$ is any measure space and $u$, $v\in L^2(\mu)$ then $u\times v\in L^1$, by 244Eb, and we may write $(u|v)=\int u\times v$. This makes $L^2$ a real inner product space (because
$$(u_1+u_2|v)=(u_1|v)+(u_2|v),\quad(cu|v)=c(u|v),\quad(u|v)=(v|u),$$
$$(u|u)\ge 0,\quad u=0\text{ whenever }(u|u)=0$$
for all $u$, $u_1$, $u_2$, $v\in L^2$ and $c\in\mathbb{R}$) and its norm $\|\ \|_2$ is the associated norm (because $\|u\|_2=\sqrt{(u|u)}$ whenever $u\in L^2$). Because $L^2$ is complete (244G), it is a real Hilbert space. The fact that it may be identified with its own dual (244K) can of course be deduced from this.
I will use the phrase 'square-integrable' to describe functions in $\mathcal{L}^2$.

(b) Conditional expectations take a special form in the case of $L^2$. Let $(X,\Sigma,\mu)$ be a probability space, $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$, and $P:L^1=L^1(\mu)\to L^1(\mu\restriction\mathrm{T})\subseteq L^1$ the corresponding conditional expectation operator. Then $P[L^2]\subseteq L^2$, where $L^2=L^2(\mu)$ (244M), so we have an operator $P_2=P\restriction L^2$ from $L^2$ to itself. Now $P_2$ is an orthogonal projection and its kernel is $\{u:u\in L^2,\ \int_F u=0\text{ for every }F\in\mathrm{T}\}$. $\mathbf{P}$ (i) If $u\in L^2$ then $Pu=0$ iff $\int_F u=0$ for every $F\in\mathrm{T}$ (cf. 242Je); so surely the kernel of $P_2$ is the set described. (ii) Since $P^2=P$, $P_2$ is also a projection; because $P_2$ has norm at most 1 (244M), and is therefore continuous,
$$U=P_2[L^2]=L^2(\mu\restriction\mathrm{T})=\{u:u\in L^2,\ P_2u=u\},\quad V=\{u:P_2u=0\}$$
are closed linear subspaces of $L^2$ such that $U\oplus V=L^2$. (iii) Now suppose that $u\in U$, $v\in V$. Then $P|v|\in L^2$, so $u\times P|v|\in L^1$ and $P(u\times v)=u\times Pv$, by 242L. Accordingly
$$(u|v)=\int u\times v=\int P(u\times v)=\int u\times Pv=0.$$
Thus $U$ and $V$ are orthogonal subspaces of $L^2$, which is what we mean by saying that $P_2$ is an orthogonal projection. (Some readers will know that every projection of norm at most 1 on an inner product space is orthogonal.) $\mathbf{Q}$
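
As a hedged illustration of the orthogonality argument in (b), here is a small numerical sketch. It assumes a finite probability space with point masses and a sub-algebra generated by a partition (neither assumption is in the text), and the helper names are hypothetical. It checks that $P$ is idempotent and that its range is orthogonal to its kernel.

```python
def conditional_expectation(u, weights, partition):
    """Weighted average of u over each block of the partition."""
    result = [0.0] * len(u)
    for block in partition:
        mass = sum(weights[i] for i in block)
        mean = sum(weights[i] * u[i] for i in block) / mass
        for i in block:
            result[i] = mean
    return result


def inner(u, v, weights):
    """(u|v) = integral of u*v for the measure with the given point masses."""
    return sum(w * a * b for a, b, w in zip(u, v, weights))


weights = [0.1, 0.2, 0.3, 0.4]
partition = [[0, 1], [2, 3]]
u = [3.0, -1.0, 2.0, 0.5]
w = [1.0, 4.0, -2.0, 0.0]

Pu = conditional_expectation(u, weights, partition)   # a member of the range U
Pw = conditional_expectation(w, weights, partition)
v = [a - b for a, b in zip(w, Pw)]                     # w - Pw lies in the kernel V

PPu = conditional_expectation(Pu, weights, partition)
Pv = conditional_expectation(v, weights, partition)
orthogonality_defect = abs(inner(Pu, v, weights))
```

Up to rounding error, `PPu` agrees with `Pu`, `Pv` vanishes, and `orthogonality_defect` is zero, as the argument in (b) predicts.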

244O Complex Lp Let (X, , ) be any measure space.

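(a) For any $p\in{]1,\infty[}$, set
$$\mathcal{L}^p_{\mathbb{C}}=\mathcal{L}^p_{\mathbb{C}}(\mu)=\{f:f\in\mathcal{L}^0_{\mathbb{C}}(\mu),\ |f|^p\text{ is integrable}\},$$
$$L^p_{\mathbb{C}}(\mu)=\{f^\bullet:f\in\mathcal{L}^p_{\mathbb{C}}\}=\{u:u\in L^0_{\mathbb{C}}(\mu),\ \mathrm{Re}(u)\in L^p(\mu)\text{ and }\mathrm{Im}(u)\in L^p(\mu)\}=\{u:u\in L^0_{\mathbb{C}},\ |u|\in L^p\}.$$
Then $L^p_{\mathbb{C}}$ is a linear subspace of $L^0_{\mathbb{C}}$. Set $\|u\|_p=(\int|u|^p)^{1/p}$ for $u\in L^p_{\mathbb{C}}$.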

(b) The proof of 244E(b-i) applies unchanged to complex-valued functions, so taking $q=p/(p-1)$ we get
$$\|u\times v\|_1\le\|u\|_p\|v\|_q$$
for all $u\in L^p_{\mathbb{C}}$, $v\in L^q_{\mathbb{C}}$. 244Fa becomes
for every $u\in L^p_{\mathbb{C}}$ there is a $v\in L^q_{\mathbb{C}}$ such that $\|v\|_q\le 1$ and
$$\int u\times v=\Big|\int u\times v\Big|=\|u\|_p;$$
the same proof works, if you allow me to write $\mathrm{sgn}\,a=|a|/a$ for all non-zero complex numbers (it would perhaps be more natural to write $\overline{\mathrm{sgn}}(a)$ in place of $\mathrm{sgn}\,a$). So, just as before, we find that $\|\ \|_p$ is a norm. We can use the argument of 244G to show that $L^p_{\mathbb{C}}$ is complete. The space $S_{\mathbb{C}}$ of equivalence classes of 'complex-valued simple functions' is dense in $L^p_{\mathbb{C}}$. If $X$ is a subset of $\mathbb{R}^r$ and $\mu$ is Lebesgue measure on $X$, then the space of equivalence classes of continuous complex-valued functions on $X$ with bounded support is dense in $L^p_{\mathbb{C}}$.
R
(c) The canonical map $T:L^q_{\mathbb{C}}\to(L^p_{\mathbb{C}})^*$, defined by writing $(Tv)(u)=\int u\times v$, is surjective because $T\restriction L^q:L^q\to(L^p)^*$ is surjective; and it is an isometry by the remarks in (b) just above. Thus we can still identify $L^q_{\mathbb{C}}$ with $(L^p_{\mathbb{C}})^*$.

(d) When we come to the complex form of Jensen's inequality, it seems that a new idea is needed. I have relegated this to 242Yj-242Yk. But for the complex form of 244M a simpler argument will suffice. If we have a probability space $(X,\Sigma,\mu)$, a $\sigma$-subalgebra $\mathrm{T}$ of $\Sigma$, and the corresponding conditional expectation operator $P:L^1_{\mathbb{C}}(\mu)\to L^1_{\mathbb{C}}(\mu\restriction\mathrm{T})$, then for any $u\in L^p_{\mathbb{C}}(\mu)$ we shall have
$$|Pu|^p\le(P|u|)^p\le P(|u|^p),$$
applying 242Pc and 244M. So $\|Pu\|_p\le\|u\|_p$, as before.

(e) There is a special point arising with $L^2_{\mathbb{C}}$. We now have to define
$$(u|v)=\int u\times\bar v$$
for $u$, $v\in L^2_{\mathbb{C}}$, so that $(u|u)=\int|u|^2=\|u\|_2^2$ for every $u$; this means that $(v|u)$ is the complex conjugate of $(u|v)$.

244X Basic exercises > (a) Let $(X,\Sigma,\mu)$ be a measure space, and $(X,\hat\Sigma,\hat\mu)$ its completion. Show that $\mathcal{L}^p(\hat\mu)=\mathcal{L}^p(\mu)$ and $L^p(\hat\mu)=L^p(\mu)$ for every $p\in[1,\infty]$.

(b) Let (X, , ) be a measure space, and 1 p r . Set e = 1 in L0 (). (i) Show that if u Lp ()
and |u| e then u Lr () and kukr kukp . (Hint: look first at the case kukp = 1.) (ii) Show that if
v Lr () then (v e)+ Lp () and k(v e)+ kp kvkr .

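(c) Let $(X,\Sigma,\mu)$ be a measure space, and $1\le p\le q\le r\le\infty$. Show that $L^p(\mu)\cap L^r(\mu)\subseteq L^q(\mu)\subseteq L^p(\mu)+L^r(\mu)\subseteq L^0(\mu)$. (See also 244Yg.)

(d) Let $(X,\Sigma,\mu)$ be a measure space. Suppose that $p$, $q$, $r\in[1,\infty]$ and that $\frac1p+\frac1q=\frac1r$, setting $\frac1\infty=0$ as usual. Show that $u\times v\in L^r(\mu)$ and $\|u\times v\|_r\le\|u\|_p\|v\|_q$ for every $u\in L^p(\mu)$, $v\in L^q(\mu)$. (Hint: if $r<\infty$ apply Hölder's inequality to $|u|^r\in L^{p/r}$, $|v|^r\in L^{q/r}$.)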

(e) (i) Let $(X,\Sigma,\mu)$ be a probability space. Show that if $1\le p\le r\le\infty$ then $\|f\|_p\le\|f\|_r$ for every $f\in\mathcal{L}^r(\mu)$. (Hint: use Hölder's inequality to show that $\int|f|^p\le\||f|^p\|_{r/p}$.) In particular, $\mathcal{L}^p(\mu)\supseteq\mathcal{L}^r(\mu)$. (ii) Let $(X,\Sigma,\mu)$ be a measure space such that $\mu E\ge 1$ whenever $E\in\Sigma$ and $\mu E>0$. (This happens, for instance, when $\mu$ is 'counting measure' on $X$.) Show that if $1\le p\le r\le\infty$ then $L^p(\mu)\subseteq L^r(\mu)$ and $\|u\|_p\ge\|u\|_r$ for every $u\in L^p(\mu)$. (Hint: 244Xb.)

(f) Let $(X,\Sigma,\mu)$ be a semi-finite measure space, and $p$, $q\in[1,\infty]$ such that $\frac1p+\frac1q=1$. Show that if $u\in L^0(\mu)\setminus L^p(\mu)$ then there is a $v\in L^q(\mu)$ such that $u\times v\notin L^1(\mu)$. (Hint: reduce to the case $u\ge 0$. Show that in this case there is for each $n\in\mathbb{N}$ a $u_n\le u$ such that $4^n\le\|u_n\|_p<\infty$; take $v_n\in L^q$ such that $\|v_n\|_q\le 2^{-n}$, $\int u_n\times v_n\ge 2^n$, and set $v=\sum_{n=0}^\infty v_n$.)

(g) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of measure spaces, and $(X,\Sigma,\mu)$ their direct sum (214K). Take any $p\in[1,\infty[$. Show that the canonical isomorphism between $L^0(\mu)$ and $\prod_{i\in I}L^0(\mu_i)$ (241Xg) induces an isomorphism between $L^p(\mu)$ and the subspace
$$\Big\{u:u\in\prod_{i\in I}L^p(\mu_i),\ \|u\|=\Big(\sum_{i\in I}\|u(i)\|_p^p\Big)^{1/p}<\infty\Big\}$$
of $\prod_{i\in I}L^p(\mu_i)$.

(h) Let (X, , ) be a measure space. Set M ,1 = L1 () L (). Show that for u M ,1 the function
p 7 kukp : [1, [ [0, [ is continuous, and that kuk = limp kukp . (Hint: consider first the case in
which u is the equivalence class of a simple function.)
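
The limit in (h) can be observed numerically for a simple function. This is a sketch only: it assumes $u$ takes finitely many values on disjoint sets of finite measure, the data are made up, and the helper names are illustrative.

```python
def p_norm(values, measures, p):
    """||u||_p for the simple function equal to values[i] on a set of measure measures[i]."""
    return sum(m * abs(a) ** p for a, m in zip(values, measures)) ** (1.0 / p)


def sup_norm(values, measures):
    """Essential supremum: the largest |value| carried by a set of non-zero measure."""
    return max(abs(a) for a, m in zip(values, measures) if m > 0)


values = [3.0, 1.0, 0.5]      # values taken by u
measures = [0.2, 1.5, 4.0]    # measures of the sets on which they are taken

norms = {p: p_norm(values, measures, p) for p in (1, 2, 10, 50, 200)}
limit = sup_norm(values, measures)
```

For large $p$ the entries of `norms` approach `limit`, because the term $0.2\cdot 3^p$ dominates the sum and $0.2^{1/p}\to 1$.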

(i) Let $\mu$ be counting measure on $X=\{1,2\}$, so that $\mathcal{L}^0(\mu)=\mathbb{R}^2$ and $L^p(\mu)=L^0(\mu)$ can be identified with $\mathbb{R}^2$ for every $p\in[1,\infty]$. Sketch the unit balls $\{u:\|u\|_p\le 1\}$ in $\mathbb{R}^2$ for $p=1$, $\frac32$, $2$, $3$ and $\infty$.

(j) Let be counting measure on X = {1, 2, 3}, so that L0 () = R 3 and Lp () = L0 () can be identified
with R 3 for every p [1, ]. Describe the unit balls {u : kukp 1} in R 3 for p = 1, 2 and .

(k) At which point does the argument of 244Hb break down if we try to apply it to L with k k ?

(l) For any measure space (X, , ) write M 1, = M 1, () for {v + w : v L1 (), w L ()} L0 ().
Show that M 1, is a linear subspace of L0 including Lp for every p [1, ], and that if u L0 , v M 1,
and |u| |v| then u M 1, . (Hint: u = v w where |w| 1 .)

(m) Let (X, , ) and (Y, T, ) be two measure spaces, and let T + be the set of linear operators T :
M 1, () M 1, () such that () T u 0 whenever u 0 in M 1, () () T u L1 () and kT uk1 kuk1
whenever u L1 () () T u L () and kT uk kuk whenever u L (). (i) Show that if : R R
is a convex function such that (0) = 0, and u M 1, () is such that (u) M 1, () (interpreting
: L0 () L0 () as in 241I), then (T u) M 1, () and (T u) T ((u)) for every T T + . (ii) Hence
show that if p [1, ] and u Lp (), T u Lp () and kT ukp kukp for every T T + .

> (n) Let X be any set, and let be counting measure on X. In this case it is customary (at least for
p [1, ]) to write `p (X) for Lp (), and to identify it with Lp (). In particular, L2 () becomes identified
with `2 (X), the space of square-summable functions on X. Write out statements and proofs of the results
of this chapter adapted to this special case.

(o) Let (X, , ) and (Y, T, ) be measure spaces and : X Y an inverse-measure-preserving function.
Show that the map g 7 g : L0 () L0 () (241Xh) induces a norm-preserving map from Lp () to Lp ()
for every p [1, ], and also a map from M 1, () to M 1, () which belongs to the class T + of 244Xm.

244Y Further exercises (a) Let (X, , ) be a measure space, and (X, , ) its c.l.d. version. Show
that Lp () Lp () and that this embedding induces a Banach lattice isomorphism between Lp () and
Lp (), for every p [1, [.

(b) Let (X, , ) be any measure space, and p [1, [. Show that Lp () has the countable sup property
in the sense of 241Yd. (Hint: 242Yd.)

(c) Let (X, , ) be a measure space, and Y a subset of X; write Y for the subspace measure on Y .
Show that the canonical map T from L0 () onto L0 (Y ) (241Ye) includes a surjection from Lp () onto
Lp (Y ) for every p [1, ], and also a map from M 1, () to M 1, (Y ) which belongs to the class T +
of 244Xm. Show that the following are equiveridical: (i) there is some p [1, [ such that T Lp () is
injective; (ii) T : Lp () Lp (Y ) is norm-preserving for every p [1, [; (iii) F Y 6= whenever F
and 0 < F < .

(d) Let (X, , ) be any measure space, and p [1, [. Show that the norm k kp on Lp () is order-
continuous in the sense of 242Yc.

(e) Let (X, , ) be any measure space, and p [1, ]. Show that if A Lp () is upwards-directed and
norm-bounded, then it is bounded above. (Hint: 242Yb.)

(f ) Let (X, , ) be any measure space, and p [1, ]. Show that if a non-empty set A Lp () is
upwards-directed and has a supremum in Lp (), then k sup Akp supuA kukp . (Hint: consider first the
case 0 A.)

(g) Let (X, , ) be a measure space and u L0 (). Show that I = {p : p [0, [ , u Lp ()} is an
interval. Give examples to show that it may be open, closed or half-open. Show that p 7 p ln kukp : I R
is convex. Hence show that if p q r I, kukq max(kukp , kukr ).

(h) Let [a, b] be a non-trivial closed interval in R and F : [a, b] R a function; take p ]1, [. Show
that the following are equiveridical: (i) F is absolutely continuous and its derivative F 0 belongs to Lp (),
where is Lebesgue measure on [a, b] (ii)
$$c=\sup\Big\{\sum_{i=1}^n\frac{|F(a_i)-F(a_{i-1})|^p}{(a_i-a_{i-1})^{p-1}}:a\le a_0<a_1<\ldots<a_n\le b\Big\}$$
is finite, and that in this case c = kF kp . (Hint: (i) if F is absolutely continuous and F 0 Lp , use Holders
0
R b0
inequality to show that |F (b0 ) F (a0 )|p (b0 a0 )p1 a0 |F 0 |p whenever a a0 b0 b. (ii) If F satisfies
Pn Pn
the conditions, show that ( i=0 |F (bi ) F (ai )|)p c( i=0 (bi ai ))p1 whenever a a0 b0 a1 . . .
bn b, so that F is absolutely continuous. Take aR sequence hFn inN R of polygonal functions approximating
F ; use 223Xh to show that Fn0 F 0 a.e., so that |F 0 |p supnN |Fn0 |p cp .)

(i) Let G be an open set in R r and write for Lebesgue measure on G. Let Ck (G) be the set of continuous
functions f : G R such that inf{kx yk : x G, f (x) 6= 0, y R r \ G} > 0 (counting inf as ). Show
that for any p [1, [ the set {f : f Ck (G)} is a dense linear subspace of Lp ().

(j) Let U be any Hilbert space. (i) Show that if C U is convex (that is, tu + (1 t)v C whenever
u, v C and t [0, 1]; see 233Xd) and closed, and u U , then there is a unique v C such that
ku vk = inf wC ku wk, and that (u v|v w) 0 for every w C. (ii) Show that if h U there is a
unique v U such that h(w) = (w|v) for every w U . (Hint: apply (i) with C = {w : h(w) = 1}, u = 0.)
(iii) Show that if V U is a closed linear subspace then there is a unique linear projection P on U such
that P [U ] = V and (u P u|v) = 0 for all u U , v V (P is orthogonal). (Hint: take P u to be the point
of V nearest to u.)

(k) Let $(X,\Sigma,\mu)$ be a probability space, and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Use part (iii) of 244Yj to show that there is an orthogonal projection $P:L^2(\mu)\to L^2(\mu\restriction\mathrm{T})$ such that $\int_F Pu=\int_F u$ for every $u\in L^2(\mu)$, $F\in\mathrm{T}$. Show that $Pu\ge 0$ whenever $u\ge 0$ and that $\int Pu=\int u$ for every $u$, so that $P$ has a unique extension to a continuous operator from $L^1(\mu)$ onto $L^1(\mu\restriction\mathrm{T})$. Use this to develop the theory of conditional expectations without using the Radon-Nikodým theorem.

244 Notes and comments At this point I feel we must leave the investigation of further function spaces.
The next stage would have to be a systematic abstract analysis of general Banach lattices. The Lp spaces
give a solid foundation for such an analysis, since they introduce the basic themes of norm-completeness,
order-completeness and identification of dual spaces. I have tried in the exercises to suggest the importance
of the next layer of concepts: order-continuity of norms and the relationship between norm-boundedness and
order-boundedness. What I have not had space to discuss is the subject of order-preserving linear operators
between Riesz spaces, which is the key to understanding the order structure of the dual spaces here. (But
you can make a start by re-reading the theory of conditional expectation operators in 242J-242L, 243J and
244M.) All these topics are treated in Fremlin 74 and in Chapters 35 and 36 of the next volume.
I remember that one of my early teachers of analysis said that the $L^p$ spaces (for $p\ne 1,2,\infty$) had somehow got into the syllabus and had never been got out again. I would myself call them classics, in the sense that they have been part of the common experience of all functional analysts since functional analysis began; and while you are at liberty to dislike them, you can no more ignore them than you can ignore Milton if you are studying English poetry. Hölder's inequality, in particular, has a wealth of applications; not only 244F and 244K, but also 244Xd, 244Xe and 244Yh, for instance.
The $L^p$ spaces, for $1\le p\le\infty$, form a kind of continuum. In terms of the concepts dealt with here, there is no distinction to be drawn between different $L^p$ spaces for $1<p<\infty$ except the observation that the norm of $L^2$ is an inner product norm, corresponding to a Euclidean geometry on its finite-dimensional subspaces. To discriminate between the other $L^p$ spaces we need much more refined concepts in the geometry of normed spaces.
In terms of the theorems given here, $L^1$ seems closer to the middle range of $L^p$ for $1<p<\infty$ than $L^\infty$ does; thus, for all $1\le p<\infty$, we have $L^p$ Dedekind complete (independent of the measure space involved), the space $S$ of equivalence classes of simple functions is dense in $L^p$ (again, for every measure space), and the dual $(L^p)^*$ is (almost) identifiable as another function space. All of these should be regarded as consequences in one way or another of the order-continuity of the norm of $L^p$ for $p<\infty$. The chief obstacle to the universal identification of $(L^1)^*$ with $L^\infty$ is that for non-$\sigma$-finite measure spaces the space $L^\infty$ can be inadequate, rather than any pathology in the $L^1$ space itself. (This point, at least, I mean to return to in Volume 3.) There is also the point that for a non-semi-finite measure space the purely infinite sets can contribute to $L^\infty$ without any corresponding contribution to $L^1$. For $1<p<\infty$, neither of these problems can arise. Any member of any such $L^p$ is supported entirely by a $\sigma$-finite part of the measure space, and the same applies to the dual; see part (c) of the proof of 244K.
Of course $L^1$ does have a markedly different geometry from the other $L^p$ spaces. The first sign of this is that it is not reflexive as a Banach space (except when it is finite-dimensional), whereas for $1<p<\infty$ the identifications of $(L^p)^*$ with $L^q$ and of $(L^q)^*$ with $L^p$, where $q=p/(p-1)$, show that the canonical embedding of $L^p$ in $(L^p)^{**}$ is surjective, that is, that $L^p$ is reflexive. But even when $L^1$ is finite-dimensional the unit balls of $L^1$ and $L^\infty$ are clearly different in kind from the unit balls of $L^p$ for $1<p<\infty$; they have corners instead of being smoothly rounded (244Xi-244Xj).
The proof of 244K, identifying (Lp ) , is a fairly long haul, and it is natural to ask whether we really
have to work so hard, especially since in the case of L2 we have a much easier argument (244Yj). Of course
we can go faster if we know a bit more about Banach lattices (369 in Volume 3 has the relevant facts),
though this route uses some theorems quite as hard as 244K as given. There are alternative routes using
the geometry of the Lp spaces, following the ideas of 244Yj, but I do not think they are any easier, and the
argument I have presented here at least has the virtue of using some of the same ideas as the identification
of (L1 ) in 243G. The difference is that whereas in 243G we may have to piece together a large family of
functions gF (part (b-v) of the proof), in 244K there are only countably many gn ; consequently we can make
the argument work for arbitrary measure spaces, not just localizable ones.
The geometry of Hilbert space gives us an approach to conditional expectations which does not depend
on the Radon-Nikodym theorem (244Yk). To turn these ideas into a proof of the Radon-Nikodym theorem
itself, however, requires qualities of determination and ingenuity which can be better employed elsewhere.
The convexity arguments of 233J/242K can be used on many operators besides conditional expectations
(see 244Xm). The class T + described there is not in fact the largest for which these arguments work; I
take the ideas farther in Chapter 37. There is also a great deal more to be said if you put an arbitrary pair
of Lp spaces in place of L1 and L in 244Xl. 244Yg is a start, but for the real thing (the Riesz convexity
theorem) I refer you to Zygmund 59, XII.1.11 or Dunford & Schwartz 57, VI.10.11.

245 Convergence in measure


I come now to an important and interesting topology on the spaces $\mathcal{L}^0$ and $L^0$. I start with the definition (245A) and with properties which echo those of the $L^p$ spaces for $p\ge 1$ (245D-245E). In 245G-245J I describe the most useful relationships between this topology and the norm topologies of the $L^p$ spaces. For $\sigma$-finite spaces, it is metrizable (245Eb) and sequential convergence can be described in terms of pointwise convergence of sequences of functions (245K-245L).

245A Definitions Let $(X,\Sigma,\mu)$ be a measure space.

(a) For any measurable set $F\subseteq X$ of finite measure, we have a functional $\tau_F$ on $\mathcal{L}^0=\mathcal{L}^0(\mu)$ defined by setting
$$\tau_F(f)=\int|f|\wedge\chi F$$
for every $f\in\mathcal{L}^0$. (The integral exists in $\mathbb{R}$ because $|f|\wedge\chi F$ belongs to $\mathcal{L}^0$ and is dominated by the integrable function $\chi F$.) Now $\tau_F(f+g)\le\tau_F(f)+\tau_F(g)$ whenever $f$, $g\in\mathcal{L}^0$. $\mathbf{P}$ We need only observe that
$$\min(|(f+g)(x)|,(\chi F)(x))\le\min(|f(x)|,(\chi F)(x))+\min(|g(x)|,(\chi F)(x))$$
for every $x\in\mathrm{dom}\,f\cap\mathrm{dom}\,g$, which is almost every $x\in X$. $\mathbf{Q}$ Consequently, setting $\rho_F(f,g)=\tau_F(f-g)$, we have
$$\rho_F(f,h)=\tau_F((f-g)+(g-h))\le\tau_F(f-g)+\tau_F(g-h)=\rho_F(f,g)+\rho_F(g,h),$$
$$\rho_F(f,g)=\tau_F(f-g)\ge 0,$$
$$\rho_F(f,g)=\tau_F(f-g)=\tau_F(g-f)=\rho_F(g,f)$$
for all $f$, $g$, $h\in\mathcal{L}^0$; that is, $\rho_F$ is a pseudometric.

(b) The family
$$\{\rho_F:F\in\Sigma,\ \mu F<\infty\}$$
now defines a topology on $\mathcal{L}^0$ (2A3F); I will call it the topology of convergence in measure on $\mathcal{L}^0$.

(c) If $f$, $g\in\mathcal{L}^0$ and $f=_{\text{a.e.}}g$, then $|f|\wedge\chi F=_{\text{a.e.}}|g|\wedge\chi F$ and $\tau_F(f)=\tau_F(g)$, for every set $F$ of finite measure. Consequently we have functionals $\bar\tau_F$ on $L^0=L^0(\mu)$ defined by writing
$$\bar\tau_F(f^\bullet)=\tau_F(f)$$
whenever $f\in\mathcal{L}^0$, $F\in\Sigma$ and $\mu F<\infty$. Corresponding to these we have pseudometrics $\bar\rho_F$ defined by either of the formulae
$$\bar\rho_F(u,v)=\bar\tau_F(u-v),\quad\bar\rho_F(f^\bullet,g^\bullet)=\rho_F(f,g)$$
for $u$, $v\in L^0$, $f$, $g\in\mathcal{L}^0$ and $F$ of finite measure. The family of these pseudometrics defines the topology of convergence in measure on $L^0$.
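
To make the definitions of 245A concrete, here is a minimal sketch on a finite set with point masses, a setting assumed purely for illustration. On such a space the integral $\int|f|\wedge\chi F$ reduces to a finite sum over the points of $F$. The function names `tau` and `rho` mirror the functionals defined above, but the code and data are not from the text.

```python
def tau(F, f, mu):
    """tau_F(f): sum over the points x of F of mu{x} * min(|f(x)|, 1)."""
    return sum(mu[x] * min(abs(f[x]), 1.0) for x in F)


def rho(F, f, g, mu):
    """rho_F(f, g) = tau_F(f - g)."""
    return tau(F, {x: f[x] - g[x] for x in F}, mu)


mu = {"a": 0.5, "b": 2.0, "c": 1.25}   # point masses
F = ["a", "b"]                          # a set of finite measure
f = {"a": 0.3, "b": -4.0, "c": 7.0}
g = {"a": 0.1, "b": 0.5, "c": -1.0}
h = {"a": -2.0, "b": 0.4, "c": 0.0}

# the three pseudometric properties established in (a)
assert rho(F, f, h, mu) <= rho(F, f, g, mu) + rho(F, g, h, mu) + 1e-12
assert rho(F, f, g, mu) >= 0.0
assert abs(rho(F, f, g, mu) - rho(F, g, f, mu)) < 1e-12
```

Note that $\rho_F$ ignores the values at the point `"c"`, which lies outside $F$; this is why a single $\rho_F$ is only a pseudometric.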

(d) I shall allow myself to say that a sequence (in $\mathcal{L}^0$ or $L^0$) converges in measure if it converges for the topology of convergence in measure (in the sense of 2A3M).

245B Remarks (a) Of course the topologies of $\mathcal{L}^0$, $L^0$ are about as closely related as it is possible for them to be. Not only is the topology of $L^0$ the quotient of the topology on $\mathcal{L}^0$ (that is, a set $G\subseteq L^0$ is open iff $\{f:f^\bullet\in G\}$ is open in $\mathcal{L}^0$), but every open set in $\mathcal{L}^0$ is the inverse image under the quotient map of an open set in $L^0$.

(b) It is convenient to note that if $F_0,\ldots,F_n$ are measurable sets of finite measure with union $F$, then, in the notation of 245A, $\tau_{F_i}\le\tau_F$ for every $i$; this means that a set $G\subseteq\mathcal{L}^0$ is open for the topology of convergence in measure iff for every $f\in G$ we can find a single set $F$ of finite measure and a $\delta>0$ such that
$$\rho_F(g,f)\le\delta\implies g\in G.$$
Similarly, a set $G\subseteq L^0$ is open for the topology of convergence in measure iff for every $u\in G$ we can find a set $F$ of finite measure and a $\delta>0$ such that
$$\bar\rho_F(v,u)\le\delta\implies v\in G.$$

(c) The phrase 'topology of convergence in measure' agrees well enough with standard usage when $(X,\Sigma,\mu)$ is totally finite. But a warning! the phrase 'topology of convergence in measure' is also used for the topology defined by the metric of 245Ye below, even when $\mu X=\infty$. I have seen the phrase local convergence in measure used for the topology of 245A. Most authors ignore non-$\sigma$-finite spaces in this context. However I hold that 245D-245E below are of sufficient interest to make the extension worth while.

245C Pointwise convergence The topology of convergence in measure is almost definable in terms of 'pointwise convergence', which is one of the roots of measure theory. The correspondence is closest in $\sigma$-finite measure spaces (see 245K), but there is still a very important relationship in the general case, as follows. Let $(X,\Sigma,\mu)$ be a measure space, and write $\mathcal{L}^0=\mathcal{L}^0(\mu)$, $L^0=L^0(\mu)$.

(a) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$ converging almost everywhere to $f\in\mathcal{L}^0$, then $\langle f_n\rangle_{n\in\mathbb{N}}\to f$ in measure. $\mathbf{P}$ By 2A3Mc, I have only to show that $\lim_{n\to\infty}\rho_F(f_n,f)=0$ whenever $\mu F<\infty$. But $\langle|f_n-f|\wedge\chi F\rangle_{n\in\mathbb{N}}$ converges to 0 a.e. and is dominated by the integrable function $\chi F$, so by Lebesgue's Dominated Convergence Theorem
$$\lim_{n\to\infty}\rho_F(f_n,f)=\lim_{n\to\infty}\int|f_n-f|\wedge\chi F=0.\ \mathbf{Q}$$

(b) To formulate a corresponding result applicable to $L^0$, we need the following concept. If $\langle f_n\rangle_{n\in\mathbb{N}}$, $\langle g_n\rangle_{n\in\mathbb{N}}$ are sequences in $\mathcal{L}^0$ such that $f_n^\bullet=g_n^\bullet$ for every $n$, and $f$, $g\in\mathcal{L}^0$ are such that $f^\bullet=g^\bullet$, and $\langle f_n\rangle_{n\in\mathbb{N}}\to f$ a.e., then $\langle g_n\rangle_{n\in\mathbb{N}}\to g$ a.e., because
$$\Big\{x:x\in\mathrm{dom}\,f\cap\mathrm{dom}\,g\cap\bigcap_{n\in\mathbb{N}}\mathrm{dom}\,f_n\cap\bigcap_{n\in\mathbb{N}}\mathrm{dom}\,g_n,\ g(x)=f(x)=\lim_{n\to\infty}f_n(x),\ f_n(x)=g_n(x)\ \forall\,n\in\mathbb{N}\Big\}$$
is conegligible. Consequently we have a definition applicable to sequences in $L^0$; we can say that, for $f$, $f_n\in\mathcal{L}^0$, $\langle f_n^\bullet\rangle_{n\in\mathbb{N}}$ is order*-convergent, or order*-converges, to $f^\bullet$ iff $f=_{\text{a.e.}}\lim_{n\to\infty}f_n$. In this case, of course, $\langle f_n\rangle_{n\in\mathbb{N}}\to f$ in measure. Thus, in $L^0$, a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ which order*-converges to $u\in L^0$ also converges to $u$ in measure.
Remark I suggest alternative descriptions of order-convergence in 245Xc; the conditions (iii)-(vi) there are in forms adapted to more general structures.

(c) For a typical example of a sequence which is convergent in measure without being order-convergent, consider the following. Take $\mu$ to be Lebesgue measure on $[0,1]$, and set $f_n(x)=2^m$ if $x\in[2^{-m}k,2^{-m}(k+1)]$, $0$ otherwise, where $k=k(n)\in\mathbb{N}$, $m=m(n)\in\mathbb{N}$ are defined by saying that $n+1=2^m+k$ and $0\le k<2^m$. Then $\langle f_n\rangle_{n\in\mathbb{N}}\to 0$ for the topology of convergence in measure (since $\rho_F(f_n,0)\le 2^{-m}$ if $F\subseteq[0,1]$ is measurable and $2^m-1\le n$), though $\langle f_n\rangle_{n\in\mathbb{N}}$ is not convergent to 0 almost everywhere (indeed, $\limsup_{n\to\infty}f_n=\infty$ everywhere).
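
The example in (c) is easy to experiment with. The sketch below uses exact rational arithmetic to decode $n$ into $(m,k)$, evaluate $f_n$, and compute $\rho_{[0,1]}(f_n,0)=\int\min(f_n,1)$, which is just the length $2^{-m}$ of the supporting interval. The function names are illustrative and not taken from the text.

```python
from fractions import Fraction


def level_and_index(n):
    """Return (m, k) with n + 1 = 2**m + k and 0 <= k < 2**m."""
    m = (n + 1).bit_length() - 1
    k = (n + 1) - 2 ** m
    return m, k


def f(n, x):
    """f_n(x) = 2**m on [k / 2**m, (k + 1) / 2**m], and 0 elsewhere."""
    m, k = level_and_index(n)
    lo, hi = Fraction(k, 2 ** m), Fraction(k + 1, 2 ** m)
    return 2 ** m if lo <= x <= hi else 0


def rho_to_zero(n):
    """Integral over [0, 1] of min(f_n, 1): the length of the supporting interval."""
    m, _ = level_and_index(n)
    return Fraction(1, 2 ** m)


x = Fraction(1, 3)
# at each level m, some f_n with 2**m - 1 <= n < 2**(m+1) - 1 takes the value 2**m at x
peaks = [max(f(n, x) for n in range(2 ** m - 1, 2 ** (m + 1) - 1)) for m in range(6)]
```

Here `peaks` grows without bound along the levels while `rho_to_zero(n)` tends to 0, which is the contrast the example is meant to exhibit.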

245D Proposition Let (X, , ) be any measure space.


(a) The topology of convergence in measure is a linear space topology on $L^0=L^0(\mu)$.
(b) The maps $\vee$, $\wedge:L^0\times L^0\to L^0$, and $u\mapsto|u|$, $u\mapsto u^+$, $u\mapsto u^-:L^0\to L^0$ are all continuous.
(c) The map $\times:L^0\times L^0\to L^0$ is continuous.
(d) For any continuous function $h:\mathbb{R}\to\mathbb{R}$, the corresponding function $\bar h:L^0\to L^0$ (241I) is continuous.
proof (a) The point is that the functionals $\bar\tau_F$, as defined in 245Ac, satisfy the conditions of 2A5B below. $\mathbf{P}$ Fix a set $F\in\Sigma$ of finite measure. We have already seen that
$$\bar\tau_F(u+v)\le\bar\tau_F(u)+\bar\tau_F(v)\text{ for all }u,v\in L^0.$$
Next,
$$\bar\tau_F(cu)\le\bar\tau_F(u)\text{ whenever }u\in L^0,\ |c|\le 1\qquad(*)$$
because $|cf|\wedge\chi F\le_{\text{a.e.}}|f|\wedge\chi F$ whenever $f\in\mathcal{L}^0$, $|c|\le 1$. Finally, given $u\in L^0$ and $\epsilon>0$, let $f\in\mathcal{L}^0$ be such that $f^\bullet=u$. Then
$$\lim_{n\to\infty}|2^{-n}f|\wedge\chi F=_{\text{a.e.}}0,$$
so by Lebesgue's Dominated Convergence Theorem
$$\lim_{n\to\infty}\bar\tau_F(2^{-n}u)=\lim_{n\to\infty}\int|2^{-n}f|\wedge\chi F=0,$$
and there is an $n$ such that $\bar\tau_F(2^{-n}u)\le\epsilon$. It follows (by (*) just above) that $\bar\tau_F(cu)\le\epsilon$ whenever $|c|\le 2^{-n}$. As $\epsilon$ is arbitrary, $\lim_{c\to 0}\bar\tau_F(cu)=0$ for every $u\in L^0$; which is the third condition in 2A5B. $\mathbf{Q}$
Now 2A5B tells us that the topology defined by the $\bar\tau_F$ is a linear space topology.
(b) For any $u$, $v\in L^0$, $||u|-|v||\le|u-v|$, so $\bar\rho_F(|u|,|v|)\le\bar\rho_F(u,v)$ for every set $F$ of finite measure. By 2A3H, $|\ |:L^0\to L^0$ is continuous. Now
$$u\vee v=\tfrac12(u+v+|u-v|),\quad u\wedge v=\tfrac12(u+v-|u-v|),$$
$$u^+=u\vee 0,\quad u^-=(-u)\vee 0.$$
As addition and subtraction are continuous, so are $\vee$, $\wedge$, ${}^+$ and ${}^-$.
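(c) Take $u_0$, $v_0\in L^0$ and $F\in\Sigma$ a set of finite measure and $\epsilon>0$. Represent $u_0$ and $v_0$ as $f_0^\bullet$, $g_0^\bullet$ respectively, where $f_0$, $g_0:X\to\mathbb{R}$ are $\Sigma$-measurable (241Bk). If we set
$$F_m=\{x:x\in F,\ |f_0(x)|+|g_0(x)|\le m\},$$
then $\langle F_m\rangle_{m\in\mathbb{N}}$ is a non-decreasing sequence of sets with union $F$, so there is an $m\in\mathbb{N}$ such that $\mu(F\setminus F_m)\le\frac12\epsilon$. Let $\delta>0$ be such that $(2m+\mu F)\delta^2+2\delta\le\frac12\epsilon$.
Now suppose that $u$, $v\in L^0$ are such that $\bar\rho_F(u,u_0)\le\delta^2$ and $\bar\rho_F(v,v_0)\le\delta^2$. Let $f$, $g:X\to\mathbb{R}$ be measurable functions such that $f^\bullet=u$ and $g^\bullet=v$. Then
$$\mu\{x:x\in F,\ |f(x)-f_0(x)|\ge\delta\}\le\delta,\quad\mu\{x:x\in F,\ |g(x)-g_0(x)|\ge\delta\}\le\delta,$$
so that
$$\mu\{x:x\in F,\ |f(x)-f_0(x)||g(x)-g_0(x)|\ge\delta^2\}\le 2\delta$$
and
$$\int_F\min(1,|f-f_0|\times|g-g_0|)\le\delta^2\mu F+2\delta.$$
Also
$$|f\times g-f_0\times g_0|\le|f-f_0|\times|g-g_0|+|f_0|\times|g-g_0|+|f-f_0|\times|g_0|,$$
so that
$$\begin{aligned}
\bar\rho_F(u\times v,u_0\times v_0)&=\int_F\min(1,|f\times g-f_0\times g_0|)\\
&\le\tfrac12\epsilon+\int_{F_m}\min(1,|f\times g-f_0\times g_0|)\\
&\le\tfrac12\epsilon+\int_{F_m}\min(1,|f-f_0|\times|g-g_0|+m|g-g_0|+m|f-f_0|)\\
&\le\tfrac12\epsilon+\int_F\min(1,|f-f_0|\times|g-g_0|)+m\int_F\min(1,|g-g_0|)+m\int_F\min(1,|f-f_0|)\\
&\le\tfrac12\epsilon+\delta^2\mu F+2\delta+2m\delta^2\le\epsilon.
\end{aligned}$$
As $F$ and $\epsilon$ are arbitrary, $\times$ is continuous at $(u_0,v_0)$; as $u_0$ and $v_0$ are arbitrary, $\times$ is continuous.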


(d) Take $u\in L^0$, $F\in\Sigma$ of finite measure and $\epsilon>0$. Then there is a $\delta>0$ such that $\bar\rho_F(\bar h(v),\bar h(u))\le\epsilon$ whenever $\bar\rho_F(v,u)\le\delta$. $\mathbf{P}$? Otherwise, we can find, for each $n\in\mathbb{N}$, a $v_n$ such that $\bar\rho_F(v_n,u)\le 4^{-n}$ but $\bar\rho_F(\bar h(v_n),\bar h(u))>\epsilon$. Express $u$ as $f^\bullet$ and $v_n$ as $g_n^\bullet$ where $f$, $g_n:X\to\mathbb{R}$ are measurable. Set
$$E_n=\{x:x\in F,\ |g_n(x)-f(x)|\ge 2^{-n}\}$$
for each $n$. Then $\bar\rho_F(v_n,u)\ge 2^{-n}\mu E_n$, so $\mu E_n\le 2^{-n}$ for each $n$, and $E=\bigcap_{n\in\mathbb{N}}\bigcup_{m\ge n}E_m$ is negligible. But $\lim_{n\to\infty}g_n(x)=f(x)$ for every $x\in F\setminus E$, so (because $h$ is continuous) $\lim_{n\to\infty}h(g_n(x))=h(f(x))$ for every $x\in F\setminus E$. Consequently (by Lebesgue's Dominated Convergence Theorem, as always)
$$\lim_{n\to\infty}\bar\rho_F(\bar h(v_n),\bar h(u))=\lim_{n\to\infty}\int_F\min(1,|h(g_n(x))-h(f(x))|)\,\mu(dx)=0,$$
which is impossible. X $\mathbf{Q}$
By 2A3H, $\bar h$ is continuous.
Remark I cannot say that the topology of convergence in measure on $\mathcal{L}^0$ is a linear space topology solely because (on the definitions I have chosen) $\mathcal{L}^0$ is not in general a linear space.

245E I turn now to the principal theorem relating the properties of the topological linear space $L^0(\mu)$ to the classification of measure spaces in Chapter 21.
Theorem Let $(X,\Sigma,\mu)$ be a measure space. Let $\mathfrak{T}$ be the topology of convergence in measure on $L^0=L^0(\mu)$, as described in 245A.
(a) $(X,\Sigma,\mu)$ is semi-finite iff $\mathfrak{T}$ is Hausdorff.
(b) $(X,\Sigma,\mu)$ is $\sigma$-finite iff $\mathfrak{T}$ is metrizable.
(c) $(X,\Sigma,\mu)$ is localizable iff $\mathfrak{T}$ is Hausdorff and $L^0$ is complete under $\mathfrak{T}$.
proof I use the pseudometrics $\rho_F$ on $\mathcal{L}^0=\mathcal{L}^0(\mu)$, $\bar\rho_F$ on $L^0$ described in 245A.
(a)(i) Suppose that (X, , ) is semi-finite and that u, v are distinct members of L0 . Express them as
f and g where f and g are measurable functions from X to R. Then {x : f (x) 6= g(x)} > 0 so, because

(X, , ) is semi-finite, there is a set F of finite measure such that {x : x F, f (x) 6= g(x)} > 0. Now
R
F (u, v) = F min(|f (x) g(x)|, 1)dx > 0
(see 122Rc). As u and v are arbitrary, T is Hausdorff (2A3L).
(ii) Suppose that T is Hausdorff and that E , E > 0. Then u = E 6= 0 so there is an F
such that F < and F (u, 0) 6= 0, that is, (E F ) > 0. Now E F is a non-negligible set of finite
measure included in E. As E is arbitrary, (X, , ) is semi-finite.
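(b)(i) Suppose that $(X,\Sigma,\mu)$ is $\sigma$-finite. Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of sets of finite measure covering $X$. Set
$$\bar\rho(u,v)=\sum_{n=0}^\infty\frac{\bar\rho_{E_n}(u,v)}{1+2^n\mu E_n}$$
for $u$, $v\in L^0$. Then $\bar\rho$ is a metric on $L^0$. $\mathbf{P}$ Because every $\bar\rho_{E_n}$ is a pseudometric, so is $\bar\rho$. If $\bar\rho(u,v)=0$, express $u$ as $f^\bullet$, $v$ as $g^\bullet$ where $f$, $g\in\mathcal{L}^0(\mu)$; then
$$\int|f-g|\wedge\chi E_n=\bar\rho_{E_n}(u,v)=0,$$
so $f=g$ almost everywhere on $E_n$, for every $n$. Because $X=\bigcup_{n\in\mathbb{N}}E_n$, $f=_{\text{a.e.}}g$ and $u=v$. $\mathbf{Q}$
If $F\in\Sigma$ and $\mu F<\infty$ and $\epsilon>0$, take $n$ such that $\mu(F\setminus E_n)\le\frac12\epsilon$. If $u$, $v\in L^0$ and $\bar\rho(u,v)\le\epsilon/(2(1+2^n\mu E_n))$, then $\bar\rho_F(u,v)\le\epsilon$. $\mathbf{P}$ Express $u$ as $f^\bullet$, $v$ as $g^\bullet$ where $f$, $g\in\mathcal{L}^0$. Then
$$\int|f-g|\wedge\chi E_n=\bar\rho_{E_n}(u,v)\le(1+2^n\mu E_n)\bar\rho(u,v)\le\frac\epsilon2,$$
while
$$\int|f-g|\wedge\chi(F\setminus E_n)\le\mu(F\setminus E_n)\le\frac\epsilon2,$$
so
$$\bar\rho_F(u,v)=\int|f-g|\wedge\chi F\le\int|f-g|\wedge\chi E_n+\int|f-g|\wedge\chi(F\setminus E_n)\le\frac\epsilon2+\frac\epsilon2=\epsilon.\ \mathbf{Q}$$
In the other direction, given $\epsilon>0$, take $n\in\mathbb{N}$ such that $2^{-n}\le\frac12\epsilon$; then $\bar\rho(u,v)\le\epsilon$ whenever $\bar\rho_{E_n}(u,v)\le\epsilon/(2(n+1))$.
These show that $\bar\rho$ defines the same topology as the $\bar\rho_F$ (2A3Ib), so that $\mathfrak{T}$, the topology defined by the $\bar\rho_F$, is metrizable.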
(ii) Suppose that T is metrizable. Let be a metric defining T. For each n N there must be a
measurable set Fn of finite measure and a n > 0 such that
Fn (u, 0) n = (u, 0) 2n .
S
Set E = X \ nN Fn . ?? If E is not negligible, then u = E 6= 0; because is a metric, there is an n N
such that (u, 0) > 2n ; now
(E Fn ) = Fn (u, 0) > n .
But E Fn = . X
X S
So E = 0 < . Now X = E nN Fn is a countable union of sets of finite measure, so is -finite.
(c) By (a), either hypothesis ensures that is semi-finite and that T is Hausdorff.
(i) Suppose that (X, , ) is localizable; then T is Hausdorff, by (a). Let F be a Cauchy filter on L0
(2A5F). For each measurable set F of finite measure, choose a sequence hAn (F )inN T of members of F such
that F (u, v) 4n for every u, v An (F ) and every n (2A5G). Choose uF n kn An (F ) for each n;
then F (uF,n+1 , uF n ) 2n for each n. Express each uF n as fF n where fF n is a measurable function from
X to R. Then
{x : x F, |fF,n+1 (x) fF n (x)| 2n } 2n F (uF,n+1 , uF n ) 2n
for each n. It follows that hfF n inN must converge almost everywhere on F . P
P Set
Hn = {x : x F, |fF,n+1 (x) fF n (x)| 2n }.
Then Hn 2n for each n, so
T S P
( nN mn Hm ) inf nN m=n 2m = 0.
T S S
If x F \ nN mn Hm , then there is some k such that x F \ mk Hm , so that |fF,m+1 (x) fF m (x)|
2m for every m k, and hfF n (x)inN is Cauchy, therefore convergent. Q Q
Set fF (x) = limn fF n (x) for every x F such that the limit is defined in R, so that fF is measurable
and defined almost everywhere in F .
If E, F are two sets of finite measure and E F , then E (uEn , uF n ) 2 4n for each n. P
P An (E) and
An (F ) both belong to F, so must have a point w in common; now

E (uEn , uF n ) E (uEn , w) + E (w, uF n )


E (uEn , w) + F (w, uF n )
4n + 4n . Q
Q
Consequently
{x : x E, |fF n (x) fEn (x)| 2n } 2n E (uF n , uEn ) 2n+1
for each n, and limn fF n fEn = 0 almost everywhere on E; so that fE = fF a.e. on E.
Consequently, if E and F are any two sets of finite measure, fE = fF a.e. on E F , because both are
equal almost everywhere on E F to fEF .
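Because $\mu$ is localizable, it follows that there is an $f\in\mathcal{L}^0$ such that $f=f_E$ a.e. on $E$ for every measurable set $E$ of finite measure (213N). Consider $u=f^\bullet\in L^0$. For any set $E$ of finite measure,
$$\bar\rho_E(u,u_{En})=\int_E\min(1,|f(x)-f_{En}(x)|)dx=\int_E\min(1,|f_E(x)-f_{En}(x)|)dx\to 0$$
as $n\to\infty$, using Lebesgue's Dominated Convergence Theorem. Now
$$\begin{aligned}
\inf_{A\in\mathcal{F}}\sup_{v\in A}\bar\rho_E(v,u)&\le\inf_{n\in\mathbb{N}}\sup_{v\in A_n(E)}\bar\rho_E(v,u)\\
&\le\inf_{n\in\mathbb{N}}\sup_{v\in A_n(E)}\big(\bar\rho_E(v,u_{En})+\bar\rho_E(u,u_{En})\big)\\
&\le\inf_{n\in\mathbb{N}}\big(4^{-n}+\bar\rho_E(u,u_{En})\big)=0.
\end{aligned}$$
As $E$ is arbitrary, $\mathcal{F}\to u$. As $\mathcal{F}$ is arbitrary, $L^0$ is complete under $\mathfrak{T}$.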


(ii) Now suppose that L0 is complete under T and let E be any family of sets in . Let E 0 be
S
{ E0 : E0 is a finite subset of E}.
Then the union of any two members of E 0 belongs to E 0 . Let F be the set
{A : A L0 , E E 0 such that A AE },
where for E E 0 I write
AE = {F : F E 0 , F E}.
Then F is a filter on L0 , because AE AF = AEF for all E, F E 0 .
In fact F is a Cauchy filter. PP Let H be any set of finite measure and > 0. Set = supEE 0 (H E)
and take E E 0 such that (H E) . Consider AE F . If F , G E 0 and F E, G E then

H (F , G ) = (H (F 4G))
= (H (F G)) (H F G)
(H E) .
Thus H (u, v) for all u, v AE . As H and are arbitrary, F is Cauchy. Q Q
Because L0 is complete under T, F has a limit w say. Express w as h , where h : X R is measurable,
and consider G = {x : h(x) > 21 }.
?? If E E and (E\G) > 0, let F E\G be a set of non-zero finite measure. Then {u : F (u, w) < 12 F }
belongs to F, so meets AE ; let H E 0 be such that E H and F (H , w) < 21 F . Then
R 1
F
min(1, |1 h(x)|) = F (H , w) < F ;
2
1
but because F G = , 1 h(x) for every x F , so this is impossible. X
2 X
Thus E \ G is negligible for every E E.
Now suppose that H and (G \ H) > 0. Then there is an E E such that (E \ H) > 0. P P Let
F G \ H be a set of non-zero finite measure. Let u A be such that F (u, w) < 12 F . Then u is of the
form C for some C E 0 , and
R 1
F
min(1, |C(x) h(x)|) < F .
2
1
As h(x) for every x F , (C F ) > 0. But C is a finite union of members of E, so there is an E E
2
such that (E F ) > 0, and now (E \ H) > 0. Q Q
As H is arbitrary, G is an essential supremum of E in . As E is arbitrary, (X, , ) is localizable.

245F Alternative description of the topology of convergence in measure Let us return to ar-
bitrary measure spaces (X, , ).

(a) For any F of finite measure and > 0 define F : L0 [0, [ by


F (f ) = {x : x F dom f, |f (x)| > }
for f L0 , taking to be the outer measure associated with (132B). If f , g L0 and f =a.e. g, then
{x : x F dom f, |f (x)| > }4{x : x F dom g, |g(x)| > }
is negligible, so F (f ) = F (g); accordingly we have a functional from L0 to [0, [, given by
245H Convergence in measure 179

F (u) = F (f )
0 0
whenever f L and u = f L .

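(b) Now $\tau_{F\epsilon}$ is not (except in trivial cases) subadditive, so does not define a pseudometric on $\mathcal{L}^0$ or $L^0$. But we can say that, for $f \in \mathcal{L}^0$,
$$\tau_F(f) \le \epsilon\min(1,\epsilon) \implies \tau_{F\epsilon}(f) \le \epsilon \implies \tau_F(f) \le \epsilon(1 + \mu F).$$
(The point is that if $E \subseteq \operatorname{dom} f$ is a measurable conegligible set such that $f{\restriction}E$ is measurable, then
$$\tau_F(f) = \int_{E\cap F}\min(|f(x)|, 1)\,dx,\quad \tau_{F\epsilon}(f) = \mu\{x : x \in E \cap F,\ |f(x)| > \epsilon\}.)$$
This means that a set $G \subseteq \mathcal{L}^0$ is open for the topology of convergence in measure iff for every $f \in G$ we can find a set $F$ of finite measure and $\epsilon$, $\delta > 0$ such that
$$\tau_{F\epsilon}(g - f) \le \delta \implies g \in G.$$
Of course $\tau_{F\delta}(f) \ge \tau_{F\epsilon}(f)$ whenever $\delta \le \epsilon$, so we can equally say: $G \subseteq \mathcal{L}^0$ is open for the topology of convergence in measure iff for every $f \in G$ we can find a set $F$ of finite measure and $\epsilon > 0$ such that
$$\tau_{F\epsilon}(g - f) \le \epsilon \implies g \in G.$$
Similarly, $G \subseteq L^0$ is open for the topology of convergence in measure on $L^0$ iff for every $u \in G$ we can find a set $F$ of finite measure and $\epsilon > 0$ such that
$$\bar\tau_{F\epsilon}(v - u) \le \epsilon \implies v \in G.$$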

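(c) It follows at once that a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{L}^0 = \mathcal{L}^0(\mu)$ converges in measure to $f \in \mathcal{L}^0$ iff
$$\lim_{n\to\infty}\mu^*\{x : x \in F \cap \operatorname{dom} f \cap \operatorname{dom} f_n,\ |f_n(x) - f(x)| > \epsilon\} = 0$$
whenever $F \in \Sigma$, $\mu F < \infty$ and $\epsilon > 0$. Similarly, a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^0$ converges in measure to $u$ iff $\lim_{n\to\infty}\bar\tau_{F\epsilon}(u - u_n) = 0$ whenever $\mu F < \infty$ and $\epsilon > 0$.

(d) In particular, if $(X, \Sigma, \mu)$ is totally finite, $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ in $\mathcal{L}^0$ iff
$$\lim_{n\to\infty}\mu^*\{x : x \in \operatorname{dom} f \cap \operatorname{dom} f_n,\ |f(x) - f_n(x)| > \epsilon\} = 0$$
for every $\epsilon > 0$, and $\langle u_n\rangle_{n\in\mathbb{N}} \to u$ in $L^0$ iff
$$\lim_{n\to\infty}\bar\tau_{X\epsilon}(u - u_n) = 0$$
for every $\epsilon > 0$.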
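The two-sided comparison between $\tau_F$ and $\tau_{F\epsilon}$ in 245F(b) can be checked on a toy space. The following is only a sketch under stated assumptions: $X$ is a finite set with rational point weights (so every function is measurable and the outer measure agrees with the measure), and the names `tau_F`, `tau_F_eps` and `implications_hold` are illustrative, not taken from the text.

```python
from fractions import Fraction

def tau_F(f, weights, F):
    """tau_F(f): integral over F of min(|f|, 1) on a finite weighted space."""
    return sum(weights[x] * min(abs(f[x]), 1) for x in F)

def tau_F_eps(f, weights, F, eps):
    """tau_{F,eps}(f): measure of the points of F where |f| > eps."""
    return sum(weights[x] for x in F if abs(f[x]) > eps)

def implications_hold(f, weights, F, eps):
    """Check  tau_F <= eps*min(1,eps)  =>  tau_{F,eps} <= eps  =>  tau_F <= eps*(1 + mu F)."""
    mu_F = sum(weights[x] for x in F)
    t = tau_F(f, weights, F)
    t_eps = tau_F_eps(f, weights, F, eps)
    first = (t > eps * min(1, eps)) or (t_eps <= eps)   # an implication is (not A) or B
    second = (t_eps > eps) or (t <= eps * (1 + mu_F))
    return first and second
```

Exact rational arithmetic is used so that the inequalities are tested without rounding; the first implication is just Markov's inequality applied to $\min(|f|,1)$.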

245G Embedding $L^p$ in $L^0$: Proposition Let $(X, \Sigma, \mu)$ be any measure space. Then for any $p \in [1,\infty]$, the embedding of $L^p = L^p(\mu)$ in $L^0 = L^0(\mu)$ is continuous for the norm topology of $L^p$ and the topology of convergence in measure on $L^0$.
proof Suppose that $u$, $v \in L^p$ and that $\mu F < \infty$. Then $(\chi F)^\bullet$ belongs to $L^q$, where $q = p/(p-1)$ (taking $q = 1$ if $p = \infty$, $q = \infty$ if $p = 1$ as usual), and
$$\bar\rho_F(u,v) \le \int |u - v| \times (\chi F)^\bullet \le \|u - v\|_p\,\|\chi F^\bullet\|_q$$
(244E). By 2A3H, this is enough to ensure that the embedding $u \mapsto u : L^p \to L^0$ is continuous.
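The Hölder estimate in this proof can be tried out numerically. The sketch below assumes a finite weighted space indexed by `0..n-1` and an exponent `1 < p < infinity`; the function names are hypothetical and floating-point arithmetic is used, so comparisons should allow a small tolerance.

```python
def rho_F(u, v, weights, F):
    """rho_F(u, v): integral over F of min(|u - v|, 1) on a finite weighted space."""
    return sum(weights[x] * min(abs(u[x] - v[x]), 1.0) for x in F)

def lp_norm(u, weights, p):
    """The L^p norm of u for 1 <= p < infinity."""
    return sum(w * abs(a) ** p for a, w in zip(u, weights)) ** (1.0 / p)

def holder_bound(u, v, weights, F, p):
    """||u - v||_p * ||chi F||_q with q = p/(p-1), assuming 1 < p < infinity."""
    q = p / (p - 1.0)
    diff = [a - b for a, b in zip(u, v)]
    mu_F = sum(weights[x] for x in F)
    return lp_norm(diff, weights, p) * mu_F ** (1.0 / q)
```

For instance, with two points of weight 1, `u = [3, 0]`, `v = [0, 0]` and `p = 2`, `rho_F` is 1 while the bound is $3\sqrt2$.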

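245H The case of $L^1$ is so important that I go farther with it.

Proposition Let $(X, \Sigma, \mu)$ be a measure space.
(a)(i) If $f \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$ and $\epsilon > 0$, there are a $\delta > 0$ and a set $F \in \Sigma$ of finite measure such that $\int |f - g| \le \epsilon$ whenever $g \in \mathcal{L}^1$, $\int |g| \le \int |f| + \delta$ and $\rho_F(f,g) \le \delta$.
(ii) For any sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{L}^1$ and any $f \in \mathcal{L}^1$, $\lim_{n\to\infty}\int |f - f_n| = 0$ iff $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ in measure and $\limsup_{n\to\infty}\int |f_n| \le \int |f|$.
(b)(i) If $u \in L^1 = L^1(\mu)$ and $\epsilon > 0$, there are a $\delta > 0$ and a set $F \in \Sigma$ of finite measure such that $\|u - v\|_1 \le \epsilon$ whenever $v \in L^1$, $\|v\|_1 \le \|u\|_1 + \delta$ and $\bar\rho_F(u,v) \le \delta$.
(ii) For any sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^1$ and any $u \in L^1$, $\langle u_n\rangle_{n\in\mathbb{N}} \to u$ for $\|\ \|_1$ iff $\langle u_n\rangle_{n\in\mathbb{N}} \to u$ in measure and $\limsup_{n\to\infty}\|u_n\|_1 \le \|u\|_1$.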
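proof (a)(i) We know that there are a set $F$ of finite measure and an $\eta > 0$ such that $\int_E |f| \le \tfrac15\epsilon$ whenever $\mu(E \cap F) \le \eta$ (225A). Take $\delta > 0$ such that $\delta(\epsilon + 5\mu F) \le \epsilon\eta$ and $\delta \le \tfrac15\epsilon$. Then if $\int |g| \le \int |f| + \delta$ and $\rho_F(f,g) \le \delta$, let $G \subseteq \operatorname{dom} f \cap \operatorname{dom} g$ be a conegligible measurable set such that $f{\restriction}G$ and $g{\restriction}G$ are both measurable. Set
$$E = \Bigl\{x : x \in F \cap G,\ |f(x) - g(x)| \ge \frac{\epsilon}{\epsilon + 5\mu F}\Bigr\};$$
then
$$\delta \ge \rho_F(f,g) \ge \frac{\epsilon}{\epsilon + 5\mu F}\,\mu E,$$
so $\mu E \le \eta$. Set $H = F \setminus E$, so that $\mu(F \setminus H) \le \eta$ and $\int_{X\setminus H} |f| \le \tfrac15\epsilon$. On the other hand, for almost every $x \in H$, $|f(x) - g(x)| \le \frac{\epsilon}{\epsilon + 5\mu F}$, so $\int_H |f - g| \le \tfrac15\epsilon$ and
$$\int_H |g| \ge \int_H |f| - \tfrac15\epsilon \ge \int |f| - \int_{X\setminus H} |f| - \tfrac15\epsilon \ge \int |f| - \tfrac25\epsilon.$$
Since $\int |g| \le \int |f| + \tfrac15\epsilon$, $\int_{X\setminus H} |g| \le \tfrac35\epsilon$. Now this means that
$$\int |g - f| \le \int_{X\setminus H} |g| + \int_{X\setminus H} |f| + \int_H |g - f| \le \tfrac35\epsilon + \tfrac15\epsilon + \tfrac15\epsilon = \epsilon,$$
as required.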
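(ii) If $\lim_{n\to\infty}\int |f - f_n| = 0$, that is, $\lim_{n\to\infty} f_n^\bullet = f^\bullet$ in $L^1$, then by 245G we must have $\langle f_n^\bullet\rangle_{n\in\mathbb{N}} \to f^\bullet$ in $L^0$, that is, $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ for the topology of convergence in measure. Also, of course, $\lim_{n\to\infty}\int |f_n| = \int |f|$.
In the other direction, if $\limsup_{n\to\infty}\int |f_n| \le \int |f|$ and $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ for the topology of convergence in measure, then whenever $\delta > 0$ and $\mu F < \infty$ there must be an $m \in \mathbb{N}$ such that $\int |f_n| \le \int |f| + \delta$, $\rho_F(f, f_n) \le \delta$ for every $n \ge m$; so (a)(i) tells us that $\lim_{n\to\infty}\int |f_n - f| = 0$.
(b) This now follows immediately if we express $u$ as $f^\bullet$, $v$ as $g^\bullet$ and $u_n$ as $f_n^\bullet$.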

245I Remarks (a) I think the phenomenon here is so important that it is worth looking at some elementary examples.
(i) If $\mu$ is counting measure on $\mathbb{N}$, and we set $f_n(n) = 1$, $f_n(i) = 0$ for $i \ne n$, then $\langle f_n\rangle_{n\in\mathbb{N}} \to 0$ in measure, while $\int |f_n| = 1$ for every $n$.
(ii) If $\mu$ is Lebesgue measure on $[0,1]$, and we set $f_n(x) = 2^n$ for $0 < x \le 2^{-n}$, $0$ for other $x$, then again $\langle f_n\rangle_{n\in\mathbb{N}} \to 0$ in measure, while $\int |f_n| = 1$ for every $n$.
(iii) In 245Cc we have another sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ converging to $0$ in measure, while $\int |f_n| = 1$ for every $n$.
In all these cases (as required by Fatou's Lemma, at least in (i) and (ii)) we have $\int |f| \le \liminf_{n\to\infty}\int |f_n|$. (The next proposition shows that this applies to any sequence which is convergent in measure.)
The common feature of these examples is the way in which somehow the $f_n$ escape to infinity, either laterally (in (i)) or vertically (in (iii)) or both (in (ii)). Note that in all three examples we can set $f'_n = 2^n f_n$ to obtain a sequence still converging to $0$ in measure, but with $\lim_{n\to\infty}\int |f'_n| = \infty$.

(b) In 245H, I have used the explicit formulations $\lim_{n\to\infty}\int |f_n - f| = 0$ (for sequences of functions), $\langle u_n\rangle_{n\in\mathbb{N}} \to u$ for $\|\ \|_1$ (for sequences in $L^1$). These are often expressed by saying that $\langle f_n\rangle_{n\in\mathbb{N}}$, $\langle u_n\rangle_{n\in\mathbb{N}}$ are convergent in mean to $f$, $u$ respectively.
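Example (a)(ii) can be made concrete with exact arithmetic. The sketch below encodes $f_n = 2^n$ on $]0, 2^{-n}]$ by its height and width only (an assumption of this illustration; the helper names are hypothetical) and reports the measure of $\{|f_n| > \epsilon\}$, the integral of $|f_n|$, and the integral of the rescaled $f'_n = 2^n f_n$.

```python
from fractions import Fraction

def measure_exceeding(n, eps):
    """Lebesgue measure of {x in [0,1] : |f_n(x)| > eps}, where f_n = 2^n on (0, 2^-n]."""
    height = Fraction(2) ** n
    return Fraction(1, 2 ** n) if height > eps else Fraction(0)

def l1_norm(n):
    """Integral of |f_n|: height 2^n times width 2^-n."""
    return (Fraction(2) ** n) * Fraction(1, 2 ** n)

def l1_norm_rescaled(n):
    """Integral of |f'_n| where f'_n = 2^n f_n."""
    return (Fraction(2) ** n) * l1_norm(n)

for n in (1, 5, 10, 20):
    # the measure column shrinks to 0, the middle column stays at 1, the last one grows
    print(n, measure_exceeding(n, Fraction(1, 2)), l1_norm(n), l1_norm_rescaled(n))
```

The first column of measures tends to 0 (convergence in measure), while the norms do not, which is exactly the loss of weight "at infinity" that 245H rules out under its lim sup hypothesis.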

245J For semi-finite spaces we have a further relationship.

Proposition Let $(X, \Sigma, \mu)$ be a semi-finite measure space. Write $\mathcal{L}^0 = \mathcal{L}^0(\mu)$, etc.
(a)(i) For any $a \ge 0$, the set $\{f : f \in \mathcal{L}^1,\ \int |f| \le a\}$ is closed in $\mathcal{L}^0$ for the topology of convergence in measure.
(ii) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^1$ which is convergent in measure to $f \in \mathcal{L}^0$, and $\liminf_{n\to\infty}\int |f_n| < \infty$, then $f$ is integrable and $\int |f| \le \liminf_{n\to\infty}\int |f_n|$.
(b)(i) For any $a \ge 0$, the set $\{u : u \in L^1,\ \|u\|_1 \le a\}$ is closed in $L^0$ for the topology of convergence in measure.
(ii) If $\langle u_n\rangle_{n\in\mathbb{N}}$ is a sequence in $L^1$ which is convergent in measure to $u \in L^0$, and $\liminf_{n\to\infty}\|u_n\|_1 < \infty$, then $u \in L^1$ and $\|u\|_1 \le \liminf_{n\to\infty}\|u_n\|_1$.
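proof (a)(i) Set $A = \{f : f \in \mathcal{L}^1,\ \int |f| \le a\}$, and let $g$ be any member of the closure of $A$ in $\mathcal{L}^0$. Let $h$ be any simple function such that $0 \le h \le_{\text{a.e.}} |g|$, and $\epsilon > 0$. If $h = 0$ then of course $\int h \le a$. Otherwise, setting $F = \{x : h(x) > 0\}$, $M = \sup_{x\in X} h(x)$, there is an $f \in A$ such that $\mu^*\{x : x \in F \cap \operatorname{dom} f \cap \operatorname{dom} g,\ |f(x) - g(x)| \ge \epsilon\} \le \epsilon$ (245F); let $E \supseteq \{x : x \in F \cap \operatorname{dom} f \cap \operatorname{dom} g,\ |f(x) - g(x)| \ge \epsilon\}$ be a measurable set of measure at most $\epsilon$. Then $h \le_{\text{a.e.}} |f| + \epsilon\chi F + M\chi E$, so $\int h \le a + \epsilon(M + \mu F)$. As $\epsilon$ is arbitrary, $\int h \le a$. But we are supposing that $\mu$ is semi-finite, so this is enough to ensure that $g$ is integrable and that $\int |g| \le a$ (213B), that is, that $g \in A$. As $g$ is arbitrary, $A$ is closed.
(ii) Now if $\langle f_n\rangle_{n\in\mathbb{N}}$ is convergent in measure to $f$, and $\liminf_{n\to\infty}\int |f_n| = a$, then for any $\epsilon > 0$ there is a subsequence $\langle f_{n(k)}\rangle_{k\in\mathbb{N}}$ such that $\int |f_{n(k)}| \le a + \epsilon$ for every $k$; since $\langle f_{n(k)}\rangle_{k\in\mathbb{N}}$ still converges in measure to $f$, $\int |f| \le a + \epsilon$. As $\epsilon$ is arbitrary, $\int |f| \le a$.
(b) As in 245H, this is just a translation of part (a) into the language of $L^1$ and $L^0$.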

245K For $\sigma$-finite measure spaces, the topology of convergence in measure on $L^0$ is metrizable, so can be described effectively in terms of convergent sequences; it is therefore important that we have, in this case, a sharp characterisation of sequential convergence in measure.
Proposition Let $(X, \Sigma, \mu)$ be a $\sigma$-finite measure space. Then
(a) a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{L}^0$ converges in measure to $f \in \mathcal{L}^0$ iff every subsequence of $\langle f_n\rangle_{n\in\mathbb{N}}$ has a sub-subsequence converging to $f$ almost everywhere;
(b) a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^0$ converges in measure to $u \in L^0$ iff every subsequence of $\langle u_n\rangle_{n\in\mathbb{N}}$ has a sub-subsequence which order*-converges to $u$.
proof (a)(i) Suppose that $\langle f_n\rangle_{n\in\mathbb{N}} \to f$, that is, that $\lim_{n\to\infty}\int |f - f_n| \wedge \chi F = 0$ for every set $F$ of finite measure. Let $\langle E_k\rangle_{k\in\mathbb{N}}$ be a non-decreasing sequence of sets of finite measure covering $X$, and let $\langle n(k)\rangle_{k\in\mathbb{N}}$ be a strictly increasing sequence in $\mathbb{N}$ such that $\int |f - f_{n(k)}| \wedge \chi E_k \le 4^{-k}$ for every $k \in \mathbb{N}$. Then $\sum_{k=0}^\infty |f - f_{n(k)}| \wedge \chi E_k < \infty$ a.e. (242E); but $\lim_{k\to\infty} f_{n(k)}(x) = f(x)$ whenever $\sum_{k=0}^\infty \min(1, |f(x) - f_{n(k)}(x)|) < \infty$, so $\langle f_{n(k)}\rangle_{k\in\mathbb{N}} \to f$ a.e.
(ii) The same applies to every subsequence of $\langle f_n\rangle_{n\in\mathbb{N}}$, so that every subsequence of $\langle f_n\rangle_{n\in\mathbb{N}}$ has a sub-subsequence converging to $f$ almost everywhere.
(iii) Now suppose that $\langle f_n\rangle_{n\in\mathbb{N}}$ does not converge to $f$. Then there is an open set $U$ containing $f$ such that $\{n : f_n \notin U\}$ is infinite, that is, $\langle f_n\rangle_{n\in\mathbb{N}}$ has a subsequence $\langle f'_n\rangle_{n\in\mathbb{N}}$ with $f'_n \notin U$ for every $n$. But now no sub-subsequence of $\langle f'_n\rangle_{n\in\mathbb{N}}$ converges to $f$ in measure, so no such sub-subsequence can converge almost everywhere, by 245Ca.
(b) This follows immediately from (a) if we express $u$ as $f^\bullet$, $u_n$ as $f_n^\bullet$.
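The subsequence criterion of 245K can be seen on the familiar "sliding interval" sequence on $[0,1[$. The sketch below is an illustration under assumptions: it uses plain indicator functions of dyadic intervals (an unnormalised variant, not necessarily the sequence of 245Cc), and the helper names are hypothetical. The supports have measure tending to 0, every point is hit once in each dyadic block, and yet along $n = 2^m$ the values at any fixed $x > 0$ are eventually 0.

```python
from fractions import Fraction

def interval(n):
    """For n = 2^m + k with 0 <= k < 2^m (n >= 1), return [k/2^m, (k+1)/2^m) as a pair."""
    m = n.bit_length() - 1
    k = n - (1 << m)
    return Fraction(k, 1 << m), Fraction(k + 1, 1 << m)

def f(n, x):
    """Indicator of the n-th dyadic interval, evaluated at x."""
    a, b = interval(n)
    return 1 if a <= x < b else 0

def measure_support(n):
    """Lebesgue measure of {x : f_n(x) != 0}; this tends to 0 as n grows."""
    a, b = interval(n)
    return b - a

x = Fraction(1, 3)
hits = [n for n in range(1, 64) if f(n, x) == 1]      # one hit in each block 2^m <= n < 2^(m+1)
along_powers = [f(2 ** m, x) for m in range(2, 12)]   # subsequence n = 2^m
print(hits, along_powers)
```

So the full sequence converges to 0 in measure without converging at any point, while the subsequence indexed by powers of 2 converges to 0 at every $x > 0$.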

245L Corollary Let $(X, \Sigma, \mu)$ be a $\sigma$-finite measure space.

(a) A subset $A$ of $\mathcal{L}^0 = \mathcal{L}^0(\mu)$ is closed for the topology of convergence in measure iff $f \in A$ whenever $f \in \mathcal{L}^0$ and there is a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $A$ such that $f =_{\text{a.e.}} \lim_{n\to\infty} f_n$.
(b) A subset $A$ of $L^0 = L^0(\mu)$ is closed for the topology of convergence in measure iff $u \in A$ whenever $u \in L^0$ and there is a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$ order*-converging to $u$.
proof (a)(i) If $A$ is closed for the topology of convergence in measure, and $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $A$ converging to $f$ almost everywhere, then $\langle f_n\rangle_{n\in\mathbb{N}}$ converges to $f$ in measure, so surely $f \in A$ (since otherwise all but finitely many of the $f_n$ would have to belong to the open set $\mathcal{L}^0 \setminus A$).
(ii) If $A$ is not closed, there is an $f \in \overline{A} \setminus A$. The topology can be defined by a metric $\rho$ (245Eb), and we can choose a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $A$ such that $\rho(f_n, f) \le 2^{-n}$ for every $n$, so that $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ in measure. By 245K, $\langle f_n\rangle_{n\in\mathbb{N}}$ has a subsequence $\langle f'_n\rangle_{n\in\mathbb{N}}$ converging a.e. to $f$, and this witnesses that $A$ fails to satisfy the condition.
(b) This follows immediately, because $A \subseteq L^0$ is closed iff $\{f : f^\bullet \in A\}$ is closed in $\mathcal{L}^0$.

245M Complex $L^0$ In 241J I briefly discussed the adaptations needed to construct the complex linear space $L^0_{\mathbb{C}}$. The formulae of 245A may be used unchanged to define topologies of convergence in measure on $\mathcal{L}^0_{\mathbb{C}}$ and $L^0_{\mathbb{C}}$. I think that every word of 245B-245L still applies if we replace each $\mathcal{L}^0$ or $L^0$ with $\mathcal{L}^0_{\mathbb{C}}$ or $L^0_{\mathbb{C}}$. Alternatively, to relate the real and complex forms of 245E, for instance, we can observe that because
$$\max\bigl(\bar\rho_F(\mathrm{Re}(u), \mathrm{Re}(v)),\ \bar\rho_F(\mathrm{Im}(u), \mathrm{Im}(v))\bigr) \le \bar\rho_F(u,v) \le \bar\rho_F(\mathrm{Re}(u), \mathrm{Re}(v)) + \bar\rho_F(\mathrm{Im}(u), \mathrm{Im}(v))$$
for all $u$, $v \in L^0_{\mathbb{C}}$ and all sets $F$ of finite measure, $L^0_{\mathbb{C}}$ can be identified, as uniform space, with $L^0 \times L^0$, so is Hausdorff, or metrizable, or complete iff $L^0$ is.

245X Basic exercises > (a) Let $X$ be any set, and $\mu$ counting measure on $X$. Show that the topology of convergence in measure on $L^0(\mu) = \mathbb{R}^X$ is just the product topology on $\mathbb{R}^X$ regarded as a product of copies of $\mathbb{R}$.

> (b) Let $(X, \Sigma, \mu)$ be any measure space, and $(X, \hat\Sigma, \hat\mu)$ its completion. Show that the topologies of convergence in measure on $\mathcal{L}^0(\mu) = \mathcal{L}^0(\hat\mu)$ (241Xb), corresponding to the families $\{\rho_F : F \in \Sigma,\ \mu F < \infty\}$, $\{\rho_F : F \in \hat\Sigma,\ \hat\mu F < \infty\}$ are the same.

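> (c) Let $(X, \Sigma, \mu)$ be any measure space; set $L^0 = L^0(\mu)$. Let $u$, $u_n \in L^0$ for $n \in \mathbb{N}$. Show that the following are equiveridical:
(i) $\langle u_n\rangle_{n\in\mathbb{N}}$ order*-converges to $u$ in the sense of 245C;
(ii) there are measurable functions $f$, $f_n : X \to \mathbb{R}$ such that $f^\bullet = u$, $f_n^\bullet = u_n$ for every $n \in \mathbb{N}$, and $f(x) = \lim_{n\to\infty} f_n(x)$ for every $x \in X$;
(iii) $u = \inf_{n\in\mathbb{N}}\sup_{m\ge n} u_m = \sup_{n\in\mathbb{N}}\inf_{m\ge n} u_m$, the infima and suprema being taken in $L^0$;
(iv) $\inf_{n\in\mathbb{N}}\sup_{m\ge n} |u - u_m| = 0$ in $L^0$;
(v) there is a non-increasing sequence $\langle v_n\rangle_{n\in\mathbb{N}}$ in $L^0$ such that $\inf_{n\in\mathbb{N}} v_n = 0$ in $L^0$ and $u - v_n \le u_n \le u + v_n$ for every $n \in \mathbb{N}$;
(vi) there are sequences $\langle v_n\rangle_{n\in\mathbb{N}}$, $\langle w_n\rangle_{n\in\mathbb{N}}$ in $L^0$ such that $\langle v_n\rangle_{n\in\mathbb{N}}$ is non-decreasing, $\langle w_n\rangle_{n\in\mathbb{N}}$ is non-increasing, $\sup_{n\in\mathbb{N}} v_n = u = \inf_{n\in\mathbb{N}} w_n$ and $v_n \le u_n \le w_n$ for every $n \in \mathbb{N}$.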

(d) Let $(X, \Sigma, \mu)$ be a semi-finite measure space. Show that a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^0 = L^0(\mu)$ is order*-convergent to $u \in L^0$ iff $\{|u_n| : n \in \mathbb{N}\}$ is bounded above in $L^0$ and $\langle \sup_{m\ge n} |u_m - u|\rangle_{n\in\mathbb{N}} \to 0$ for the topology of convergence in measure.

(e) Write out proofs that $L^0(\mu)$ is complete (as linear topological space) adapted to the special cases (i) $\mu X = 1$ (ii) $\mu$ is $\sigma$-finite, taking advantage of any simplifications you can find.

(f) Let $(X, \Sigma, \mu)$ be a measure space and $r \ge 1$; let $h : \mathbb{R}^r \to \mathbb{R}$ be a continuous function. (i) Suppose that for $1 \le k \le r$ we are given a sequence $\langle f_{kn}\rangle_{n\in\mathbb{N}}$ in $\mathcal{L}^0 = \mathcal{L}^0(\mu)$ converging in measure to $f_k \in \mathcal{L}^0$. Show that $\langle h(f_{1n}, \ldots, f_{rn})\rangle_{n\in\mathbb{N}}$ converges in measure to $h(f_1, \ldots, f_r)$. (ii) Generally, show that $(f_1, \ldots, f_r) \mapsto h(f_1, \ldots, f_r) : (\mathcal{L}^0)^r \to \mathcal{L}^0$ is continuous for the topology of convergence in measure. (iii) Show that the corresponding function $\bar h : (L^0)^r \to L^0$ (241Xd) is continuous for the topology of convergence in measure.

(g) Let $(X, \Sigma, \mu)$ be a measure space and $u \in L^1(\mu)$. Show that $v \mapsto \int u \times v : L^\infty \to \mathbb{R}$ is continuous for the topology of convergence in measure on the unit ball of $L^\infty$, but not, as a rule, on the whole of $L^\infty$.

(h) Let $(X, \Sigma, \mu)$ be a measure space and $v$ a non-negative member of $L^1 = L^1(\mu)$. Show that on the set $A = \{u : u \in L^1,\ |u| \le v\}$ the subspace topologies (2A3C) induced by the norm topology of $L^1$ and the topology of convergence in measure are the same. (Hint: given $\epsilon > 0$, take $F \in \Sigma$ of finite measure and $M \ge 0$ such that $\int (|v| - M\chi F^\bullet)^+ \le \epsilon$. Show that $\|u - u'\|_1 \le \epsilon + M\bar\rho_F(u, u')$ for all $u$, $u' \in A$.)

(i) Let $(X, \Sigma, \mu)$ be a measure space and $\mathcal{F}$ a filter on $L^1 = L^1(\mu)$ which is convergent, for the topology of convergence in measure, to $u \in L^1$. Show that $\mathcal{F} \to u$ for the norm topology of $L^1$ iff $\inf_{A\in\mathcal{F}}\sup_{v\in A}\|v\|_1 \le \|u\|_1$.

> (j) Let $(X, \Sigma, \mu)$ be a semi-finite measure space and $p \in [1,\infty]$, $a \ge 0$. Show that $\{u : u \in L^p(\mu),\ \|u\|_p \le a\}$ is closed in $L^0(\mu)$ for the topology of convergence in measure.

(k) Let $(X, \Sigma, \mu)$ be a measure space, and $\langle u_n\rangle_{n\in\mathbb{N}}$ a sequence in $L^p = L^p(\mu)$, where $1 \le p < \infty$. Let $u \in L^p$. Show that the following are equiveridical: (i) $u = \lim_{n\to\infty} u_n$ for the norm topology of $L^p$ (ii) $\langle u_n\rangle_{n\in\mathbb{N}} \to u$ for the topology of convergence in measure and $\lim_{n\to\infty}\|u_n\|_p = \|u\|_p$ (iii) $\langle u_n\rangle_{n\in\mathbb{N}} \to u$ for the topology of convergence in measure and $\limsup_{n\to\infty}\|u_n\|_p \le \|u\|_p$.

(l) Let $X$ be a set and $\mu$, $\nu$ two measures on $X$ with the same measurable sets and the same negligible sets. (i) Show that $\mathcal{L}^0(\mu) = \mathcal{L}^0(\nu)$ and $L^0(\mu) = L^0(\nu)$. (ii) Show that if both $\mu$ and $\nu$ are semi-finite, then they define the same topology of convergence in measure on $\mathcal{L}^0$ and $L^0$. (Hint: use 215A to show that if $\mu E < \infty$ then $\mu E = \sup\{\mu F : F \subseteq E,\ \nu F < \infty\}$.)

(m) Let $(X, \Sigma, \mu)$ be a measure space and $p \in [1,\infty[$. Suppose that $\langle u_n\rangle_{n\in\mathbb{N}}$ is a sequence in $L^p(\mu)$ which converges for $\|\ \|_p$ to $u \in L^p(\mu)$. Show that $\langle |u_n|^p\rangle_{n\in\mathbb{N}} \to |u|^p$ for $\|\ \|_1$. (Hint: 245G, 245Xf, 245H.)

245Y Further exercises (a) Let (X, , ) be a measure space and give the topology described in
232Ya. Show that : L0 () is a homeomorphism between and its image [] in L0 , if L0 is given
the topology of convergence in measure and [] the subspace topology.

(b) Let $(X, \Sigma, \mu)$ be a measure space and $Y$ any subset of $X$; let $\mu_Y$ be the subspace measure on $Y$. Let $T : L^0(\mu) \to L^0(\mu_Y)$ be the canonical map defined by setting $T(f^\bullet) = (f{\restriction}Y)^\bullet$ for every $f \in \mathcal{L}^0(\mu)$ (241Ye). Show that $T$ is continuous for the topologies of convergence in measure on $L^0(\mu)$ and $L^0(\mu_Y)$.

(c) Let $(X, \Sigma, \mu)$ be a measure space, and $\tilde\mu$ the c.l.d. version of $\mu$. Show that the map $T : L^0(\mu) \to L^0(\tilde\mu)$ induced by the inclusion $\mathcal{L}^0(\mu) \subseteq \mathcal{L}^0(\tilde\mu)$ (241Ya) is continuous for the topologies of convergence in measure.

(d) Let $(X, \Sigma, \mu)$ be a measure space, and give $L^0 = L^0(\mu)$ the topology of convergence in measure. Let $A \subseteq L^0$ be a non-empty downwards-directed set, and suppose that $\inf A = 0$ in $L^0$. (i) Let $F \in \Sigma$ be any set of finite measure, and define $\bar\tau_F$ as in 245A; show that $\inf_{u\in A}\bar\tau_F(u) = 0$. (Hint: set $\gamma = \inf_{u\in A}\bar\tau_F(u)$; find a non-increasing sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$ such that $\lim_{n\to\infty}\bar\tau_F(u_n) = \gamma$; set $v = (\chi F)^\bullet \wedge \inf_{n\in\mathbb{N}} u_n$ and show that $u \wedge v = v$ for every $u \in A$, so that $v = 0$.) (ii) Show that if $U$ is any open set containing $0$, there is a $u \in A$ such that $v \in U$ whenever $0 \le v \le u$.

(e) Let (X, , ) be a measure space. (i) Show that for u L0 = L0 () we may define a (u), for a 0,
by setting a (u) = {x : |f (x)| a} whenever f : X R is a measurable function and f = u. (ii) Define
: L0 L0 [0, 1] by setting (u, v) = min({1} {a : a 0, a (u v) a}. Show that is a metric on
L0 , that L0 is complete under , and that +, , , : L0 L0 L0 are continuous for . (iii) Show that
c 7 cu : R L0 is continuous for every u L0 iff (X, , ) is totally finite, and that in this case defines
the topology of convergence in measure on L0 .

(f) Let $(X, \Sigma, \mu)$ be a localizable measure space and $A \subseteq L^0 = L^0(\mu)$ a non-empty upwards-directed set which is bounded in the linear topological space sense (i.e., such that for every neighbourhood $U$ of $0$ in $L^0$ there is a $k \in \mathbb{N}$ such that $A \subseteq kU$). Show that $A$ is bounded above in $L^0$, and that its supremum belongs to its closure.

(g) Let $(X, \Sigma, \mu)$ be a measure space, $p \in [1,\infty[$ and $v$ a non-negative member of $L^p = L^p(\mu)$. Show that on the set $A = \{u : u \in L^p,\ |u| \le v\}$ the subspace topologies induced by the norm topology of $L^p$ and the topology of convergence in measure are the same.

(h) Let $S$ be the set of all sequences $s : \mathbb{N} \to \mathbb{N}$ such that $\lim_{n\to\infty} s(n) = \infty$. For every $s \in S$, let $(X_s, \Sigma_s, \mu_s)$ be $[0,1]$ with Lebesgue measure, and let $(X, \Sigma, \mu)$ be the direct sum of $\langle (X_s, \Sigma_s, \mu_s)\rangle_{s\in S}$ (214K). For $s \in S$, $t \in [0,1]$, $n \in \mathbb{N}$ set $h_n(s,t) = f_{s(n)}(t)$, where $\langle f_n\rangle_{n\in\mathbb{N}}$ is the sequence of 245Cc. Show that $\langle h_n\rangle_{n\in\mathbb{N}} \to 0$ for the topology of convergence in measure on $\mathcal{L}^0(\mu)$, but that $\langle h_n\rangle_{n\in\mathbb{N}}$ has no subsequence which is convergent to $0$ almost everywhere.

(i) Let X be a set, and suppose we are given a relation * between sequences in X and members of X
such that () if xn = x for every n then hxn inN * x () hx0n inN * x whenever hxn inN * x and hx0n inN
is a subsequence of hxn inN () if hxn inN * x and hxn inN * y then x = y. Show that we have a topology
T on X defined by saying that a subset G of X belongs to T iff whenever hxn inN is a sequence in X and
hxn inN * x G then some xn belongs to G. Show that a sequence hxn inN in X is T-convergent to x iff
every subsequence of hxn inN has a sub-subsequence hx00n inN such that hx00n inN * x.

(j) Let $\mu$ be Lebesgue measure on $\mathbb{R}^r$. Show that $L^0(\mu)$ is separable for the topology of convergence in measure. (Hint: 244I.)

245 Notes and comments In this section I am inviting you to regard the topology of (local) convergence in measure as the standard topology on $L^0$, just as the norms define the standard topologies on $L^p$ spaces for $p \ge 1$. The definition I have chosen is designed to make addition and scalar multiplication and the operations $\vee$, $\wedge$ and $\times$ continuous (245D); see also 245Xf. From the point of view of functional analysis these properties are more important than metrizability or even completeness.
Just as the algebraic and order structure of $L^0$ can be described in terms of the general theory of Riesz spaces, the more advanced results 241G and 245E also have interpretations in the general theory. It is not an accident that (for semi-finite measure spaces) $L^0$ is Dedekind complete iff it is complete as uniform space; you may find the relevant generalizations in 23K and 24E of Fremlin 74. Of course it is exactly because the two kinds of completeness are interrelated that I feel it necessary to use the phrase 'Dedekind completeness' to distinguish this particular kind of order-completeness from the more familiar uniformity-completeness described in 2A5F.
The usefulness of the topology of convergence in measure derives in large part from 245G-245J and the $L^p$ versions 245Xj and 245Xk. Some of the ideas here can be related to a question arising out of the basic convergence theorems. If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence of integrable functions converging (pointwise) to a function $f$, in what ways can $\int f$ fail to be $\lim_{n\to\infty}\int f_n$? In the language of this section, this translates into: if we have a sequence (or filter) in $L^1$ converging for the topology of convergence in measure, in what ways can it fail to converge for the norm topology of $L^1$? The first answer is Lebesgue's Dominated Convergence Theorem: this cannot happen if the sequence is dominated, that is, lies within some set of the form $\{u : |u| \le v\}$ where $v \in L^1$. (See 245Xh and 245Yg.) I will return to this in the next section. For the moment, though, 245H tells us that if $\langle u_n\rangle_{n\in\mathbb{N}}$ converges in measure to $u \in L^1$, but not for the topology of $L^1$, it is because $\limsup_{n\to\infty}\|u_n\|_1$ is too big; some of its weight is being lost at infinity, as in the examples of 245I. If $\langle u_n\rangle_{n\in\mathbb{N}}$ actually order*-converges to $u$, then Fatou's Lemma tells us that $\liminf_{n\to\infty}\|u_n\|_1 \ge \|u\|_1$, that is, that the limit cannot have greater weight (as measured by $\|\ \|_1$) than the sequence provides. 245J and 245Xj are generalizations of this to convergence in measure. If you want a generalization of B.Levi's theorem, then 242Yb remains the best expression in the language of this chapter; but 245Yf is a version in terms of the concepts of the present section.
In the case of $\sigma$-finite spaces, we have an alternative description of the topology of convergence in measure (245L) which makes no use of any of the functionals or pseudo-metrics in 245A. This can be expressed, at least in the context of $L^0$, in terms of a standard result from general topology (245Yi). You will see that that result gives a recipe for a topology on $L^0$ which could be applied in any measure space. What is remarkable is that for $\sigma$-finite spaces we get a linear space topology.

246 Uniform integrability


The next topic is a fairly specialized one, but it is of great importance, for different reasons, in both
probability theory and functional analysis, and it therefore seems worth while giving a proper treatment
straight away.

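246A Definition Let $(X, \Sigma, \mu)$ be a measure space.

(a) A set $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable if for every $\epsilon > 0$ we can find a set $E \in \Sigma$, of finite measure, and an $M \ge 0$ such that
$$\int (|f| - M\chi E)^+ \le \epsilon \text{ for every } f \in A.$$

(b) A set $A \subseteq L^1(\mu)$ is uniformly integrable if for every $\epsilon > 0$ we can find a set $E \in \Sigma$, of finite measure, and an $M \ge 0$ such that
$$\int (|u| - M\chi E^\bullet)^+ \le \epsilon \text{ for every } u \in A.$$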

246B Remarks (a) Recall the formulae from 241Ef: $u^+ = u \vee 0$, so $(u - v)^+ = u - u \wedge v$.

(b) The phrase 'uniformly integrable' is not particularly helpful. But of course we can observe that for any particular integrable function $f$, there are simple functions approximating $f$ for $\|\ \|_1$ (242M), and such functions will be bounded (in modulus) by functions of the form $M\chi E$, with $\mu E < \infty$; thus singleton subsets of $\mathcal{L}^1$ and $L^1$ are uniformly integrable. A general uniformly integrable set of functions is one in which $M$ and $E$ can be chosen uniformly over the set.

(c) It will I hope be clear from the definitions that $A \subseteq L^1$ is uniformly integrable iff $\{f : f^\bullet \in A\} \subseteq \mathcal{L}^1$ is uniformly integrable.

(d) There is a useful simplification in the definition if $\mu X < \infty$ (in particular, if $(X, \Sigma, \mu)$ is a probability space). In this case a set $A \subseteq L^1(\mu)$ is uniformly integrable iff
$$\inf_{M\ge 0}\sup_{u\in A}\int (|u| - Me)^+ = 0$$
iff
$$\lim_{M\to\infty}\sup_{u\in A}\int (|u| - Me)^+ = 0,$$
writing $e = \mathbf{1}^\bullet \in L^1(\mu)$, where $\mathbf{1}$ is the constant function with value $1$. (For if $\sup_{u\in A}\int (|u| - M\chi E^\bullet)^+ \le \epsilon$, then $\int (|u| - M'e)^+ \le \epsilon$ for every $u \in A$ and $M' \ge M$.) Similarly, $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable iff
$$\lim_{M\to\infty}\sup_{f\in A}\int (|f| - M\mathbf{1})^+ = 0$$
iff
$$\inf_{M\ge 0}\sup_{f\in A}\int (|f| - M\mathbf{1})^+ = 0.$$

Warning! Some authors use the phrase 'uniformly integrable' for sets satisfying the conditions in (d) even when $\mu$ is not totally finite.

246C We have the following wide-ranging stability properties of the class of uniformly integrable sets in $\mathcal{L}^1$ or $L^1$.
Proposition Let $(X, \Sigma, \mu)$ be a measure space and $A$ a uniformly integrable subset of $L^1(\mu)$.
(a) $A$ is bounded for the norm $\|\ \|_1$.
(b) Any subset of $A$ is uniformly integrable.
(c) For any $a \in \mathbb{R}$, $aA = \{au : u \in A\}$ is uniformly integrable.
(d) There is a uniformly integrable $C \supseteq A$ such that $C$ is convex and $\|\ \|_1$-closed, and $v \in C$ whenever $u \in C$ and $|v| \le |u|$.
(e) If $B$ is another uniformly integrable subset of $L^1$, then $A \cup B$, $A + B = \{u + v : u \in A,\ v \in B\}$ are uniformly integrable.
proof Write $\Sigma^f$ for $\{E : E \in \Sigma,\ \mu E < \infty\}$.
(a) There must be $E \in \Sigma^f$, $M \ge 0$ such that $\int (|u| - M\chi E^\bullet)^+ \le 1$ for every $u \in A$; now
$$\|u\|_1 \le \int (|u| - M\chi E^\bullet)^+ + \int M\chi E^\bullet \le 1 + M\mu E$$
for every $u \in A$, so $A$ is bounded.

(b) This is immediate from the definition 246Ab.
(c) Given $\epsilon > 0$, we can find $E \in \Sigma^f$, $M \ge 0$ such that $|a|\int (|u| - M\chi E^\bullet)^+ \le \epsilon$ for every $u \in A$; now $\int (|v| - |a|M\chi E^\bullet)^+ \le \epsilon$ for every $v \in aA$.
(d) If $A$ is empty, take $C = A$. Otherwise, try
$$C = \Bigl\{v : v \in L^1,\ \int (|v| - w)^+ \le \sup_{u\in A}\int (|u| - w)^+ \text{ for every } w \in L^1(\mu)\Bigr\}.$$
Evidently $A \subseteq C$, and $C$ satisfies the definition 246Ab because $A$ does, considering $w$ of the form $M\chi E^\bullet$ where $E \in \Sigma^f$ and $M \ge 0$. The functionals
$$v \mapsto \int (|v| - w)^+ : L^1(\mu) \to \mathbb{R}$$
are all continuous for $\|\ \|_1$ (because the operators $v \mapsto |v|$, $v \mapsto v - w$, $v \mapsto v^+$, $v \mapsto \int v$ are continuous), so $C$ is closed. If $|v'| \le |v|$ and $v \in C$, then
$$\int (|v'| - w)^+ \le \int (|v| - w)^+ \le \sup_{u\in A}\int (|u| - w)^+$$
for every $w$, and $v' \in C$. If $v = av_1 + bv_2$ where $v_1$, $v_2 \in C$, $a \in [0,1]$ and $b = 1 - a$, then $|v| \le a|v_1| + b|v_2|$, so
$$|v| - w \le (a|v_1| - aw) + (b|v_2| - bw) \le (a|v_1| - aw)^+ + (b|v_2| - bw)^+$$
and
$$(|v| - w)^+ \le a(|v_1| - w)^+ + b(|v_2| - w)^+$$
for every $w$; accordingly
$$\int (|v| - w)^+ \le a\int (|v_1| - w)^+ + b\int (|v_2| - w)^+ \le (a + b)\sup_{u\in A}\int (|u| - w)^+ = \sup_{u\in A}\int (|u| - w)^+$$
for every $w$, and $v \in C$.

Thus $C$ has all the required properties.
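(e) I show first that $A \cup B$ is uniformly integrable. $\mathbf{P}$ Given $\epsilon > 0$, let $M_1$, $M_2 \ge 0$ and $E_1$, $E_2 \in \Sigma^f$ be such that
$$\int (|u| - M_1\chi E_1^\bullet)^+ \le \epsilon \text{ for every } u \in A,$$
$$\int (|u| - M_2\chi E_2^\bullet)^+ \le \epsilon \text{ for every } u \in B.$$
Set $M = \max(M_1, M_2)$, $E = E_1 \cup E_2$; then $\mu E < \infty$ and
$$\int (|u| - M\chi E^\bullet)^+ \le \epsilon \text{ for every } u \in A \cup B.$$
As $\epsilon$ is arbitrary, $A \cup B$ is uniformly integrable. $\mathbf{Q}$
Now (d) tells us that there is a convex uniformly integrable set $C$ including $A \cup B$, and in this case $A + B \subseteq 2C$, so $A + B$ is also uniformly integrable, using (b) and (c).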

246D Proposition Let $(X, \Sigma, \mu)$ be a probability space and $A \subseteq L^1(\mu)$ a uniformly integrable set. Then there is a convex, $\|\ \|_1$-closed uniformly integrable set $C \subseteq L^1$ such that $A \subseteq C$, $w \in C$ whenever $v \in C$ and $|w| \le |v|$, and $Pv \in C$ whenever $v \in C$ and $P$ is the conditional expectation operator associated with a $\sigma$-subalgebra of $\Sigma$.
proof Set
$$C = \Bigl\{v : v \in L^1(\mu),\ \int (|v| - Me)^+ \le \sup_{u\in A}\int (|u| - Me)^+ \text{ for every } M \ge 0\Bigr\},$$
writing $e = \chi X^\bullet$ as usual. The arguments in the proof of 246Cd make it plain that $C \supseteq A$ is uniformly integrable, convex and closed, and that $w \in C$ whenever $v \in C$ and $|w| \le |v|$. As for the conditional expectation operators, if $v \in C$, $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$, $P$ is the associated conditional expectation operator, and $M \ge 0$, then
$$|Pv| \le P|v| = P\bigl((|v| \wedge Me) + (|v| - Me)^+\bigr) \le Me + P\bigl((|v| - Me)^+\bigr),$$
so
$$(|Pv| - Me)^+ \le P\bigl((|v| - Me)^+\bigr)$$
and
$$\int (|Pv| - Me)^+ \le \int P(|v| - Me)^+ = \int (|v| - Me)^+ \le \sup_{u\in A}\int (|u| - Me)^+;$$
as $M$ is arbitrary, $Pv \in C$.

246E Remarks (a) Of course 246D has an expression in terms of $\mathcal{L}^1$ rather than $L^1$: if $(X, \Sigma, \mu)$ is a probability space and $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable, then there is a uniformly integrable set $C \supseteq A$ such that (i) $af + (1-a)g \in C$ whenever $f$, $g \in C$ and $a \in [0,1]$ (ii) $g \in C$ whenever $f \in C$, $g \in \mathcal{L}^0(\mu)$ and $|g| \le_{\text{a.e.}} |f|$ (iii) $f \in C$ whenever there is a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $C$ such that $\lim_{n\to\infty}\int |f - f_n| = 0$ (iv) $g \in C$ whenever there is an $f \in C$ such that $g$ is a conditional expectation of $f$ with respect to some $\sigma$-subalgebra of $\Sigma$.

(b) In fact, there are obvious extensions of 246D; the proof there already shows that $T[C] \subseteq C$ whenever $T : L^1(\mu) \to L^1(\mu)$ is an order-preserving linear operator such that $\|Tu\|_1 \le \|u\|_1$ for every $u \in L^1(\mu)$ and $\|Tu\|_\infty \le \|u\|_\infty$ for every $u \in L^1(\mu) \cap L^\infty(\mu)$ (246Yc). If we had done a bit more of the theory of operators on Riesz spaces I should be able to take you a good deal farther along this road; for instance, it is not in fact necessary to assume that the operators $T$ of the last sentence are order-preserving. I will return to this in Chapter 37 in the next volume.

(c) Moreover, the main theorem of the next section will show that for any measure spaces $(X, \Sigma, \mu)$, $(Y, \mathrm{T}, \nu)$, $T[A]$ will be uniformly integrable in $L^1(\nu)$ whenever $A \subseteq L^1(\mu)$ is uniformly integrable and $T : L^1(\mu) \to L^1(\nu)$ is a continuous linear operator (247D).

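246F We shall need an elementary lemma which I have not so far spelt out.
Lemma Let $(X, \Sigma, \mu)$ be a measure space. Then for any $u \in L^1(\mu)$,
$$\|u\|_1 \le 2\sup_{E\in\Sigma}\Bigl|\int_E u\Bigr|.$$

proof Express $u$ as $f^\bullet$ where $f : X \to \mathbb{R}$ is measurable. Set $F = \{x : f(x) \ge 0\}$. Then
$$\|u\|_1 = \int |f| = \Bigl|\int_F f\Bigr| + \Bigl|\int_{X\setminus F} f\Bigr| \le 2\sup_{E\in\Sigma}\Bigl|\int_E f\Bigr| = 2\sup_{E\in\Sigma}\Bigl|\int_E u\Bigr|.$$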
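A brute-force check of this lemma on a finite weighted space shows that the constant 2 cannot be improved. The following is only a sketch: the space is assumed finite, every subset is treated as measurable, and the helper names are hypothetical.

```python
from itertools import combinations

def l1_norm(f, weights):
    """Integral of |f| on a finite weighted space."""
    return sum(w * abs(a) for a, w in zip(f, weights))

def sup_abs_integral(f, weights):
    """Largest |integral of f over E| as E ranges over all subsets of the index set."""
    n = len(f)
    best = 0
    for r in range(n + 1):
        for E in combinations(range(n), r):
            best = max(best, abs(sum(weights[i] * f[i] for i in E)))
    return best

f, weights = [1, -1], [1, 1]
print(l1_norm(f, weights), sup_abs_integral(f, weights))  # the norm equals twice the supremum here
```

The supremum is attained on the set where $f \ge 0$ or on its complement, as in the proof; a function with equal positive and negative parts gives equality.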

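246G Now we come to some of the remarkable alternative descriptions of uniform integrability.
Theorem Let $(X, \Sigma, \mu)$ be any measure space and $A$ a non-empty subset of $L^1(\mu)$. Then the following are equiveridical:
(i) $A$ is uniformly integrable;
(ii) $\sup_{u\in A} |\int_F u| < \infty$ for every $\mu$-atom $F \in \Sigma$ and for every $\epsilon > 0$ there are $E \in \Sigma$, $\delta > 0$ such that $\mu E < \infty$ and $|\int_F u| \le \epsilon$ whenever $u \in A$, $F \in \Sigma$ and $\mu(F \cap E) \le \delta$;
(iii) $\sup_{u\in A} |\int_F u| < \infty$ for every $\mu$-atom $F \in \Sigma$ and $\lim_{n\to\infty}\sup_{u\in A} |\int_{F_n} u| = 0$ whenever $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$;
(iv) $\sup_{u\in A} |\int_F u| < \infty$ for every $\mu$-atom $F \in \Sigma$ and $\lim_{n\to\infty}\sup_{u\in A} |\int_{F_n} u| = 0$ whenever $\langle F_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ with empty intersection.
Remark I use the phrase '$\mu$-atom' to emphasize that I mean an atom in the measure space sense (211I).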
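proof (a)(i)$\Rightarrow$(iv) Suppose that $A$ is uniformly integrable. Then surely if $F \in \Sigma$ is a $\mu$-atom,
$$\sup_{u\in A}\Bigl|\int_F u\Bigr| \le \sup_{u\in A}\|u\|_1 < \infty,$$
by 246Ca. Now suppose that $\langle F_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ with empty intersection, and that $\epsilon > 0$. Take $E \in \Sigma$, $M \ge 0$ such that $\mu E < \infty$ and $\int (|u| - M\chi E^\bullet)^+ \le \tfrac12\epsilon$ whenever $u \in A$. Then for all $n$ large enough, $M\mu(F_n \cap E) \le \tfrac12\epsilon$, so that
$$\Bigl|\int_{F_n} u\Bigr| \le \int_{F_n} |u| \le \int (|u| - M\chi E^\bullet)^+ + \int_{F_n} M\chi E^\bullet \le \frac{\epsilon}{2} + M\mu(F_n \cap E) \le \epsilon$$
for every $u \in A$. As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\sup_{u\in A} |\int_{F_n} u| = 0$, and (iv) is true.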
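(b)(iv)$\Rightarrow$(iii) Suppose that (iv) is true. Then of course $\sup_{u\in A} |\int_F u| < \infty$ for every $\mu$-atom $F \in \Sigma$. $\mathbf{?}$ Suppose, if possible, that $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$ such that
$$\epsilon = \limsup_{n\to\infty}\sup_{u\in A}\min\Bigl(1, \tfrac13\Bigl|\int_{F_n} u\Bigr|\Bigr) > 0.$$
Set $H_n = \bigcup_{i\ge n} F_i$ for each $n$, so that $\langle H_n\rangle_{n\in\mathbb{N}}$ is non-increasing and has empty intersection, and $\int_{H_n} u \to 0$ as $n \to \infty$ for every $u \in L^1(\mu)$. Choose $\langle n_i\rangle_{i\in\mathbb{N}}$, $\langle m_i\rangle_{i\in\mathbb{N}}$, $\langle u_i\rangle_{i\in\mathbb{N}}$ inductively, as follows. $n_0 = 0$. Given $n_i \in \mathbb{N}$, take $m_i \ge n_i$, $u_i \in A$ such that $|\int_{F_{m_i}} u_i| \ge 2\epsilon$. Take $n_{i+1} > m_i$ such that $\int_{H_{n_{i+1}}} |u_i| \le \epsilon$. Continue.
Set $G_k = \bigcup_{i\ge k} F_{m_i}$ for each $k$. Then $\langle G_k\rangle_{k\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ with empty intersection. But $F_{m_i} \subseteq G_i \subseteq F_{m_i} \cup H_{n_{i+1}}$, so
$$\Bigl|\int_{G_i} u_i\Bigr| \ge \Bigl|\int_{F_{m_i}} u_i\Bigr| - \Bigl|\int_{G_i\setminus F_{m_i}} u_i\Bigr| \ge 2\epsilon - \int_{H_{n_{i+1}}} |u_i| \ge \epsilon$$
for every $i$, contradicting the hypothesis (iv). $\mathbf{X}$
This means that $\lim_{n\to\infty}\sup_{u\in A} |\int_{F_n} u|$ must be zero, and (iii) is true.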
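(c)(iii)$\Rightarrow$(ii) We still have $\sup_{u\in A} |\int_F u| < \infty$ for every $\mu$-atom $F$. $\mathbf{?}$ Suppose, if possible, that there is an $\epsilon > 0$ such that for every measurable set $E$ of finite measure and every $\delta > 0$ there are $u \in A$, $F \in \Sigma$ such that $\mu(F \cap E) \le \delta$ and $|\int_F u| \ge \epsilon$. Choose a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure, a sequence $\langle G_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$, a sequence $\langle \delta_n\rangle_{n\in\mathbb{N}}$ of strictly positive real numbers and a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$ as follows. Given $u_k$, $E_k$, $\delta_k$ for $k < n$, choose $u_n \in A$ and $G_n \in \Sigma$ such that $\mu(G_n \cap \bigcup_{k<n} E_k) \le 2^{-n}\min(\{1\} \cup \{\delta_k : k < n\})$ and $|\int_{G_n} u_n| \ge \epsilon$; then choose a set $E_n$ of finite measure and a $\delta_n > 0$ such that $\int_F |u_n| \le \tfrac12\epsilon$ whenever $F \in \Sigma$ and $\mu(F \cap E_n) \le \delta_n$ (see 225A). Continue.
On completing the induction, set $F_n = E_n \cap G_n \setminus \bigcup_{k>n} G_k$ for each $n$; then $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$. By the choice of $G_k$,
$$\mu\Bigl(E_n \cap \bigcup_{k>n} G_k\Bigr) \le \sum_{k=n+1}^\infty 2^{-k}\delta_n \le \delta_n,$$
so $\mu(E_n \cap (G_n \setminus F_n)) \le \delta_n$ and $\int_{G_n\setminus F_n} |u_n| \le \tfrac12\epsilon$. This means that $|\int_{F_n} u_n| \ge |\int_{G_n} u_n| - \tfrac12\epsilon \ge \tfrac12\epsilon$. But this is contrary to the hypothesis (iii). $\mathbf{X}$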

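(d)(ii)$\Rightarrow$(i)($\alpha$) Assume (ii). Let $\epsilon > 0$. Then there are $E \in \Sigma$, $\delta > 0$ such that $\mu E < \infty$ and $|\int_F u| \le \epsilon$ whenever $u \in A$, $F \in \Sigma$ and $\mu(F \cap E) \le \delta$. Now $\sup_{u\in A}\int_E |u| < \infty$. $\mathbf{P}$ Write $\mathcal{I}$ for the family of those $F \in \Sigma$ such that $F \subseteq E$ and $\sup_{u\in A}\int_F |u|$ is finite. If $F \subseteq E$ is an atom for $\mu$, then $\sup_{u\in A}\int_F |u| = \sup_{u\in A} |\int_F u| < \infty$, so $F \in \mathcal{I}$. (The point is that if $f : X \to \mathbb{R}$ is a measurable function such that $f^\bullet = u$, then one of $F' = \{x : x \in F,\ f(x) \ge 0\}$, $F'' = \{x : x \in F,\ f(x) < 0\}$ must be negligible, so that $\int_F |u|$ is either $\int_{F'} u = \int_F u$ or $-\int_{F''} u = -\int_F u$.) If $F \in \Sigma$, $F \subseteq E$ and $\mu F \le \delta$ then
$$\sup_{u\in A}\int_F |u| \le 2\sup_{u\in A,\ G\in\Sigma,\ G\subseteq F}\Bigl|\int_G u\Bigr| \le 2\epsilon$$
(by 246F), so $F \in \mathcal{I}$. Next, if $F$, $G \in \mathcal{I}$ then $\sup_{u\in A}\int_{F\cup G} |u| \le \sup_{u\in A}\int_F |u| + \sup_{u\in A}\int_G |u|$ is finite, so $F \cup G \in \mathcal{I}$. Finally, if $\langle F_n\rangle_{n\in\mathbb{N}}$ is any sequence in $\mathcal{I}$, and $F = \bigcup_{n\in\mathbb{N}} F_n$, there is some $n \in \mathbb{N}$ such that $\mu(F \setminus \bigcup_{i\le n} F_i) \le \delta$; now $\bigcup_{i\le n} F_i$ and $F \setminus \bigcup_{i\le n} F_i$ both belong to $\mathcal{I}$, so $F \in \mathcal{I}$.
By 215Ab, there is an $F \in \mathcal{I}$ such that $H \setminus F$ is negligible for every $H \in \mathcal{I}$. Now observe that $E \setminus F$ cannot include any non-negligible member of $\mathcal{I}$; in particular, cannot include either an atom or a non-negligible set of measure less than $\delta$. But this means that the subspace measure on $E \setminus F$ is atomless, totally finite and has no non-negligible measurable sets of measure less than $\delta$; by 215D, $\mu(E \setminus F) = 0$ and $E \setminus F$ and $E$ belong to $\mathcal{I}$, as required. $\mathbf{Q}$
Since $\int_{X\setminus E} |u| \le 2\epsilon$ for every $u \in A$ (by 246F again, because $\mu(G \cap E) = 0$ for every $G \subseteq X \setminus E$), $\gamma = \sup_{u\in A}\int |u|$ is also finite.
($\beta$) Set $M = \gamma/\delta$. If $u \in A$, express $u$ as $f^\bullet$, where $f : X \to \mathbb{R}$ is measurable, and consider
$$F = \{x : f(x) \ge M\chi E(x)\}.$$
Then
$$M\mu(F \cap E) \le \int_F f = \int_F u \le \gamma,$$
so $\mu(F \cap E) \le \gamma/M = \delta$. Accordingly $\int_F u \le \epsilon$. Similarly, $\int_{F'} (-u) \le \epsilon$, writing $F' = \{x : -f(x) \ge M\chi E(x)\}$. But this means that
$$\int (|u| - M\chi E^\bullet)^+ = \int (|f| - M\chi E)^+ \le \int_{F\cup F'} |f| = \int_{F\cup F'} |u| \le 2\epsilon$$
for every $u \in A$. As $\epsilon$ is arbitrary, $A$ is uniformly integrable.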

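246H Remarks (a) Of course conditions (ii)-(iv) of this theorem, like (i), have direct translations in
terms of members of $\mathcal{L}^1$. Thus a non-empty set $A \subseteq \mathcal{L}^1$ is uniformly integrable iff $\sup_{f\in A} |\int_F f|$ is finite for
every atom $F \in \Sigma$ and

either for every $\epsilon > 0$ we can find $E \in \Sigma$, $\delta > 0$ such that $\mu E < \infty$ and $|\int_F f| \le \epsilon$ whenever
$f \in A$, $F \in \Sigma$ and $\mu(F \cap E) \le \delta$

or $\lim_{n\to\infty} \sup_{f\in A} |\int_{F_n} f| = 0$ for every disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$

or $\lim_{n\to\infty} \sup_{f\in A} |\int_{F_n} f| = 0$ for every non-increasing sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ with empty
intersection.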

(b) There are innumerable further equivalent expressions characterizing uniform integrability; every au-
thor has his own favourite. Many of them are variants on (i)-(iv) of this theorem, as in 246I and 246Yd-246Yf.
For a condition of a quite different kind, see Theorem 247C.

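246I Corollary Let $(X, \Sigma, \mu)$ be a probability space. For $f \in \mathcal{L}^0(\mu)$, $M \ge 0$ set $F(f, M) = \{x : x \in
\operatorname{dom} f, |f(x)| \ge M\}$. Then a non-empty set $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable iff
$$\lim_{M\to\infty} \sup_{f\in A} \int_{F(f,M)} |f| = 0.$$

proof (a) If $A$ satisfies the condition, then
$$\inf_{M\ge 0} \sup_{f\in A} \int (|f| - M\chi X)^+ \le \inf_{M\ge 0} \sup_{f\in A} \int_{F(f,M)} |f| = 0,$$
so $A$ is uniformly integrable.
(b) If $A$ is uniformly integrable, and $\epsilon > 0$, there is an $M_0 \ge 0$ such that $\int (|f| - M_0\chi X)^+ \le \epsilon$ for every
$f \in A$; also, $\gamma = \sup_{f\in A} \int |f|$ is finite (246Ca). Take any $M \ge M_0 \max(1, (1 + \gamma)/\epsilon)$. If $f \in A$, then
$$|f| \times \chi F(f, M) \le (|f| - M_0\chi X)^+ + M_0\chi F(f, M) \le (|f| - M_0\chi X)^+ + \frac{\epsilon}{\gamma+1}|f|$$
everywhere on $\operatorname{dom} f$, so
$$\int_{F(f,M)} |f| \le \int (|f| - M_0\chi X)^+ + \frac{\epsilon}{\gamma+1}\int |f| \le 2\epsilon.$$
As $\epsilon$ is arbitrary, $\lim_{M\to\infty} \sup_{f\in A} \int_{F(f,M)} |f| = 0$.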
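
The criterion of 246I can be tried out on step functions, for which the tail integrals $\int_{F(f,M)} |f|$ have a closed form. The sketch below is an illustration added here, not part of the original text: the two families $f_n = n\chi[0, 1/n]$ and $g_n = \sqrt{n}\chi[0, 1/n]$ on $[0, 1]$ with Lebesgue measure, the helper names and the truncation at `n_max` are all assumptions made for the example. The first family keeps a tail integral of about 1 for every $M$ in range, so it should fail the criterion, while the second has tails of order $1/M$.

```python
import math

def tail_integral(height, width, M):
    """Integral of |f| over {|f| >= M} for f = height * chi([0, width]), with M > 0.

    f equals `height` on a set of Lebesgue measure `width` and 0 elsewhere,
    so the tail integral is height * width when height >= M and 0 otherwise.
    """
    return height * width if height >= M else 0.0

def spikes(n):
    # f_n = n * chi[0, 1/n]; every f_n has integral 1 (hypothetical example family).
    return float(n), 1.0 / n

def tamed(n):
    # g_n = sqrt(n) * chi[0, 1/n]; the integral is n ** -0.5 (hypothetical example family).
    return math.sqrt(n), 1.0 / n

def sup_tail(family, M, n_max=10_000):
    """Supremum of the tail integrals over n = 1..n_max (a finite truncation of the family)."""
    return max(tail_integral(*family(n), M) for n in range(1, n_max + 1))

for M in (1.0, 10.0, 50.0):
    print(M, sup_tail(spikes, M), sup_tail(tamed, M))
```

Because the supremum is taken over finitely many $n$, the output only suggests the limiting behaviour; it does not replace the argument in the proof.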

246J The next step is to set out some remarkable connexions between uniform integrability and the
topology of convergence in measure discussed in the last section.
Theorem Let $(X, \Sigma, \mu)$ be a measure space.
(a) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a uniformly integrable sequence of real-valued functions on $X$, and $f(x) = \lim_{n\to\infty} f_n(x)$
for almost every $x \in X$, then $f$ is integrable and $\lim_{n\to\infty} \int |f_n - f| = 0$; consequently $\int f = \lim_{n\to\infty} \int f_n$.
(b) If $A \subseteq L^1 = L^1(\mu)$ is uniformly integrable, then the norm topology of $L^1$ and the topology of
convergence in measure of $L^0 = L^0(\mu)$ agree on $A$.
(c) For any $u \in L^1$ and any sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^1$, the following are equiveridical:
(i) $u = \lim_{n\to\infty} u_n$ for $\|\ \|_1$;
(ii) $\{u_n : n \in \mathbb{N}\}$ is uniformly integrable and $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ in measure.
(d) If $(X, \Sigma, \mu)$ is semi-finite, and $A \subseteq L^1$ is uniformly integrable, then the closure $\overline{A}$ of $A$ in $L^0$ for the
topology of convergence in measure is still a uniformly integrable subset of $L^1$.
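proof (a) Note first that because $\sup_{n\in\mathbb{N}} \int |f_n| < \infty$ (246Ca) and $|f| =_{\text{a.e.}} \liminf_{n\to\infty} |f_n|$, Fatou's Lemma
assures us that $|f|$ is integrable, with $\int |f| \le \limsup_{n\to\infty} \int |f_n|$. It follows immediately that $\{f_n - f : n \in \mathbb{N}\}$
is uniformly integrable, being the sum of two uniformly integrable sets (246Cc, 246Ce).
Given $\epsilon > 0$, there are $M \ge 0$, $E \in \Sigma$ such that $\mu E < \infty$ and $\int (|f_n - f| - M\chi E)^+ \le \epsilon$ for every $n \in \mathbb{N}$.
Also $|f_n - f| \wedge M\chi E \to 0$ a.e., so
$$\limsup_{n\to\infty} \int |f_n - f| \le \limsup_{n\to\infty} \int (|f_n - f| - M\chi E)^+ + \limsup_{n\to\infty} \int |f_n - f| \wedge M\chi E \le \epsilon,$$
by Lebesgue's Dominated Convergence Theorem. As $\epsilon$ is arbitrary, $\lim_{n\to\infty} \int |f_n - f| = 0$ and
$\lim_{n\to\infty} \int f_n - \int f = 0$.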
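(b) Let $\mathfrak{T}_A$, $\mathfrak{S}_A$ be the topologies on $A$ induced by the norm topology of $L^1$ and the topology of
convergence in measure on $L^0$ respectively.
(i) Given $\epsilon > 0$, let $F \in \Sigma$, $M \ge 0$ be such that $\mu F < \infty$ and $\int (|v| - M\chi F^\bullet)^+ \le \epsilon$ for every $v \in A$,
and consider $\bar\rho_F$, defined as in 245A. Then for any $f$, $g \in \mathcal{L}^0$,
$$|f - g| \le (|f| - M\chi F)^+ + (|g| - M\chi F)^+ + M(|f - g| \wedge \chi F)$$
everywhere on $\operatorname{dom} f \cap \operatorname{dom} g$, so
$$|u - v| \le (|u| - M\chi F^\bullet)^+ + (|v| - M\chi F^\bullet)^+ + M(|u - v| \wedge \chi F^\bullet)$$
for all $u$, $v \in L^0$. Consequently
$$\|u - v\|_1 \le 2\epsilon + M\bar\rho_F(u, v)$$
for all $u$, $v \in A$.
This means that, given $\epsilon > 0$, we can find $F$, $M$ such that, for $u$, $v \in A$,
$$\bar\rho_F(u, v) \le \frac{\epsilon}{1+M} \implies \|u - v\|_1 \le 3\epsilon.$$
It follows that every subset of $A$ which is open for $\mathfrak{T}_A$ is open for $\mathfrak{S}_A$ (2A3Ib).
(ii) In the other direction, we have $\bar\rho_F(u, v) \le \|u - v\|_1$ for all $u$, $v \in L^1$ and every set $F$ of finite
measure, so every subset of $A$ which is open for $\mathfrak{S}_A$ is open for $\mathfrak{T}_A$.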
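(c) If $\langle u_n\rangle_{n\in\mathbb{N}} \to u$ for $\|\ \|_1$, $A = \{u_n : n \in \mathbb{N}\}$ is uniformly integrable. P Given $\epsilon > 0$, let $m$ be such
that $\|u_n - u\|_1 \le \epsilon$ whenever $n \ge m$. Set $v = |u| + \sum_{i\le m} |u_i| \in L^1$, and let $M \ge 0$, $E \in \Sigma$ be such that
$\mu E < \infty$ and $\int (v - M\chi E^\bullet)^+ \le \epsilon$. Then, for $w \in A$,
$$(|w| - M\chi E^\bullet)^+ \le (|w| - v)^+ + (v - M\chi E^\bullet)^+,$$
so
$$\int (|w| - M\chi E^\bullet)^+ \le \|(|w| - v)^+\|_1 + \int (v - M\chi E^\bullet)^+ \le 2\epsilon.$$
Q
Thus on either hypothesis we can be sure that $\{u_n : n \in \mathbb{N}\}$ and $A = \{u\} \cup \{u_n : n \in \mathbb{N}\}$ are uniformly
integrable, so that the two topologies agree on $A$ (by (b)) and $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ in one topology iff it
converges to $u$ in the other.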
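(d) Because $A$ is $\|\ \|_1$-bounded (246Ca) and $\mu$ is semi-finite, $\overline{A} \subseteq L^1$ (245J(b-i)). Given $\epsilon > 0$, let
$M \ge 0$, $E \in \Sigma$ be such that $\mu E < \infty$ and $\int (|u| - M\chi E^\bullet)^+ \le \epsilon$ for every $u \in A$. Now the maps $u \mapsto |u|$,
$u \mapsto u - M\chi E^\bullet$, $u \mapsto u^+ : L^0 \to L^0$ are all continuous for the topology of convergence in measure (245D),
while $\{u : \|u\|_1 \le \epsilon\}$ is closed for the same topology (245J again), so $\{u : u \in L^0, \int (|u| - M\chi E^\bullet)^+ \le \epsilon\}$ is
closed and must include $\overline{A}$. Thus $\int (|u| - M\chi E^\bullet)^+ \le \epsilon$ for every $u \in \overline{A}$. As $\epsilon$ is arbitrary, $\overline{A}$ is uniformly
integrable.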

246K Complex $\mathcal{L}^1$ and $L^1$ The definitions and theorems above can be repeated without difficulty
for spaces of (equivalence classes of) complex-valued functions, with just one variation: in the complex
equivalent of 246F, the constant must be changed. It is easy to see that, for $u \in L^1_{\mathbb{C}}(\mu)$,
$$\|u\|_1 \le \|\operatorname{Re}(u)\|_1 + \|\operatorname{Im}(u)\|_1 \le 2\sup_{F\in\Sigma} |\int_F \operatorname{Re}(u)| + 2\sup_{F\in\Sigma} |\int_F \operatorname{Im}(u)| \le 4\sup_{F\in\Sigma} |\int_F u|.$$
(In fact, $\|u\|_1 \le \pi \sup_{F\in\Sigma} |\int_F u|$; see 246Yl and 252Yo.) Consequently some of the arguments of 246G need
to be written out with different constants, but the results, as stated, are unaffected.
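
The constant $\pi$ can be seen in the example of 246Yl, $f(x) = e^{ix}$ on $[-\pi, \pi]$, where $\|f\|_1 = 2\pi$. The following sketch is an illustration added here, not part of the original text; it uses the antiderivative $-ie^{ix}$ and searches only over intervals $E = [a, b]$ on a grid (an assumption of the example, since the exercise concerns every measurable $E$). The half circle $[-\pi/2, \pi/2]$ gives $|\int_E f| = 2$, so the ratio $\|f\|_1 / |\int_E f|$ there is $\pi$.

```python
import cmath
import math

def integral_exp_ix(a, b):
    """Exact integral of e^{ix} over [a, b], using the antiderivative -i * e^{ix}."""
    return (cmath.exp(1j * b) - cmath.exp(1j * a)) / 1j

norm_1 = 2 * math.pi  # integral of |e^{ix}| = 1 over [-pi, pi]

# Largest |integral over E| among subintervals E = [a, b] with endpoints on a grid.
grid = [-math.pi + k * math.pi / 50 for k in range(101)]
best = max(abs(integral_exp_ix(a, b)) for a in grid for b in grid if a <= b)

half_circle = abs(integral_exp_ix(-math.pi / 2, math.pi / 2))
print(best, half_circle, norm_1 / half_circle)
```

For an interval the modulus is $2|\sin((b - a)/2)|$, which is why no interval does better than the half circle.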

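246X Basic exercises (a) Let $(X, \Sigma, \mu)$ be a measure space and $A$ a subset of $L^1 = L^1(\mu)$. Show that
the following are equiveridical: (i) $A$ is uniformly integrable; (ii) for every $\epsilon > 0$ there is a $w \ge 0$ in $L^1$
such that $\int (|u| - w)^+ \le \epsilon$ for every $u \in A$; (iii) $\langle (|u_{n+1}| - \sup_{i\le n} |u_i|)^+\rangle_{n\in\mathbb{N}} \to 0$ in $L^1$ for every sequence
$\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$. (Hint: for (ii)$\Rightarrow$(iii), set $v_n = \sup_{i\le n} |u_i|$ and note that $\langle v_n \wedge w\rangle_{n\in\mathbb{N}}$ is convergent in $L^1$ for
every $w \ge 0$.)

> (b) Let $(X, \Sigma, \mu)$ be a totally finite measure space. Show that for any $p > 1$, $M \ge 0$ the set $\{f : f \in
\mathcal{L}^p(\mu), \|f\|_p \le M\}$ is uniformly integrable. (Hint: $\int (|f| - M\chi X)^+ \le M^{1-p} \int |f|^p$.)

> (c) Let $\mu$ be counting measure on $\mathbb{N}$. Show that a set $A \subseteq \mathcal{L}^1(\mu) = \ell^1$ is uniformly integrable iff (i)
$\sup_{f\in A} |f(n)| < \infty$ for every $n \in \mathbb{N}$ (ii) for every $\epsilon > 0$ there is an $m \in \mathbb{N}$ such that $\sum_{n=m}^{\infty} |f(n)| \le \epsilon$ for
every $f \in A$.

(d) Let $X$ be a set, and let $\mu$ be counting measure on $X$. Show that a set $A \subseteq \mathcal{L}^1(\mu) = \ell^1(X)$ is uniformly
integrable iff (i) $\sup_{f\in A} |f(x)| < \infty$ for every $x \in X$ (ii) for every $\epsilon > 0$ there is a finite set $I \subseteq X$ such that
$\sum_{x\in X\setminus I} |f(x)| \le \epsilon$ for every $f \in A$. Show that in this case $A$ is relatively compact for the norm topology of
$\ell^1$.

(e) Let $(X, \Sigma, \mu)$ be a measure space, $\delta > 0$, and $\mathcal{I} \subseteq \Sigma$ a family such that (i) every atom belongs to $\mathcal{I}$
(ii) $E \in \mathcal{I}$ whenever $E \in \Sigma$ and $\mu E \le \delta$ (iii) $E \cup F \in \mathcal{I}$ whenever $E$, $F \in \mathcal{I}$ and $E \cap F = \emptyset$. Show that every
set of finite measure belongs to $\mathcal{I}$.

(f) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces and $\phi : X \to Y$ an inverse-measure-preserving function.
Show that a set $A \subseteq \mathcal{L}^1(\nu)$ is uniformly integrable iff $\{g\phi : g \in A\}$ is uniformly integrable in $\mathcal{L}^1(\mu)$. (Hint:
use 246G for 'if', 246A for 'only if'.)

> (g) Let $(X, \Sigma, \mu)$ be a measure space and $p \in [1, \infty[$. Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\mathcal{L}^p = \mathcal{L}^p(\mu)$ such
that $\{|f_n|^p : n \in \mathbb{N}\}$ is uniformly integrable and $f_n \to f$ a.e. Show that $f \in \mathcal{L}^p$ and $\lim_{n\to\infty} \int |f_n - f|^p = 0$.

(h) Let $(X, \Sigma, \mu)$ be a semi-finite measure space and $p \in [1, \infty[$. Let $\langle u_n\rangle_{n\in\mathbb{N}}$ be a sequence in $L^p = L^p(\mu)$
and $u \in L^0(\mu)$. Show that the following are equiveridical: (i) $u \in L^p$ and $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ for $\|\ \|_p$
(ii) $\langle u_n\rangle_{n\in\mathbb{N}}$ converges in measure to $u$ and $\{|u_n|^p : n \in \mathbb{N}\}$ is uniformly integrable. (Hint: 245Xk.)

(i) Let $(X, \Sigma, \mu)$ be a totally finite measure space, and $1 \le p < r \le \infty$. Let $\langle u_n\rangle_{n\in\mathbb{N}}$ be a $\|\ \|_r$-bounded
sequence in $L^r(\mu)$ which converges in measure to $u \in L^0(\mu)$. Show that $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ for $\|\ \|_p$.
(Hint: show that $\{|u_n|^p : n \in \mathbb{N}\}$ is uniformly integrable.)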
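
The tail condition in 246Xc is easy to experiment with. The sketch below is an illustration added here, not part of the original text: the two families (the unit vectors $e_k$ and the scaled vectors $e_k/(k+1)$), the function names and the truncation to finitely many members and terms are all assumptions of the example. Both families are pointwise bounded, so condition (i) holds for each; only the second has uniformly small tails.

```python
def unit_vectors(k, n):
    # The k-th member is e_k: 1 at position k and 0 elsewhere (hypothetical family).
    return 1.0 if n == k else 0.0

def shrinking(k, n):
    # The k-th member is e_k / (k + 1) (hypothetical family).
    return 1.0 / (k + 1) if n == k else 0.0

def uniform_tail(family, m, n_members=50, n_terms=50):
    """Sup over the first n_members sequences of sum_{n >= m} |f(n)|, truncated at n_terms."""
    return max(
        sum(abs(family(k, n)) for n in range(m, n_terms))
        for k in range(n_members)
    )

for m in (0, 10, 40):
    print(m, uniform_tail(unit_vectors, m), uniform_tail(shrinking, m))
```

In this truncated setting the first column of tails stays at 1 while the second is $1/(m+1)$, matching the expectation that the unit vectors fail condition (ii).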

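246Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a totally finite measure space. Show that $A \subseteq \mathcal{L}^1(\mu)$
is uniformly integrable iff there is a convex function $\phi : [0, \infty[ \to \mathbb{R}$ such that $\lim_{a\to\infty} \phi(a)/a = \infty$ and
$\sup_{f\in A} \int \phi(|f|) < \infty$.

(b) For any metric space $(Z, \rho)$, let $\mathcal{C}_Z$ be the family of closed subsets of $Z$, and for $F$, $F' \in \mathcal{C}_Z \setminus \{\emptyset\}$
set $\tilde\rho(F, F') = \max(\sup_{z\in F} \inf_{z'\in F'} \rho(z, z'), \sup_{z'\in F'} \inf_{z\in F} \rho(z, z'))$. Show that $\tilde\rho$ is a metric on $\mathcal{C}_Z \setminus \{\emptyset\}$
(it is the Hausdorff metric). Show that if $(Z, \rho)$ is complete then the family $\mathcal{K}_Z \setminus \{\emptyset\}$ of non-empty
compact subsets of $Z$ is closed for $\tilde\rho$. Now let $(X, \Sigma, \mu)$ be any measure space and take $Z = L^1 = L^1(\mu)$,
$\rho(z, z') = \|z - z'\|_1$ for $z$, $z' \in Z$. Show that the family of non-empty closed uniformly integrable subsets of
$L^1$ is a closed subset of $\mathcal{C}_Z \setminus \{\emptyset\}$ including $\mathcal{K}_Z \setminus \{\emptyset\}$.

(c) Let $(X, \Sigma, \mu)$ be a totally finite measure space and $A \subseteq L^1(\mu)$ a uniformly integrable set. Show that
there is a uniformly integrable set $C \supseteq A$ such that (i) $C$ is convex and closed in $L^0(\mu)$ for the topology of
convergence in measure (ii) if $u \in C$ and $|v| \le |u|$ then $v \in C$ (iii) if $T$ belongs to the set $\mathcal{T}^+$ of operators
from $L^1(\mu) = M^{1,\infty}(\mu)$ to itself, as described in 244Xm, then $T[C] \subseteq C$.

(d) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. Show that a set $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable iff $\lim_{n\to\infty} \int_{F_n} f_n
= 0$ for every disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of compact sets in $\mathbb{R}$ and every sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $A$.

(e) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. Show that a set $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable iff $\lim_{n\to\infty} \int_{G_n} f_n
= 0$ for every disjoint sequence $\langle G_n\rangle_{n\in\mathbb{N}}$ of open sets in $\mathbb{R}$ and every sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $A$.

(f) Repeat 246Yd and 246Ye for Lebesgue measure on arbitrary subsets of $\mathbb{R}^r$.

(g) Let $X$ be a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$. Let $\langle \nu_n\rangle_{n\in\mathbb{N}}$ be a sequence of countably additive
functionals on $\Sigma$ such that $\nu E = \lim_{n\to\infty} \nu_n E$ is defined for every $E \in \Sigma$. Show that $\lim_{n\to\infty} \nu_n F_n = 0$
whenever $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$. (Hint: suppose otherwise. By taking suitable subsequences
reduce to the case in which $|\nu_n F_i - \nu F_i| \le 2^{-n}\epsilon$ for $i < n$, $|\nu_n F_n| \ge 3\epsilon$, $|\nu_n F_i| \le 2^{-i}\epsilon$ for $i > n$. Set
$F = \bigcup_{i\in\mathbb{N}} F_{2i+1}$ and show that $|\nu_{2n+1} F - \nu_{2n} F| \ge \epsilon$ for every $n$.) Hence show that $\nu$ is countably additive.
(This is the Vitali-Hahn-Saks theorem.)

(h) Let $(X, \Sigma, \mu)$ be a measure space and $\langle u_n\rangle_{n\in\mathbb{N}}$ a sequence in $L^1 = L^1(\mu)$ such that $\lim_{n\to\infty} \int_F u_n$ is
defined for every $F \in \Sigma$. Show that $\{u_n : n \in \mathbb{N}\}$ is uniformly integrable. (Hint: suppose not. Then there are
a disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ and a subsequence $\langle u'_n\rangle_{n\in\mathbb{N}}$ of $\langle u_n\rangle_{n\in\mathbb{N}}$ such that $\inf_{n\in\mathbb{N}} |\int_{F_n} u'_n| = \epsilon > 0$.
But this contradicts 246Yg.)

(i) In 246Yg, show that $\nu$ is countably additive. (Hint: Set $\mu = \sum_{n=0}^{\infty} a_n\nu_n$ for a suitable sequence
$\langle a_n\rangle_{n\in\mathbb{N}}$ of strictly positive numbers. For each $n$ choose a Radon-Nikodým derivative $f_n$ of $\nu_n$ with respect
to $\mu$. Show that $\{f_n : n \in \mathbb{N}\}$ is uniformly integrable, so that $\nu$ is truly continuous.)

(j) Let $(X, \Sigma, \mu)$ be any measure space, and $A \subseteq L^1(\mu)$. Show that the following are equiveridical: (i)
$A$ is $\|\ \|_1$-bounded; (ii) $\sup_{u\in A} |\int_F u| < \infty$ for every $\mu$-atom $F \in \Sigma$ and $\limsup_{n\to\infty} \sup_{u\in A} |\int_{F_n} u| < \infty$
for every disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of measurable sets of finite measure; (iii) $\sup_{u\in A} |\int_E u| < \infty$ for every
$E \in \Sigma$. (Hint: show that $\langle a_n u_n\rangle_{n\in\mathbb{N}}$ is uniformly integrable whenever $\lim_{n\to\infty} a_n = 0$ in $\mathbb{R}$ and $\langle u_n\rangle_{n\in\mathbb{N}}$ is
a sequence in $A$.)

(k) Let $(X, \Sigma, \mu)$ be a measure space and $A \subseteq L^1(\mu)$ a non-empty set. Show that the following are
equiveridical: (i) $A$ is uniformly integrable; (ii) whenever $B \subseteq L^\infty(\mu)$ is non-empty and downwards-directed
and has infimum 0 in $L^\infty(\mu)$ then $\inf_{v\in B} \sup_{u\in A} |\int u \times v| = 0$. (Hint: for (i)$\Rightarrow$(ii), note that $\inf_{v\in B} w \times v = 0$
for every $w \ge 0$ in $L^0$. For (ii)$\Rightarrow$(i), use 246G(iv).)

(l) Set $f(x) = e^{ix}$ for $x \in [-\pi, \pi]$. Show that $|\int_E f| \le 2$ for every $E \subseteq [-\pi, \pi]$.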

246 Notes and comments I am holding over to the next section the most striking property of uniformly
integrable sets (they are the relatively weakly compact sets in $L^1$) because this demands some non-trivial
ideas from functional analysis and general topology. In this section I give the results which can be regarded
as essentially measure-theoretic in inspiration. The most important new concept, or technique, is that of
'disjoint-sequence theorem'. A typical example is in condition (iii) of 246G, relating uniform integrability
to the behaviour of functionals on disjoint sequences of sets. I give variants of this in 246Yd-246Yf, and
246Yg-246Yj are further results in which similar methods can be used. The central result of the next section
(247C) will also use disjoint sequences in the proof, and they will appear more than once in Chapter 35 in
the next volume.
The phrase 'uniformly integrable' ought to mean something like 'uniformly approximable by simple
functions', and the definition 246A can be forced into such a form, but I do not think it very useful to do so.
However condition (ii) of 246G amounts to something like 'uniformly truly continuous', if we think of
members of $L^1$ as truly continuous functionals on $\Sigma$, as in 242I. (See 246Yi.) Note that in each of the statements
(ii)-(iv) of 246G we need to take special note of any atoms for the measure, since they are not controlled by
the main condition imposed. In an atomless measure space, of course, we have a simplification here, as in
246Yd-246Yf.
Another way of justifying the 'uniformly' in 'uniformly integrable' is by considering functionals $\theta_w$ where
$w \ge 0$ in $L^1$, setting $\theta_w(u) = \int (|u| - w)^+$ for $u \in L^1$; then $A \subseteq L^1$ is uniformly integrable iff $\theta_w \to 0$
uniformly on $A$ as $w$ rises in $L^1$ (246Xa). It is sometimes useful to know that if this is true at all then it is
necessarily witnessed by elements $w$ which can be built directly from materials at hand (see (iii) of 246Xa).
Furthermore, the sets $A_{w\epsilon} = \{u : \theta_w(u) \le \epsilon\}$ are always convex, $\|\ \|_1$-closed and solid (if $u \in A_{w\epsilon}$ and
$|v| \le |u|$ then $v \in A_{w\epsilon}$) (246Cd); they are closed under pointwise convergence of sequences (246Ja) and in
semi-finite measure spaces they are closed for the topology of convergence in measure (246Jd); in probability
spaces, for level $w$, they are closed under conditional expectations (246D) and similar operators (246Yc).
Consequently we can expect that any uniformly integrable set will be included in a uniformly integrable set
which is closed under operations of several different types.
Yet another 'uniform' property of uniformly integrable sets is in 246Yk. The norm $\|\ \|_\infty$ is never (in
interesting cases) order-continuous in the way that other $\|\ \|_p$ are (244Yd); but the uniformly integrable
subsets of $L^1$ provide interesting order-continuous seminorms on $L^\infty$.
246J supplements results from §245. In the notes to that section I mentioned the question: if $\langle f_n\rangle_{n\in\mathbb{N}} \to f$
a.e., in what ways can $\langle \int f_n\rangle_{n\in\mathbb{N}}$ fail to converge to $\int f$? Here we find that $\langle \int |f_n - f|\rangle_{n\in\mathbb{N}} \to 0$ iff $\{f_n : n \in \mathbb{N}\}$
is uniformly integrable; this is a way of making precise the expression 'none of the weight of the sequence is
lost at infinity'. Generally, for sequences, convergence in $\|\ \|_p$, for $p \in [1, \infty[$, is convergence in measure for
$p$th-power-uniformly-integrable sequences (246Xh).

247 Weak compactness in L1


I now come to the most striking feature of uniform integrability: it provides a description of the relatively
weakly compact subsets of $L^1$ (247C). I have put this into a separate section because it demands some
knowledge of functional analysis (in particular, of course, of weak topologies on Banach spaces). I will try
to give an account in terms which are accessible to novices in the theory of normed spaces because the result
is essentially measure-theoretic, as well as being of vital importance to applications in probability theory. I
have written out the essential definitions in §§2A3-2A5.

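247A Part of the argument of the main theorem below will run more smoothly if I separate out an
idea which is, in effect, a simple special case of a theme which has been running through the exercises of
this chapter (241Ye, 242Yf, 243Ya, 244Yc).
Lemma Let $(X, \Sigma, \mu)$ be a measure space, and $G$ any member of $\Sigma$. Let $\mu_G$ be the subspace measure on
$G$, so that $\mu_G E = \mu E$ for $E \subseteq G$, $E \in \Sigma$. Set
$$U = \{u : u \in L^1(\mu),\ u \times \chi G^\bullet = u\} \subseteq L^1(\mu).$$
Then we have an isomorphism $S$ between the ordered normed spaces $U$ and $L^1(\mu_G)$, given by writing
$$S(f^\bullet) = (f{\restriction}G)^\bullet$$
for every $f \in \mathcal{L}^1(\mu)$ such that $f^\bullet \in U$.
proof Of course I should remark explicitly that $U$ is a linear subspace of $L^1(\mu)$. I have discussed integration
over subspaces in §§131 and 214; in particular, I noted that $f{\restriction}G$ is integrable, and that
$$\int |f{\restriction}G|\,d\mu_G = \int |f| \times \chi G\,d\mu \le \int |f|\,d\mu$$
for every $f \in \mathcal{L}^1(\mu)$ (131Fa). If $f$, $g \in \mathcal{L}^1(\mu)$ and $f = g$ $\mu$-a.e., then $f{\restriction}G = g{\restriction}G$ $\mu_G$-a.e.; so the proposed
formula for $S$ does indeed define a map from $U$ to $L^1(\mu_G)$.
Because
$$(f + g){\restriction}G = (f{\restriction}G) + (g{\restriction}G), \quad (cf){\restriction}G = c(f{\restriction}G)$$
for all $f$, $g \in \mathcal{L}^1(\mu)$ and all $c \in \mathbb{R}$, $S$ is linear. Because
$$f \le g\ \mu\text{-a.e.} \implies f{\restriction}G \le g{\restriction}G\ \mu_G\text{-a.e.},$$
$S$ is order-preserving. Because $\int |f{\restriction}G|\,d\mu_G \le \int |f|\,d\mu$ for every $f \in \mathcal{L}^1(\mu)$, $\|Su\|_1 \le \|u\|_1$ for every $u \in U$.
To see that $S$ is surjective, take any $v \in L^1(\mu_G)$. Express $v$ as $g^\bullet$ where $g \in \mathcal{L}^1(\mu_G)$. By 131E, $f \in \mathcal{L}^1(\mu)$,
where $f(x) = g(x)$ for $x \in \operatorname{dom} g$, 0 for $x \in X \setminus G$; so that $f^\bullet \in U$ and $f{\restriction}G = g$ and $v = S(f^\bullet) \in S[U]$.
To see that $S$ is norm-preserving, note that, for any $f \in \mathcal{L}^1(\mu)$,
$$\int |f{\restriction}G|\,d\mu_G = \int |f| \times \chi G\,d\mu,$$
so that if $u = f^\bullet \in U$ we shall have
$$\|Su\|_1 = \int |f{\restriction}G|\,d\mu_G = \int |f| \times \chi G\,d\mu = \|u \times \chi G^\bullet\|_1 = \|u\|_1.$$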

247B Corollary Let $(X, \Sigma, \mu)$ be any measure space, and let $G \in \Sigma$ be a measurable set expressible as
a countable union of sets of finite measure. Define $U$ as in 247A, and let $h : L^1(\mu) \to \mathbb{R}$ be any continuous
linear functional. Then there is a $v \in L^\infty(\mu)$ such that $h(u) = \int u \times v\,d\mu$ for every $u \in U$.
proof Let $S : U \to L^1(\mu_G)$ be the isomorphism described in 247A. Then $S^{-1} : L^1(\mu_G) \to U$ is linear and
continuous, so $h_1 = hS^{-1}$ belongs to the normed space dual $(L^1(\mu_G))^*$ of $L^1(\mu_G)$. Now of course $\mu_G$ is
$\sigma$-finite, therefore localizable (211L), so 243Gb tells us that there is a $v_1 \in L^\infty(\mu_G)$ such that
$$h_1(u) = \int u \times v_1\,d\mu_G$$
for every $u \in L^1(\mu_G)$.
Express $v_1$ as $g_1^\bullet$ where $g_1 : G \to \mathbb{R}$ is a bounded measurable function. Set $g(x) = g_1(x)$ for $x \in G$, 0 for
$x \in X \setminus G$; then $g : X \to \mathbb{R}$ is a bounded measurable function, and $v = g^\bullet \in L^\infty(\mu)$. If $u \in U$, express $u$ as
$f^\bullet$ where $f \in \mathcal{L}^1(\mu)$; then
$$h(u) = h(S^{-1}Su) = h_1((f{\restriction}G)^\bullet) = \int (f{\restriction}G) \times g_1\,d\mu_G = \int (f \times g){\restriction}G\,d\mu_G = \int f \times g \times \chi G\,d\mu = \int f \times g\,d\mu = \int u \times v.$$

As $u$ is arbitrary, this proves the result.

247C Theorem Let $(X, \Sigma, \mu)$ be any measure space and $A$ a subset of $L^1 = L^1(\mu)$. Then $A$ is uniformly
integrable iff it is relatively compact in $L^1$ for the weak topology of $L^1$.
proof (a) Suppose that $A$ is relatively compact for the weak topology. I seek to show that it satisfies the
condition (iii) of 246G.
(i) If $F \in \Sigma$, then surely $\sup_{u\in A} |\int_F u| < \infty$, because $u \mapsto \int_F u$ belongs to $(L^1)^*$, and if $h \in (L^1)^*$ then
the image of any relatively weakly compact set under $h$ must be bounded (2A5Ie).
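(ii) Now suppose that $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$. ? Suppose, if possible, that
$$\langle \sup_{u\in A} |\int_{F_n} u| \rangle_{n\in\mathbb{N}}$$
does not converge to 0. Then there is a strictly increasing sequence $\langle n(k)\rangle_{k\in\mathbb{N}}$ in $\mathbb{N}$ such that
$$\gamma = \frac12 \inf_{k\in\mathbb{N}} \sup_{u\in A} |\int_{F_{n(k)}} u| > 0.$$
For each $k$, choose $u_k \in A$ such that $|\int_{F_{n(k)}} u_k| \ge \gamma$. Because $A$ is relatively compact for the weak topology,
there is a cluster point $u$ of $\langle u_k\rangle_{k\in\mathbb{N}}$ in $L^1$ for the weak topology (2A3Ob). Set $\eta_j = 2^{-j}\gamma/6 > 0$ for each
$j \in \mathbb{N}$.
We can now choose a strictly increasing sequence $\langle k(j)\rangle_{j\in\mathbb{N}}$ inductively so that, for each $j$,
$$\int_{F_{n(k(j))}} (|u| + \sum_{i=0}^{j-1} |u_{k(i)}|) \le \eta_j,$$
$$\sum_{i=0}^{j-1} |\int_{F_{n(k(i))}} u - \int_{F_{n(k(i))}} u_{k(j)}| \le \eta_j$$
for every $j$, interpreting $\sum_{i=0}^{-1}$ as 0. P Given $\langle k(i)\rangle_{i<j}$, set $v^* = |u| + \sum_{i=0}^{j-1} |u_{k(i)}|$; then $\lim_{k\to\infty} \int_{F_{n(k)}} v^* = 0$,
by Lebesgue's Dominated Convergence Theorem or otherwise, so there is a $k^*$ such that $k^* > k(i)$ for every
$i < j$ and $\int_{F_{n(k)}} v^* \le \eta_j$ for every $k \ge k^*$. Next,
$$w \mapsto \sum_{i=0}^{j-1} |\int_{F_{n(k(i))}} u - \int_{F_{n(k(i))}} w| : L^1 \to \mathbb{R}$$
is continuous for the weak topology of $L^1$ and zero at $u$, and $u$ belongs to every weakly closed set containing
$\{u_k : k \ge k^*\}$, so there is a $k(j) \ge k^*$ such that $\sum_{i=0}^{j-1} |\int_{F_{n(k(i))}} u - \int_{F_{n(k(i))}} u_{k(j)}| < \eta_j$, which continues the
construction. Q
Let $v$ be any cluster point in $L^1$, for the weak topology, of $\langle u_{k(j)}\rangle_{j\in\mathbb{N}}$. Setting $G_i = F_{n(k(i))}$, we have
$|\int_{G_i} u - \int_{G_i} u_{k(j)}| \le \eta_j$ whenever $i < j$, so $\lim_{j\to\infty} \int_{G_i} u_{k(j)}$ exists $= \int_{G_i} u$ for each $i$, and $\int_{G_i} v = \int_{G_i} u$ for
every $i$; setting $G = \bigcup_{i\in\mathbb{N}} G_i$,
$$\int_G v = \sum_{i=0}^{\infty} \int_{G_i} v = \sum_{i=0}^{\infty} \int_{G_i} u = \int_G u,$$
by 232D, because $\langle G_i\rangle_{i\in\mathbb{N}}$ is disjoint.
For each $j \in \mathbb{N}$,
$$\sum_{i=0}^{j-1} |\int_{G_i} u_{k(j)}| + \sum_{i=j+1}^{\infty} |\int_{G_i} u_{k(j)}|$$
$$\le \sum_{i=0}^{j-1} \int_{G_i} |u| + \sum_{i=0}^{j-1} |\int_{G_i} u - \int_{G_i} u_{k(j)}| + \sum_{i=j+1}^{\infty} \int_{G_i} |u_{k(j)}|$$
$$\le \sum_{i=0}^{j-1} \eta_i + \eta_j + \sum_{i=j+1}^{\infty} \eta_i = \sum_{i=0}^{\infty} \eta_i = \frac{\gamma}{3}.$$
On the other hand, $|\int_{G_j} u_{k(j)}| \ge \gamma$. So
$$|\int_G u_{k(j)}| = |\sum_{i=0}^{\infty} \int_{G_i} u_{k(j)}| \ge \frac23\gamma.$$
This is true for every $j$; because every weakly open set containing $v$ meets $\{u_{k(j)} : j \in \mathbb{N}\}$, $|\int_G v| \ge \frac23\gamma$
and $|\int_G u| \ge \frac23\gamma$. On the other hand,
$$|\int_G u| = |\sum_{i=0}^{\infty} \int_{G_i} u| \le \sum_{i=0}^{\infty} \int_{G_i} |u| \le \sum_{i=0}^{\infty} \eta_i = \frac{\gamma}{3},$$
which is absurd. X
This contradiction shows that $\lim_{n\to\infty} \sup_{u\in A} |\int_{F_n} u| = 0$. As $\langle F_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $A$ satisfies the
condition 246G(iii) and is uniformly integrable.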
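(b) Now assume that $A$ is uniformly integrable. I seek a weakly compact set $C \supseteq A$.
(i) For each $n \in \mathbb{N}$, choose $E_n \in \Sigma$, $M_n \ge 0$ such that $\mu E_n < \infty$ and $\int (|u| - M_n\chi E_n^\bullet)^+ \le 2^{-n}$ for
every $u \in A$. Set
$$C = \{v : v \in L^1,\ |\int_F v| \le M_n\mu(F \cap E_n) + 2^{-n}\ \forall\ n \in \mathbb{N},\ F \in \Sigma\},$$
and note that $A \subseteq C$, because if $u \in A$ and $F \in \Sigma$,
$$|\int_F u| \le \int_F (|u| - M_n\chi E_n^\bullet)^+ + \int_F M_n\chi E_n^\bullet \le 2^{-n} + M_n\mu(F \cap E_n)$$
for every $n$. Observe also that $C$ is $\|\ \|_1$-bounded, because
$$\|u\|_1 \le 2\sup_{F\in\Sigma} |\int_F u| \le 2\sup_{F\in\Sigma}(1 + M_0\mu(F \cap E_0)) \le 2(1 + M_0\mu E_0)$$
for every $u \in C$ (using 246F).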

(ii) Because I am seeking to prove this theorem for arbitrary measure spaces $(X, \Sigma, \mu)$, I cannot use
243G to identify the dual of $L^1$. Nevertheless, 247B above shows that 243Gb is 'nearly' valid, in the
following sense: if $h \in (L^1)^*$, there is a $v \in L^\infty$ such that $h(u) = \int u \times v$ for every $u \in C$. P Set
$G = \bigcup_{n\in\mathbb{N}} E_n \in \Sigma$, and define $U \subseteq L^1$ as in 247A-247B. By 247B, there is a $v \in L^\infty$ such that $h(u) = \int u \times v$
for every $u \in U$. But if $u \in C$, we can express $u$ as $f^\bullet$ where $f : X \to \mathbb{R}$ is measurable. If $F \in \Sigma$ and
$F \cap G = \emptyset$, then
$$|\int_F f| = |\int_F u| \le 2^{-n} + M_n\mu(F \cap E_n) = 2^{-n}$$
for every $n \in \mathbb{N}$, so $\int_F f = 0$; it follows that $f = 0$ a.e. on $X \setminus G$ (131Fc), so that $f \times \chi G =_{\text{a.e.}} f$ and
$u = u \times \chi G^\bullet$, that is, $u \in U$, and $h(u) = \int u \times v$, as required. Q

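(iii) So we may proceed, having an adequate description, not of $(L^1(\mu))^*$ itself, but of its action on $C$.
Let $\mathcal{F}$ be any ultrafilter on $L^1$ containing $C$ (see 2A3R). For each $F \in \Sigma$, set
$$\nu F = \lim_{u\to\mathcal{F}} \int_F u;$$
because
$$\sup_{u\in C} |\int_F u| \le \sup_{u\in C} \|u\|_1 < \infty,$$
this is well-defined in $\mathbb{R}$ (2A3Se). If $E$, $F$ are disjoint members of $\Sigma$, then $\int_{E\cup F} u = \int_E u + \int_F u$ for every
$u \in C$, so
$$\nu(E \cup F) = \lim_{u\to\mathcal{F}} \int_{E\cup F} u = \lim_{u\to\mathcal{F}} \int_E u + \lim_{u\to\mathcal{F}} \int_F u = \nu E + \nu F$$
(2A3Sf). Thus $\nu : \Sigma \to \mathbb{R}$ is additive. Next, it is truly continuous with respect to $\mu$. P Given $\epsilon > 0$, take
$n \in \mathbb{N}$ such that $2^{-n} \le \frac12\epsilon$, set $\delta = \epsilon/2(M_n + 1) > 0$ and observe that
$$|\nu F| \le \sup_{u\in C} |\int_F u| \le 2^{-n} + M_n\mu(F \cap E_n) \le \epsilon$$
whenever $\mu(F \cap E_n) \le \delta$. Q By the Radon-Nikodým theorem (232E), there is an $f_0 \in \mathcal{L}^1$ such that
$\int_F f_0 = \nu F$ for every $F \in \Sigma$. Set $u_0 = f_0^\bullet \in L^1$. If $n \in \mathbb{N}$, $F \in \Sigma$ then
$$|\int_F u_0| = |\nu F| \le \sup_{u\in C} |\int_F u| \le 2^{-n} + M_n\mu(F \cap E_n),$$
so $u_0 \in C$.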

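(iv) Of course the point is that $\mathcal{F}$ converges to $u_0$. P Let $h \in (L^1)^*$. Then there is a $v \in L^\infty$ such
that $h(u) = \int u \times v$ for every $u \in C$. Express $v$ as $g^\bullet$, where $g : X \to \mathbb{R}$ is bounded and $\Sigma$-measurable. Let
$\epsilon > 0$. Take $a_0 \le a_1 \le \ldots \le a_n$ such that $a_{i+1} - a_i \le \epsilon$ for each $i$ while $a_0 \le g(x) < a_n$ for each $x \in X$. Set
$F_i = \{x : a_{i-1} \le g(x) < a_i\}$ for $1 \le i \le n$, and set $\tilde g = \sum_{i=1}^{n} a_i\chi F_i$, $\tilde v = \tilde g^\bullet$; then $\|\tilde v - v\|_\infty \le \epsilon$. We have
$$\int u_0 \times \tilde v = \sum_{i=1}^{n} a_i \int_{F_i} u_0 = \sum_{i=1}^{n} a_i\nu F_i = \sum_{i=1}^{n} a_i \lim_{u\to\mathcal{F}} \int_{F_i} u = \lim_{u\to\mathcal{F}} \sum_{i=1}^{n} a_i \int_{F_i} u = \lim_{u\to\mathcal{F}} \int u \times \tilde v.$$

Consequently
$$\limsup_{u\to\mathcal{F}} |\int u \times v - \int u_0 \times v| \le |\int u_0 \times v - \int u_0 \times \tilde v| + \sup_{u\in C} |\int u \times v - \int u \times \tilde v|$$
$$\le \|u_0\|_1 \|v - \tilde v\|_\infty + \sup_{u\in C} \|u\|_1 \|v - \tilde v\|_\infty \le 2\epsilon \sup_{u\in C} \|u\|_1.$$

As $\epsilon$ is arbitrary,
$$\limsup_{u\to\mathcal{F}} |h(u) - h(u_0)| = \limsup_{u\to\mathcal{F}} |\int u \times v - \int u_0 \times v| = 0.$$

As $h$ is arbitrary, $u_0$ is a limit of $\mathcal{F}$ in $C$ for the weak topology of $L^1$. As $\mathcal{F}$ is arbitrary, $C$ is weakly compact
in $L^1$, and the proof is complete.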

247D Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be any two measure spaces, and $T : L^1(\mu) \to L^1(\nu)$ a
continuous linear operator. Then $T[A]$ is a uniformly integrable subset of $L^1(\nu)$ whenever $A$ is a uniformly
integrable subset of $L^1(\mu)$.
proof The point is that $T$ is continuous for the respective weak topologies (2A5If). If $A \subseteq L^1(\mu)$ is uniformly
integrable, then there is a weakly compact $C \supseteq A$, by 247C; $T[C]$, being the image of a compact set under
a continuous map, must be weakly compact (2A3N(b-ii)); so $T[C]$ and $T[A]$ are uniformly integrable by the
other half of 247C.

247E Complex $L^1$ There are no difficulties, and no surprises, in proving 247C for $L^1_{\mathbb{C}}$. If we follow
the same proof, everything works, but of course we must remember to change the constant when applying
246F, or rather 246K, in part (b-i) of the proof.

247X Basic exercises > (a) Let $(X, \Sigma, \mu)$ be any measure space. Show that if $A \subseteq L^1 = L^1(\mu)$ is
relatively weakly compact, then $\{v : v \in L^1, |v| \le |u|$ for some $u \in A\}$ is relatively weakly compact.

(b) Let $(X, \Sigma, \mu)$ be a measure space. On $L^1 = L^1(\mu)$ define pseudometrics $\rho_F$, $\rho'_w$ for $F \in \Sigma$, $w \in L^\infty(\mu)$
by setting $\rho_F(u, v) = |\int_F u - \int_F v|$, $\rho'_w(u, v) = |\int u \times w - \int v \times w|$ for $u$, $v \in L^1$. Show that on any
$\|\ \|_1$-bounded subset of $L^1$, the topology defined by $\{\rho_F : F \in \Sigma\}$ agrees with the topology generated by
$\{\rho'_w : w \in L^\infty\}$.

> (c) Show that for any set $X$ a subset of $\ell^1 = \ell^1(X)$ is compact for the weak topology of $\ell^1$ iff it is
compact for the norm topology of $\ell^1$. (Hint: 246Xd.)

(d) Use the argument of (a-ii) in the proof of 247C to show directly that if $A \subseteq \ell^1(\mathbb{N})$ is weakly compact
then $\inf_{n\in\mathbb{N}} |u_n(n)| = 0$ for any sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$.

(e) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $T : L^2(\nu) \to L^1(\mu)$ any bounded linear operator.
Show that $\{Tu : u \in L^2(\nu), \|u\|_2 \le 1\}$ is uniformly integrable in $L^1(\mu)$. (Hint: use 244K to see that
$\{u : \|u\|_2 \le 1\}$ is weakly compact in $L^2(\nu)$.)

247Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a measure space. Take $1 < p < \infty$ and $M \ge 0$ and set
$A = \{u : u \in L^p = L^p(\mu), \|u\|_p \le M\}$. Write $\mathfrak{S}_A$ for the topology of convergence in measure on $A$, that is,
the subspace topology induced by the topology of convergence in measure on $L^0(\mu)$. Show that if $h \in (L^p)^*$
then $h{\restriction}A$ is continuous for $\mathfrak{S}_A$; so that if $\mathfrak{T}$ is the weak topology on $L^p$, then the subspace topology $\mathfrak{T}_A$ is
included in $\mathfrak{S}_A$.

(b) Let $(X, \Sigma, \mu)$ be a measure space and $\langle u_n\rangle_{n\in\mathbb{N}}$ a sequence in $L^1 = L^1(\mu)$ such that $\lim_{n\to\infty} \int_F u_n$ is
defined for every $F \in \Sigma$. Show that $\{u_n : n \in \mathbb{N}\}$ is weakly convergent. (Hint: 246Yh.) Find an alternative
argument relying on 2A5J and the result of 246Yj.

247 Notes and comments In 247D and 247Xa I try to suggest the power of the identification between
weak compactness and uniform integrability. That a continuous image of a weakly compact set should be
weakly compact is a commonplace of functional analysis; that the solid hull of a uniformly integrable set
should be uniformly integrable is immediate from the definition. But I see no simple arguments to show
that a continuous image of a uniformly integrable set should be uniformly integrable, or that the solid hull
of a weakly compact set should be relatively weakly compact. (Concerning the former, an alternative route
does exist; see 371Xf in the next volume.)
I can distinguish two important ideas in the proof of 247C. The first, in (a-ii) of the proof, is a careful
manipulation of sequences; it is the argument needed to show that a weakly compact subset of $\ell^1$ is norm-compact.
(You may find it helpful to write out a solution to 247Xd.) The $F_{n(k)}$ and $u_k$ are chosen to mimic
the situation in which we have a sequence in $\ell^1$ such that $u_k(k) = 1$ for each $k$. The $k(i)$ are chosen so that
the 'hump' moves sufficiently rapidly along for $u_{k(j)}(k(i))$ to be very small whenever $i \ne j$. But this means
that $\sum_{i=0}^{\infty} u_{k(j)}(k(i))$ (corresponding to $\int_G u_{k(j)}$ in the proof) is always substantial, while $\sum_{i=0}^{\infty} v(k(i))$ will
be small for any putative cluster point $v$ of $\langle u_{k(j)}\rangle_{j\in\mathbb{N}}$. I used similar techniques in §246; compare 246Yg.
In the other half of the proof of 247C, the strategy is clearer. Members of $L^1$ correspond to truly continuous
functionals on $\Sigma$; the uniform integrability of $C$ makes the corresponding set of functionals 'uniformly truly
continuous', so that any limit functional will also be truly continuous and will give us a member of $L^1$ via
the Radon-Nikodým theorem. A straightforward approximation argument ((b-iv) in the proof, and 247Xb)
shows that $\lim_{u\to\mathcal{F}} \int u \times w = \int v \times w$ for every $w \in L^\infty$. For localizable measures $\mu$, this would complete the
proof. For the general case, we need another step, here done in 247A-247B; a uniformly integrable subset
of $L^1$ effectively lives on a $\sigma$-finite part of the measure space, so that we can ignore the rest of the measure
and suppose that we have a localizable measure space.
The conditions (ii)-(iv) of 246G make it plain that weak compactness in $L^1$ can be effectively discussed
in terms of sequences; see also 246Yh. I should remark that this is a general feature of weak compactness
in Banach spaces (2A5J). Of course the disjoint-sequence formulations in 246G are characteristic of $L^1$;
I mean that while there are similar results applicable elsewhere (see Fremlin 74, chap. 8), the ideas are
clearest and most dramatically expressed in their application to $L^1$.

Chapter 25
Product Measures
I come now to another chapter on 'pure' measure theory, discussing a fundamental construction or,
as you may prefer to consider it, two constructions, since the problems involved in forming the product
of two arbitrary measure spaces (§251) are rather different from those arising in the product of arbitrarily
many probability spaces (§254). This work is going to stretch our technique to the utmost, for while the
fundamental theorems to which we are moving are natural aims, the proofs are lengthy and there are many
pitfalls beside the true paths.
The central idea is that of 'repeated integration'. You have probably already seen formulae of the type
'$\iint f(x, y)\,dx\,dy$' used to calculate the integral of a function of two real variables over a region in the plane.
One of the basic techniques of advanced calculus is reversing the order of integration; for instance, we expect
$\int_0^1 (\int_y^1 f(x, y)\,dx)\,dy$ to be equal to $\int_0^1 (\int_0^x f(x, y)\,dy)\,dx$. As I have developed the subject, we already have a
third calculation to compare with these two: $\int_D f$, where $D = \{(x, y) : 0 \le y \le x \le 1\}$ and the integral
is taken with respect to Lebesgue measure on the plane. The first two sections of this chapter are devoted
to an analysis of the relationship between one- and two-dimensional Lebesgue measure which makes these
operations valid, some of the time; part of the work has to be devoted to a careful description of the exact
conditions which must be imposed on $f$ and $D$ if we are to be safe.
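
For a well-behaved integrand the agreement of the two repeated integrals over the triangle $D$ can be checked numerically. The sketch below is an illustration added here, not part of the original text: the integrand $f(x, y) = xy$, the midpoint rule and the names `midpoint`, `x_then_y` and `y_then_x` are all assumptions of the example. For this $f$ both repeated integrals equal $\int_0^1 \frac12 x^3\,dx = \frac18$.

```python
def midpoint(g, a, b, n=200):
    """Midpoint-rule approximation to the integral of g over [a, b] with n subintervals."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    # Hypothetical integrand chosen for this sketch; continuous on the unit square.
    return x * y

# Integrate in x first (from y to 1), then in y over [0, 1].
x_then_y = midpoint(lambda y: midpoint(lambda x: f(x, y), y, 1.0), 0.0, 1.0)

# Integrate in y first (from 0 to x), then in x over [0, 1].
y_then_x = midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, x), 0.0, 1.0)

print(x_then_y, y_then_x)  # both close to the exact value 1/8
```

A numerical check of this kind says nothing about the delicate cases; it only illustrates the identity that the theorems of §252 are meant to justify.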
Repeated integration, in one form or another, appears everywhere in measure theory, and it is therefore
necessary sooner or later to develop the most general possible expression of the idea. The standard method
is through the theory of products of general measure spaces. Given measure spaces $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$,
the aim is to find a measure $\lambda$ on $X \times Y$ which will, at least, give the right measure $\mu E \cdot \nu F$ to a 'rectangle'
$E \times F$ where $E \in \Sigma$ and $F \in \mathrm{T}$. It turns out that there are already difficulties in deciding what 'the'
product measure is, and to do the job properly I find I need, even at this stage, to describe two related
but distinguishable constructions. These constructions and their elementary properties take up the whole
of §251. In §252 I turn to integration over the product, with Fubini's and Tonelli's theorems relating
$\int f\,d\lambda$ with $\iint f(x, y)\,\mu(dx)\,\nu(dy)$. Because the construction of $\lambda$ is symmetric between the two factors, this
automatically provides theorems relating $\iint f(x, y)\,\mu(dx)\,\nu(dy)$ with $\iint f(x, y)\,\nu(dy)\,\mu(dx)$. §253 looks at the
space $L^1(\lambda)$ and its relationship with $L^1(\mu)$ and $L^1(\nu)$.
For general measure spaces, there are obstacles in the way of forming an infinite product; to start with,
if $\langle (X_n, \mu_n)\rangle_{n\in\mathbb{N}}$ is a sequence of measure spaces, then a product measure $\lambda$ on $X = \prod_{n\in\mathbb{N}} X_n$ ought to set
$\lambda X = \prod_{n=0}^{\infty} \mu_n X_n$, and there is no guarantee that this product will converge, or behave well when it does.
But for probability spaces, when $\mu_n X_n = 1$ for every $n$, this problem at least evaporates. It is possible to
define the product of any family of probability spaces; this is the burden of §254.
I end the chapter with three sections which are a preparation for Chapters 27 and 28, but are also
important in their own right as an investigation of the way in which the group structure of $\mathbb{R}^r$ interacts with
Lebesgue and other measures. §255 deals with the 'convolution' $f * g$ of two functions, where $(f * g)(x) =
\int f(y)g(x - y)\,dy$ (the integration being with respect to Lebesgue measure). In §257 I show that some of the
same ideas, suitably transformed, can be used to describe a convolution $\nu_1 * \nu_2$ of two measures on $\mathbb{R}^r$; in
preparation for this I include a section on Radon measures on $\mathbb{R}^r$ (§256).

251 Finite products


The first construction to set up is the product of a pair of measure spaces. It turns out that there are
already substantial technical difficulties in the way of finding a canonical universally applicable method. I
find myself therefore describing two related, but distinct, constructions, the 'primitive' and 'c.l.d.' product
measures (251C, 251F). After listing the fundamental properties of the c.l.d. product measure (251I-251J), I
work through the identification of the product of Lebesgue measure with itself (251M) and a fairly thorough
discussion of subspaces (251N-251R).

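251A Definition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be two measure spaces. For $A\subseteq X\times Y$ set
$$\theta(A)=\inf\Bigl\{\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n : E_n\in\Sigma,\ F_n\in\mathrm{T}\ \forall\,n\in\mathbb{N},\ A\subseteq\bigcup_{n\in\mathbb{N}}E_n\times F_n\Bigr\}.$$

Remark In the products $\mu E_n\cdot\nu F_n$, $0\cdot\infty$ is to be taken as $0$, as in §135.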
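
A small worked instance may help to fix the convention in the Remark. The sketch below assumes, purely for illustration, that both factors are Lebesgue measure on $\mathbb{R}$ (this choice is not part of the definition above) and evaluates $\theta$ on a vertical line.

```latex
% Illustrative assumption: X = Y = \mathbb{R}, \mu = \nu = Lebesgue measure.
% Take A = \{0\} \times \mathbb{R}.
% Cover A by the single rectangle E_0 \times F_0 with
% E_0 = \{0\}, F_0 = \mathbb{R}, and E_n = F_n = \emptyset for n \ge 1.
\[
  \theta(A) \;\le\; \mu E_0 \cdot \nu F_0
             + \sum_{n=1}^{\infty} \mu\emptyset \cdot \nu\emptyset
  \;=\; 0 \cdot \infty + 0 \;=\; 0 ,
\]
% where 0 \cdot \infty = 0 by the convention of the Remark.
% Since \theta takes non-negative values, \theta(A) = 0.
```

Without the convention, the single covering rectangle would contribute an undefined term, and the line would not obviously be negligible.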

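251B Lemma In the context of 251A, $\theta$ is an outer measure on $X\times Y$.

proof (a) Setting $E_n=F_n=\emptyset$ for every $n\in\mathbb{N}$, we see that $\theta\emptyset=0$.
(b) If $A\subseteq B\subseteq X\times Y$, then whenever $B\subseteq\bigcup_{n\in\mathbb{N}}E_n\times F_n$ we shall have $A\subseteq\bigcup_{n\in\mathbb{N}}E_n\times F_n$; so $\theta A\le\theta B$.
(c) Let $\langle A_n\rangle_{n\in\mathbb{N}}$ be a sequence of subsets of $X\times Y$, with union $A$. For any $\epsilon>0$, we may choose,
for each $n\in\mathbb{N}$, sequences $\langle E_{nm}\rangle_{m\in\mathbb{N}}$ in $\Sigma$ and $\langle F_{nm}\rangle_{m\in\mathbb{N}}$ in $\mathrm{T}$ such that $A_n\subseteq\bigcup_{m\in\mathbb{N}}E_{nm}\times F_{nm}$ and
$\sum_{m=0}^{\infty}\mu E_{nm}\cdot\nu F_{nm}\le\theta A_n+2^{-n}\epsilon$. Because $\mathbb{N}\times\mathbb{N}$ is countable, we have a bijection $k\mapsto(n_k,m_k):\mathbb{N}\to\mathbb{N}\times\mathbb{N}$,
and now
$$A\subseteq\bigcup_{n,m\in\mathbb{N}}E_{nm}\times F_{nm}=\bigcup_{k\in\mathbb{N}}E_{n_km_k}\times F_{n_km_k},$$
so that
$$\theta A\le\sum_{k=0}^{\infty}\mu E_{n_km_k}\cdot\nu F_{n_km_k}=\sum_{n=0}^{\infty}\sum_{m=0}^{\infty}\mu E_{nm}\cdot\nu F_{nm}
\le\sum_{n=0}^{\infty}\bigl(\theta A_n+2^{-n}\epsilon\bigr)=2\epsilon+\sum_{n=0}^{\infty}\theta A_n.$$
As $\epsilon$ is arbitrary, $\theta A\le\sum_{n=0}^{\infty}\theta A_n$.
As $\langle A_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\theta$ is an outer measure.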

251C Definition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces. By the primitive product measure
on $X\times Y$ I shall mean the measure $\lambda_0$ derived by Carathéodory's method (113C) from the outer measure $\theta$
defined in 251A.
Remark I ought to point out that there is no general agreement on what 'the' product measure on
$X\times Y$ should be. Indeed in 251F below I will introduce an alternative one, and in the notes to this section
I will mention a third.

251D Definition It is convenient to have a name for a natural construction for $\sigma$-algebras. If $X$ and
$Y$ are sets with $\sigma$-algebras $\Sigma\subseteq\mathcal{P}X$ and $\mathrm{T}\subseteq\mathcal{P}Y$, I will write $\Sigma\widehat{\otimes}\mathrm{T}$ for the $\sigma$-algebra of subsets of $X\times Y$
generated by $\{E\times F:E\in\Sigma,\ F\in\mathrm{T}\}$.

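251E Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces; let $\lambda_0$ be the primitive product
measure on $X\times Y$, and $\Lambda$ its domain. Then $\Sigma\widehat{\otimes}\mathrm{T}\subseteq\Lambda$ and $\lambda_0(E\times F)=\mu E\cdot\nu F$ for all $E\in\Sigma$, $F\in\mathrm{T}$.
proof Throughout this proof, write $\Sigma^f=\{E:E\in\Sigma,\ \mu E<\infty\}$, $\mathrm{T}^f=\{F:F\in\mathrm{T},\ \nu F<\infty\}$.
(a) Suppose that $E\in\Sigma$ and $A\subseteq X\times Y$. For any $\epsilon>0$, there are sequences $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ and $\langle F_n\rangle_{n\in\mathbb{N}}$
in $\mathrm{T}$ such that $A\subseteq\bigcup_{n\in\mathbb{N}}E_n\times F_n$ and $\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n\le\theta A+\epsilon$. Now
$$A\cap(E\times Y)\subseteq\bigcup_{n\in\mathbb{N}}(E_n\cap E)\times F_n,\qquad A\setminus(E\times Y)\subseteq\bigcup_{n\in\mathbb{N}}(E_n\setminus E)\times F_n,$$
so
$$\theta(A\cap(E\times Y))+\theta(A\setminus(E\times Y))\le\sum_{n=0}^{\infty}\mu(E_n\cap E)\cdot\nu F_n+\sum_{n=0}^{\infty}\mu(E_n\setminus E)\cdot\nu F_n
=\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n\le\theta A+\epsilon.$$
As $\epsilon$ is arbitrary, $\theta(A\cap(E\times Y))+\theta(A\setminus(E\times Y))\le\theta A$. And this is enough to ensure that $E\times Y\in\Lambda$ (see
113D).
(b) Similarly, $X\times F\in\Lambda$ for every $F\in\mathrm{T}$, so $E\times F=(E\times Y)\cap(X\times F)\in\Lambda$ for every $E\in\Sigma$, $F\in\mathrm{T}$.
Because $\Lambda$ is a $\sigma$-algebra, it must include the smallest $\sigma$-algebra containing all the products $E\times F$, that
is, $\Lambda\supseteq\Sigma\widehat{\otimes}\mathrm{T}$.
(c) Take $E\in\Sigma$, $F\in\mathrm{T}$. We know that $E\times F\in\Lambda$; setting $E_0=E$, $F_0=F$, $E_n=F_n=\emptyset$ for $n\ge1$ in
the definition of $\theta$, we have
$$\lambda_0(E\times F)=\theta(E\times F)\le\mu E\cdot\nu F.$$
We have come to the central idea of the construction. In fact $\theta(E\times F)=\mu E\cdot\nu F$. P P Suppose that
$E\times F\subseteq\bigcup_{n\in\mathbb{N}}E_n\times F_n$ where $E_n\in\Sigma$ and $F_n\in\mathrm{T}$ for every $n$. Set $u=\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n$. If $u=\infty$ or
$\mu E=0$ or $\nu F=0$ then of course $\mu E\cdot\nu F\le u$. Otherwise, set
$$I=\{n:n\in\mathbb{N},\ \mu E_n=0\},\quad J=\{n:n\in\mathbb{N},\ \nu F_n=0\},\quad K=\mathbb{N}\setminus(I\cup J),$$
$$E'=E\setminus\bigcup_{n\in I}E_n,\qquad F'=F\setminus\bigcup_{n\in J}F_n.$$
Then $\mu E'=\mu E$ and $\nu F'=\nu F$; $E'\times F'\subseteq\bigcup_{n\in K}E_n\times F_n$; and for $n\in K$, $\mu E_n<\infty$ and $\nu F_n<\infty$, since
$\mu E_n\cdot\nu F_n\le u<\infty$ and neither $\mu E_n$ nor $\nu F_n$ is zero. Set
$$f_n=\nu F_n\,\chi E_n:X\to\mathbb{R}$$
if $n\in K$, and $f_n=0:X\to\mathbb{R}$ if $n\in I\cup J$. Then $f_n$ is a simple function and $\int f_n=\nu F_n\,\mu E_n$ for $n\in K$, $0$
otherwise, so
$$\sum_{n=0}^{\infty}\int f_n(x)\,\mu(dx)=\sum_{n\in K}\mu E_n\cdot\nu F_n\le u.$$
By B.Levi's theorem (123A), applied to $\langle\sum_{k=0}^{n}f_k\rangle_{n\in\mathbb{N}}$, $g=\sum_{n=0}^{\infty}f_n$ is integrable and $\int g\,d\mu\le u$. Write
$E''$ for $\{x:x\in E',\ g(x)<\infty\}$, so that $\mu E''=\mu E'=\mu E$. Now take any $x\in E''$ and set $K_x=\{n:n\in
K,\ x\in E_n\}$. Because $E'\times F'\subseteq\bigcup_{n\in K}E_n\times F_n$, $F'\subseteq\bigcup_{n\in K_x}F_n$ and
$$\nu F=\nu F'\le\sum_{n\in K_x}\nu F_n=\sum_{n=0}^{\infty}f_n(x)=g(x).$$
Thus $g(x)\ge\nu F$ for every $x\in E''$. We are supposing that $0<\mu E=\mu E''$ and $0<\nu F$, so we must have
$\nu F<\infty$, $\mu E''<\infty$. Now $g\ge\nu F\,\chi E''$, so
$$\mu E\cdot\nu F=\mu E''\cdot\nu F=\int\nu F\,\chi E''\le\int g\le u=\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n.$$
As $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle F_n\rangle_{n\in\mathbb{N}}$ are arbitrary, $\theta(E\times F)\ge\mu E\cdot\nu F$ and $\theta(E\times F)=\mu E\cdot\nu F$. Q Q
Thus
$$\lambda_0(E\times F)=\theta(E\times F)=\mu E\cdot\nu F$$
for all $E\in\Sigma$, $F\in\mathrm{T}$.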

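251F Definition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda_0$ the primitive product measure
defined in 251C. By the c.l.d. product measure on $X\times Y$ I shall mean the function $\lambda:\operatorname{dom}\lambda_0\to[0,\infty]$
defined by setting
$$\lambda W=\sup\{\lambda_0(W\cap(E\times F)):E\in\Sigma,\ F\in\mathrm{T},\ \mu E<\infty,\ \nu F<\infty\}$$
for $W\in\operatorname{dom}\lambda_0$.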
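
The two constructions can genuinely differ. The following sketch records a standard example; the particular spaces chosen (Lebesgue measure and counting measure on $[0,1]$) are an illustrative assumption, not something fixed by the definition above.

```latex
% Illustrative assumption: X = Y = [0,1], \mu = Lebesgue measure,
% \nu = counting measure on [0,1] (so every subset of Y is measurable).
% Let \Delta = \{(x,x) : x \in [0,1]\}. \Delta is a countable intersection
% of finite unions of measurable rectangles, so it belongs to dom \lambda_0
% by 251E.
%
% (1) \lambda_0 \Delta = \infty.
% Suppose \Delta \subseteq \bigcup_n E_n \times F_n with
% \sum_n \mu E_n \cdot \nu F_n < \infty. For each n, either \mu E_n = 0
% or F_n is finite. Put I = \{ n : \mu E_n = 0 \}. Every x outside the
% negligible set \bigcup_{n \in I} E_n lies in some finite F_n, so
\[
  [0,1] \setminus \bigcup_{n \in I} E_n \;\subseteq\; \bigcup_{n \notin I} F_n ,
\]
% which is impossible: the left side has Lebesgue measure 1 and the right
% side is countable. Hence no such cover exists and \theta\Delta = \infty.
%
% (2) \lambda \Delta = 0.
% If \mu E < \infty and \nu F < \infty then F is finite, so \mu F = 0 and
\[
  \lambda_0\bigl(\Delta \cap (E \times F)\bigr)
  \;\le\; \lambda_0(F \times F) \;=\; \mu F \cdot \nu F \;=\; 0 .
\]
% Taking the supremum over E and F gives \lambda\Delta = 0.
```

So here the primitive product measure gives the diagonal infinite measure while the c.l.d. product measure treats it as negligible.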

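251G Remark I had better show at once that $\lambda$ is a measure. P P Of course its domain $\Lambda=\operatorname{dom}\lambda_0$ is
a $\sigma$-algebra, and $\lambda\emptyset=\lambda_0\emptyset=0$. If $\langle W_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Lambda$, then for any $E\in\Sigma$, $F\in\mathrm{T}$ of finite
measure
$$\lambda_0\Bigl(\bigcup_{n\in\mathbb{N}}W_n\cap(E\times F)\Bigr)=\sum_{n=0}^{\infty}\lambda_0(W_n\cap(E\times F))\le\sum_{n=0}^{\infty}\lambda W_n,$$
so $\lambda(\bigcup_{n\in\mathbb{N}}W_n)\le\sum_{n=0}^{\infty}\lambda W_n$. On the other hand, if $a<\sum_{n=0}^{\infty}\lambda W_n$, then we can find $m\in\mathbb{N}$ and
$a_0,\ldots,a_m$ such that $a\le\sum_{n=0}^{m}a_n$ and $a_n<\lambda W_n$ for each $n\le m$; now there are $E_0,\ldots,E_m\in\Sigma$ and
$F_0,\ldots,F_m\in\mathrm{T}$, all of finite measure, such that $a_n\le\lambda_0(W_n\cap(E_n\times F_n))$ for each $n$. Setting $E=\bigcup_{n\le m}E_n$
and $F=\bigcup_{n\le m}F_n$, we have $\mu E<\infty$ and $\nu F<\infty$, so
$$\lambda\Bigl(\bigcup_{n\in\mathbb{N}}W_n\Bigr)\ge\lambda_0\Bigl(\bigcup_{n\in\mathbb{N}}W_n\cap(E\times F)\Bigr)=\sum_{n=0}^{\infty}\lambda_0(W_n\cap(E\times F))
\ge\sum_{n=0}^{m}\lambda_0(W_n\cap(E_n\times F_n))\ge\sum_{n=0}^{m}a_n\ge a.$$
As $a$ is arbitrary, $\lambda(\bigcup_{n\in\mathbb{N}}W_n)\ge\sum_{n=0}^{\infty}\lambda W_n$ and $\lambda(\bigcup_{n\in\mathbb{N}}W_n)=\sum_{n=0}^{\infty}\lambda W_n$. As $\langle W_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\lambda$ is
a measure. Q Q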

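251H We need a simple property of the measure $\lambda_0$.

Lemma Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be two measure spaces; let $\lambda_0$ be the primitive product measure on
$X\times Y$, and $\Lambda$ its domain. If $H\subseteq X\times Y$ and $H\cap(E\times F)\in\Lambda$ whenever $\mu E<\infty$ and $\nu F<\infty$, then
$H\in\Lambda$.
proof Let $\theta$ be the outer measure described in 251A. Suppose that $A\subseteq X\times Y$ and $\theta A<\infty$. Let $\epsilon>0$. Let
$\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle F_n\rangle_{n\in\mathbb{N}}$ be sequences in $\Sigma$, $\mathrm{T}$ respectively such that $A\subseteq\bigcup_{n\in\mathbb{N}}E_n\times F_n$ and $\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n\le
\theta A+\epsilon$. Now, for each $n$, the product of the measures $\mu E_n$, $\nu F_n$ is finite, so either one is zero or both are
finite. If $\mu E_n=0$ or $\nu F_n=0$ then of course
$$\mu E_n\cdot\nu F_n=0=\theta((E_n\times F_n)\cap H)+\theta((E_n\times F_n)\setminus H).$$
If $\mu E_n<\infty$ and $\nu F_n<\infty$ then
$$\mu E_n\cdot\nu F_n=\lambda_0(E_n\times F_n)
=\lambda_0((E_n\times F_n)\cap H)+\lambda_0((E_n\times F_n)\setminus H)
=\theta((E_n\times F_n)\cap H)+\theta((E_n\times F_n)\setminus H).$$
Accordingly, because $\theta$ is an outer measure,
$$\theta(A\cap H)+\theta(A\setminus H)\le\sum_{n=0}^{\infty}\theta((E_n\times F_n)\cap H)+\sum_{n=0}^{\infty}\theta((E_n\times F_n)\setminus H)
=\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n\le\theta A+\epsilon.$$
As $\epsilon$ is arbitrary, $\theta(A\cap H)+\theta(A\setminus H)\le\theta A$. As $A$ is arbitrary, $H\in\Lambda$.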

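251I Now for the fundamental properties of the c.l.d. product measure.
Theorem Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces; let $\lambda$ be the c.l.d. product measure on $X\times Y$, and
$\Lambda$ its domain. Then
(a) $\Sigma\widehat{\otimes}\mathrm{T}\subseteq\Lambda$ and $\lambda(E\times F)=\mu E\cdot\nu F$ whenever $E\in\Sigma$, $F\in\mathrm{T}$ and $\mu E\cdot\nu F<\infty$;
(b) for every $W\in\Lambda$ there is a $V\in\Sigma\widehat{\otimes}\mathrm{T}$ such that $V\subseteq W$ and $\lambda V=\lambda W$;
(c) $(X\times Y,\Lambda,\lambda)$ is complete and locally determined, and in fact is the c.l.d. version of $(X\times Y,\Lambda,\lambda_0)$ as
described in 213D-213E; in particular, $\lambda W=\lambda_0W$ whenever $\lambda_0W<\infty$;
(d) if $W\in\Lambda$ and $\lambda W>0$ then there are $E\in\Sigma$, $F\in\mathrm{T}$ such that $\mu E<\infty$, $\nu F<\infty$ and $\lambda(W\cap(E\times F))>0$;
(e) if $W\in\Lambda$ and $\lambda W<\infty$, then for every $\epsilon>0$ there are $E_0,\ldots,E_n\in\Sigma$, $F_0,\ldots,F_n\in\mathrm{T}$, all of finite
measure, such that $\lambda(W\triangle\bigcup_{i\le n}(E_i\times F_i))\le\epsilon$.
proof Take $\theta$ to be the outer measure of 251A and $\lambda_0$ the primitive product measure of 251C. Set $\Sigma^f=
\{E:E\in\Sigma,\ \mu E<\infty\}$ and $\mathrm{T}^f=\{F:F\in\mathrm{T},\ \nu F<\infty\}$.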
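
(a) By 251E, $\Sigma\widehat{\otimes}\mathrm{T}\subseteq\Lambda$. If $E\in\Sigma$ and $F\in\mathrm{T}$ and $\mu E\cdot\nu F<\infty$, either $\mu E\cdot\nu F=0$ and $\lambda(E\times F)=
\lambda_0(E\times F)=0$ or both $\mu E$ and $\nu F$ are finite and again $\lambda(E\times F)=\lambda_0(E\times F)=\mu E\cdot\nu F$.
(b)(i) Take any $a<\lambda W$. Then there are $E\in\Sigma^f$, $F\in\mathrm{T}^f$ such that $\lambda_0(W\cap(E\times F))>a$ (251H); now
$$\theta((E\times F)\setminus W)=\lambda_0((E\times F)\setminus W)
=\lambda_0(E\times F)-\lambda_0(W\cap(E\times F))<\lambda_0(E\times F)-a.$$
Let $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle F_n\rangle_{n\in\mathbb{N}}$ be sequences in $\Sigma$, $\mathrm{T}$ respectively such that $(E\times F)\setminus W\subseteq\bigcup_{n\in\mathbb{N}}E_n\times F_n$ and
$\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n\le\lambda_0(E\times F)-a$. Consider
$$V=(E\times F)\setminus\bigcup_{n\in\mathbb{N}}E_n\times F_n\in\Sigma\widehat{\otimes}\mathrm{T};$$
then $V\subseteq W$, and
$$\lambda V=\lambda_0V=\lambda_0(E\times F)-\lambda_0((E\times F)\setminus V)\ge\lambda_0(E\times F)-\lambda_0\Bigl(\bigcup_{n\in\mathbb{N}}E_n\times F_n\Bigr)$$
(because $(E\times F)\setminus V\subseteq\bigcup_{n\in\mathbb{N}}E_n\times F_n$)
$$\ge\lambda_0(E\times F)-\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n\ge a$$
(by the choice of the $E_n$, $F_n$).
(ii) Thus for every $a<\lambda W$ there is a $V\in\Sigma\widehat{\otimes}\mathrm{T}$ such that $V\subseteq W$ and $\lambda V\ge a$. Now choose a sequence
$\langle a_n\rangle_{n\in\mathbb{N}}$ strictly increasing to $\lambda W$, and for each $a_n$ a corresponding $V_n$; then $V=\bigcup_{n\in\mathbb{N}}V_n$ belongs to the
$\sigma$-algebra $\Sigma\widehat{\otimes}\mathrm{T}$, is included in $W$, and has measure at least $\sup_{n\in\mathbb{N}}\lambda V_n$ and at most $\lambda W$; so $\lambda V=\lambda W$, as
required.
(c)(i) If $H\subseteq X\times Y$ is $\lambda$-negligible, there is a $W\in\Lambda$ such that $H\subseteq W$ and $\lambda W=0$. If $E\in\Sigma$, $F\in\mathrm{T}$ are
of finite measure, $\lambda_0(W\cap(E\times F))=0$; but $\lambda_0$, being derived from the outer measure $\theta$ by Carathéodory's
method, is complete (212A), so $H\cap(E\times F)\in\Lambda$ and $\lambda_0(H\cap(E\times F))=0$. Because $E$ and $F$ are arbitrary,
$H\in\Lambda$, by 251H. As $H$ is arbitrary, $\lambda$ is complete.
(ii) If $W\in\Lambda$ and $\lambda W=\infty$, then there must be $E\in\Sigma$, $F\in\mathrm{T}$ such that $\mu E<\infty$, $\nu F<\infty$ and
$\lambda_0(W\cap(E\times F))>0$; now
$$0<\lambda(W\cap(E\times F))\le\mu E\cdot\nu F<\infty.$$
Thus $\lambda$ is semi-finite.
(iii) If $H\subseteq X\times Y$ and $H\cap W\in\Lambda$ whenever $\lambda W<\infty$, then, in particular, $H\cap(E\times F)\in\Lambda$ whenever
$\mu E<\infty$ and $\nu F<\infty$; by 251H again, $H\in\Lambda$. Thus $\lambda$ is locally determined.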
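(iv) If $W\in\Lambda$ and $\lambda_0W<\infty$, then we have sequences $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$, $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that
$W\subseteq\bigcup_{n\in\mathbb{N}}(E_n\times F_n)$ and $\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n<\infty$. Set
$$I=\{n:\mu E_n=\infty\},\quad J=\{n:\nu F_n=\infty\},\quad K=\mathbb{N}\setminus(I\cup J);$$
then $\nu(\bigcup_{n\in I}F_n)=\mu(\bigcup_{n\in J}E_n)=0$, so $\lambda_0(W\setminus W')=0$, where
$$W'=W\cap\bigcup_{n\in K}(E_n\times F_n)\supseteq W\setminus\Bigl(\bigl(\bigcup_{n\in J}E_n\times Y\bigr)\cup\bigl(X\times\bigcup_{n\in I}F_n\bigr)\Bigr).$$
Now set $E'_n=\bigcup_{i\in K,\,i\le n}E_i$, $F'_n=\bigcup_{i\in K,\,i\le n}F_i$ for each $n$. We have $W'=\bigcup_{n\in\mathbb{N}}W'\cap(E'_n\times F'_n)$, so
$$\lambda W\le\lambda_0W=\lambda_0W'=\lim_{n\to\infty}\lambda_0(W'\cap(E'_n\times F'_n))\le\lambda W'\le\lambda W,$$
and $\lambda W=\lambda_0W$.
(v) Following the terminology of 213D, let us write
$$\tilde\Lambda=\{W:W\subseteq X\times Y,\ W\cap V\in\Lambda\text{ whenever }V\in\Lambda,\ \lambda_0V<\infty\},$$
$$\tilde\lambda W=\sup\{\lambda_0(W\cap V):V\in\Lambda,\ \lambda_0V<\infty\}.$$
Because $\lambda_0(E\times F)<\infty$ whenever $\mu E<\infty$ and $\nu F<\infty$, $\tilde\Lambda\subseteq\Lambda$ and $\tilde\Lambda=\Lambda$.
Now for any $W\in\Lambda$ we have
$$\tilde\lambda W=\sup\{\lambda_0(W\cap V):V\in\Lambda,\ \lambda_0V<\infty\}
\ge\sup\{\lambda_0(W\cap(E\times F)):E\in\Sigma^f,\ F\in\mathrm{T}^f\}
=\lambda W$$
$$\ge\sup\{\lambda(W\cap V):V\in\Lambda,\ \lambda_0V<\infty\}
=\sup\{\lambda_0(W\cap V):V\in\Lambda,\ \lambda_0V<\infty\},$$
using (iv) just above, so that $\lambda=\tilde\lambda$ is the c.l.d. version of $\lambda_0$.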


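(d) If $W\in\Lambda$ and $\lambda W>0$, there are $E\in\Sigma^f$ and $F\in\mathrm{T}^f$ such that $\lambda(W\cap(E\times F))=\lambda_0(W\cap(E\times F))>0$.
(e) There are $E\in\Sigma^f$, $F\in\mathrm{T}^f$ such that $\lambda_0(W\cap(E\times F))\ge\lambda W-\frac13\epsilon$; set $V_1=W\cap(E\times F)$; then
$$\lambda(W\setminus V_1)=\lambda W-\lambda V_1=\lambda W-\lambda_0V_1\le\tfrac13\epsilon.$$
There are sequences $\langle E'_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$, $\langle F'_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that $V_1\subseteq\bigcup_{n\in\mathbb{N}}E'_n\times F'_n$ and $\sum_{n=0}^{\infty}\mu E'_n\cdot\nu F'_n\le
\lambda_0V_1+\frac13\epsilon$. Replacing $E'_n$, $F'_n$ by $E'_n\cap E$, $F'_n\cap F$ if necessary, we may suppose that $E'_n\in\Sigma^f$, $F'_n\in\mathrm{T}^f$ for
every $n$. Set $V_2=\bigcup_{n\in\mathbb{N}}E'_n\times F'_n$; then
$$\lambda(V_2\setminus V_1)\le\lambda_0(V_2\setminus V_1)\le\sum_{n=0}^{\infty}\mu E'_n\cdot\nu F'_n-\lambda_0V_1\le\tfrac13\epsilon.$$
Let $m\in\mathbb{N}$ be such that $\sum_{n=m+1}^{\infty}\mu E'_n\cdot\nu F'_n\le\frac13\epsilon$, and set
$$V=\bigcup_{n=0}^{m}E'_n\times F'_n.$$
Then
$$\lambda(V_2\setminus V)\le\sum_{n=m+1}^{\infty}\mu E'_n\cdot\nu F'_n\le\tfrac13\epsilon.$$
Putting these together, we have $W\triangle V\subseteq(W\setminus V_1)\cup(V_2\setminus V_1)\cup(V_2\setminus V)$, so
$$\lambda(W\triangle V)\le\lambda(W\setminus V_1)+\lambda(V_2\setminus V_1)+\lambda(V_2\setminus V)\le\tfrac13\epsilon+\tfrac13\epsilon+\tfrac13\epsilon=\epsilon.$$
And $V$ is of the required form.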

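251J Proposition If $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ are semi-finite measure spaces and $\lambda$ is the c.l.d. product
measure on $X\times Y$, then $\lambda(E\times F)=\mu E\cdot\nu F$ for all $E\in\Sigma$, $F\in\mathrm{T}$.
proof Setting $\Sigma^f=\{E:E\in\Sigma,\ \mu E<\infty\}$, $\mathrm{T}^f=\{F:F\in\mathrm{T},\ \nu F<\infty\}$, we have
$$\lambda(E\times F)=\sup\{\lambda_0((E\cap E_0)\times(F\cap F_0)):E_0\in\Sigma^f,\ F_0\in\mathrm{T}^f\}$$
$$=\sup\{\mu(E\cap E_0)\cdot\nu(F\cap F_0):E_0\in\Sigma^f,\ F_0\in\mathrm{T}^f\}$$
$$=\sup\{\mu(E\cap E_0):E_0\in\Sigma^f\}\cdot\sup\{\nu(F\cap F_0):F_0\in\mathrm{T}^f\}
=\mu E\cdot\nu F$$
(using 213A).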
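
The semi-finiteness hypothesis in 251J cannot simply be dropped. The following minimal sketch, built on a deliberately degenerate one-point space chosen here only for illustration, shows the rectangle formula failing when one factor is not semi-finite.

```latex
% Illustrative assumption: X = \{x\}, \Sigma = \{\emptyset, X\}, \mu X = \infty
% (not semi-finite), and Y = \{y\} with \nu Y = 1.
% The only set of finite measure in \Sigma is \emptyset, so for W = X \times Y
\[
  \lambda W
  \;=\; \sup\{\lambda_0(W \cap (E \times F)) : \mu E < \infty,\ \nu F < \infty\}
  \;=\; \lambda_0(\emptyset)
  \;=\; 0 ,
\]
% whereas
\[
  \mu X \cdot \nu Y \;=\; \infty \cdot 1 \;=\; \infty .
\]
% By contrast, 251E gives \lambda_0(X \times Y) = \mu X \cdot \nu Y = \infty
% for the primitive product measure.
```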

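251K $\sigma$-finite spaces Of course most of the measure spaces we shall apply these results to are $\sigma$-finite,
and in this case there are some useful simplifications.
Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be $\sigma$-finite measure spaces. Then the c.l.d. product measure on
$X\times Y$ is equal to the primitive product measure, and is the completion of its restriction to $\Sigma\widehat{\otimes}\mathrm{T}$; moreover,
this common product measure is $\sigma$-finite.

proof Write $\lambda_0$, $\lambda$ for the primitive and c.l.d. product measures, as usual, and $\Lambda$ for their domain. Let
$\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle F_n\rangle_{n\in\mathbb{N}}$ be non-decreasing sequences of sets of finite measure covering $X$, $Y$ respectively (see
211D).
(a) For each $n\in\mathbb{N}$, $\lambda(E_n\times F_n)=\mu E_n\cdot\nu F_n$ is finite, by 251Ia. Since $X\times Y=\bigcup_{n\in\mathbb{N}}E_n\times F_n$, $\lambda$ is
$\sigma$-finite.
(b) For any $W\in\Lambda$,
$$\lambda_0W=\lim_{n\to\infty}\lambda_0(W\cap(E_n\times F_n))=\lim_{n\to\infty}\lambda(W\cap(E_n\times F_n))=\lambda W.$$
So $\lambda=\lambda_0$.
(c) Write $\lambda_B$ for the restriction of $\lambda=\lambda_0$ to $\Sigma\widehat{\otimes}\mathrm{T}$, and $\hat\lambda_B$ for its completion.
(i) Suppose that $W\in\operatorname{dom}\hat\lambda_B$. Then there are $W',W''\in\Sigma\widehat{\otimes}\mathrm{T}$ such that $W'\subseteq W\subseteq W''$ and
$\lambda_B(W''\setminus W')=0$ (212C). In this case, $\lambda(W''\setminus W')=0$; as $\lambda$ is complete, $W\in\Lambda$ and
$$\lambda W=\lambda W'=\lambda_BW'=\hat\lambda_BW.$$
Thus $\lambda$ extends $\hat\lambda_B$.
(ii) If $W\in\Lambda$, then there is a $V\in\Sigma\widehat{\otimes}\mathrm{T}$ such that $V\subseteq W$ and $\lambda(W\setminus V)=0$. P P For each $n\in\mathbb{N}$
there is a $V_n\in\Sigma\widehat{\otimes}\mathrm{T}$ such that $V_n\subseteq W\cap(E_n\times F_n)$ and $\lambda V_n=\lambda(W\cap(E_n\times F_n))$ (251Ib). But as
$\lambda(E_n\times F_n)=\mu E_n\cdot\nu F_n$ is finite, this means that $\lambda(W\cap(E_n\times F_n)\setminus V_n)=0$. So if we set $V=\bigcup_{n\in\mathbb{N}}V_n$,
we shall have $V\in\Sigma\widehat{\otimes}\mathrm{T}$, $V\subseteq W$ and
$$W\setminus V=\bigcup_{n\in\mathbb{N}}W\cap(E_n\times F_n)\setminus V\subseteq\bigcup_{n\in\mathbb{N}}W\cap(E_n\times F_n)\setminus V_n$$
is $\lambda$-negligible. Q Q
Similarly, there is a $V'\in\Sigma\widehat{\otimes}\mathrm{T}$ such that $V'\subseteq(X\times Y)\setminus W$ and $\lambda(((X\times Y)\setminus W)\setminus V')=0$. Setting
$V''=(X\times Y)\setminus V'$, $V''\in\Sigma\widehat{\otimes}\mathrm{T}$, $W\subseteq V''$ and $\lambda(V''\setminus W)=0$. So
$$\lambda_B(V''\setminus V)=\lambda(V''\setminus V)=\lambda(V''\setminus W)+\lambda(W\setminus V)=0,$$
and $W$ is measured by $\hat\lambda_B$, with $\hat\lambda_BW=\lambda_BV=\lambda W$. As $W$ is arbitrary, $\hat\lambda_B=\lambda$.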

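251L It is time that I gave some examples. Of course the central example is Lebesgue measure. In
this case we have the only reasonable result. I pause to describe the leading example of the product $\Sigma\widehat{\otimes}\mathrm{T}$
introduced in 251D.
Proposition Let $r,s\ge1$ be integers. Then we have a natural bijection $\phi:\mathbb{R}^r\times\mathbb{R}^s\to\mathbb{R}^{r+s}$, defined by
setting
$$\phi((\xi_1,\ldots,\xi_r),(\eta_1,\ldots,\eta_s))=(\xi_1,\ldots,\xi_r,\eta_1,\ldots,\eta_s)$$
for $\xi_1,\ldots,\xi_r,\eta_1,\ldots,\eta_s\in\mathbb{R}$. If we write $\mathcal{B}_r$, $\mathcal{B}_s$ and $\mathcal{B}_{r+s}$ for the Borel $\sigma$-algebras of $\mathbb{R}^r$, $\mathbb{R}^s$ and $\mathbb{R}^{r+s}$
respectively, then $\phi$ identifies $\mathcal{B}_{r+s}$ with $\mathcal{B}_r\widehat{\otimes}\mathcal{B}_s$.

proof (a) Write $\mathcal{B}$ for the $\sigma$-algebra $\{\phi^{-1}[W]:W\in\mathcal{B}_{r+s}\}$ copied onto $\mathbb{R}^r\times\mathbb{R}^s$ by the bijection $\phi$; we
are seeking to prove that $\mathcal{B}=\mathcal{B}_r\widehat{\otimes}\mathcal{B}_s$. We have maps $\pi_1:\mathbb{R}^{r+s}\to\mathbb{R}^r$, $\pi_2:\mathbb{R}^{r+s}\to\mathbb{R}^s$ defined by
setting $\pi_1(\phi(x,y))=x$, $\pi_2(\phi(x,y))=y$. Each co-ordinate of $\pi_1$ is continuous, therefore Borel measurable
(121Db), so $\pi_1^{-1}[E]\in\mathcal{B}_{r+s}$ for every $E\in\mathcal{B}_r$, by 121K. Similarly, $\pi_2^{-1}[F]\in\mathcal{B}_{r+s}$ for every $F\in\mathcal{B}_s$. So
$\phi[E\times F]=\pi_1^{-1}[E]\cap\pi_2^{-1}[F]$ belongs to $\mathcal{B}_{r+s}$, that is, $E\times F\in\mathcal{B}$, whenever $E\in\mathcal{B}_r$ and $F\in\mathcal{B}_s$. Because $\mathcal{B}$
is a $\sigma$-algebra, $\mathcal{B}_r\widehat{\otimes}\mathcal{B}_s\subseteq\mathcal{B}$.
(b) Now examine sets of the form
$$\{(x,y):\xi_i\le\alpha\}=\{x:\xi_i\le\alpha\}\times\mathbb{R}^s,$$
$$\{(x,y):\eta_j\le\alpha\}=\mathbb{R}^r\times\{y:\eta_j\le\alpha\}$$
for $\alpha\in\mathbb{R}$, $i\le r$ and $j\le s$, taking $x=(\xi_1,\ldots,\xi_r)$ and $y=(\eta_1,\ldots,\eta_s)$. All of these belong to $\mathcal{B}_r\widehat{\otimes}\mathcal{B}_s$. But
the $\sigma$-algebra they generate is just $\mathcal{B}$, by 121J. So $\mathcal{B}\subseteq\mathcal{B}_r\widehat{\otimes}\mathcal{B}_s$ and $\mathcal{B}=\mathcal{B}_r\widehat{\otimes}\mathcal{B}_s$.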

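251M Theorem Let $r,s\ge1$ be integers. Then the bijection $\phi:\mathbb{R}^r\times\mathbb{R}^s\to\mathbb{R}^{r+s}$ described in 251L
identifies Lebesgue measure on $\mathbb{R}^{r+s}$ with the c.l.d. product $\lambda$ of Lebesgue measure on $\mathbb{R}^r$ and Lebesgue
measure on $\mathbb{R}^s$.

proof Write $\mu_r$, $\mu_s$, $\mu_{r+s}$ for the three versions of Lebesgue measure, $\mu_r^*$, $\mu_s^*$ and $\mu_{r+s}^*$ for the corresponding
outer measures, and $\theta$ for the outer measure on $\mathbb{R}^r\times\mathbb{R}^s$ derived from $\mu_r$ and $\mu_s$ by the formula of 251A.

(a) If $I\subseteq\mathbb{R}^r$ and $J\subseteq\mathbb{R}^s$ are half-open intervals, then $\phi[I\times J]\subseteq\mathbb{R}^{r+s}$ is also a half-open interval, and
$$\mu_{r+s}(\phi[I\times J])=\mu_rI\cdot\mu_sJ;$$
this is immediate from the definition of the Lebesgue measure of an interval. (I speak of 'half-open' intervals
here, that is, intervals of the form $\prod_{j=1}^{r}[\alpha_j,\beta_j[$, because I used them in the definition of Lebesgue measure
in §115. If you prefer to work with open intervals or closed intervals it makes no difference.) Note also that
every half-open interval in $\mathbb{R}^{r+s}$ is expressible as $\phi[I\times J]$ for suitable $I$, $J$.

(b) For any $A\subseteq\mathbb{R}^{r+s}$, $\theta(\phi^{-1}[A])\le\mu_{r+s}^*(A)$. P P For any $\epsilon>0$, there is a sequence $\langle K_n\rangle_{n\in\mathbb{N}}$ of half-open
intervals in $\mathbb{R}^{r+s}$ such that $A\subseteq\bigcup_{n\in\mathbb{N}}K_n$ and $\sum_{n=0}^{\infty}\mu_{r+s}(K_n)\le\mu_{r+s}^*(A)+\epsilon$. Express each $K_n$ as $\phi[I_n\times J_n]$,
where $I_n$ and $J_n$ are half-open intervals in $\mathbb{R}^r$ and $\mathbb{R}^s$ respectively; then $\phi^{-1}[A]\subseteq\bigcup_{n\in\mathbb{N}}I_n\times J_n$, so that
$$\theta(\phi^{-1}[A])\le\sum_{n=0}^{\infty}\mu_rI_n\cdot\mu_sJ_n=\sum_{n=0}^{\infty}\mu_{r+s}(K_n)\le\mu_{r+s}^*(A)+\epsilon.$$
As $\epsilon$ is arbitrary, we have the result. Q Q

(c) If $E\subseteq\mathbb{R}^r$ and $F\subseteq\mathbb{R}^s$ are measurable, then $\mu_{r+s}^*(\phi[E\times F])\le\mu_rE\cdot\mu_sF$. P P (i) Consider first the
case $\mu_rE<\infty$, $\mu_sF<\infty$. In this case, given $\epsilon>0$, there are sequences $\langle I_n\rangle_{n\in\mathbb{N}}$, $\langle J_n\rangle_{n\in\mathbb{N}}$ of half-open
intervals such that $E\subseteq\bigcup_{n\in\mathbb{N}}I_n$, $F\subseteq\bigcup_{n\in\mathbb{N}}J_n$,
$$\sum_{n=0}^{\infty}\mu_rI_n\le\mu_r^*E+\epsilon=\mu_rE+\epsilon,$$
$$\sum_{n=0}^{\infty}\mu_sJ_n\le\mu_s^*F+\epsilon=\mu_sF+\epsilon.$$
Accordingly $E\times F\subseteq\bigcup_{m,n\in\mathbb{N}}I_m\times J_n$ and $\phi[E\times F]\subseteq\bigcup_{m,n\in\mathbb{N}}\phi[I_m\times J_n]$, so that
$$\mu_{r+s}^*(\phi[E\times F])\le\sum_{m,n=0}^{\infty}\mu_{r+s}(\phi[I_m\times J_n])=\sum_{m,n=0}^{\infty}\mu_rI_m\cdot\mu_sJ_n
=\sum_{m=0}^{\infty}\mu_rI_m\cdot\sum_{n=0}^{\infty}\mu_sJ_n\le(\mu_rE+\epsilon)(\mu_sF+\epsilon).$$
As $\epsilon$ is arbitrary, we have the result. (ii) Next, if $\mu_rE=0$, there is a sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of sets of finite
measure covering $\mathbb{R}^s\supseteq F$, so that
$$\mu_{r+s}^*(\phi[E\times F])\le\sum_{n=0}^{\infty}\mu_{r+s}^*(\phi[E\times F_n])\le\sum_{n=0}^{\infty}\mu_rE\cdot\mu_sF_n=0=\mu_rE\cdot\mu_sF.$$
(iii) Similarly, $\mu_{r+s}^*(\phi[E\times F])\le\mu_rE\cdot\mu_sF$ if $\mu_sF=0$. (iv) The only remaining case is that in which both of $\mu_rE$,
$\mu_sF$ are strictly positive and one is infinite; but in this case $\mu_rE\cdot\mu_sF=\infty$, so surely $\mu_{r+s}^*(\phi[E\times F])\le
\mu_rE\cdot\mu_sF$. Q Q

(d) If $A\subseteq\mathbb{R}^{r+s}$, then $\mu_{r+s}^*(A)\le\theta(\phi^{-1}[A])$. P P Given $\epsilon>0$, there are sequences $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle F_n\rangle_{n\in\mathbb{N}}$
of measurable sets in $\mathbb{R}^r$, $\mathbb{R}^s$ respectively such that $\phi^{-1}[A]\subseteq\bigcup_{n\in\mathbb{N}}E_n\times F_n$ and $\sum_{n=0}^{\infty}\mu_rE_n\cdot\mu_sF_n\le
\theta(\phi^{-1}[A])+\epsilon$. Now $A\subseteq\bigcup_{n\in\mathbb{N}}\phi[E_n\times F_n]$, so
$$\mu_{r+s}^*(A)\le\sum_{n=0}^{\infty}\mu_{r+s}^*(\phi[E_n\times F_n])\le\sum_{n=0}^{\infty}\mu_rE_n\cdot\mu_sF_n\le\theta(\phi^{-1}[A])+\epsilon.$$
As $\epsilon$ is arbitrary, we have the result. Q Q

(e) Putting (c) and (d) together, we have $\theta(\phi^{-1}[A])=\mu_{r+s}^*(A)$ for every $A\subseteq\mathbb{R}^{r+s}$. Thus $\theta$ on $\mathbb{R}^r\times\mathbb{R}^s$
corresponds exactly to $\mu_{r+s}^*$ on $\mathbb{R}^{r+s}$. So the associated measures $\lambda_0$, $\mu_{r+s}$ must correspond in the same
way, writing $\lambda_0$ for the primitive product measure. But 251K tells us that $\lambda_0=\lambda$, so we have the result.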

251N In fact, a large proportion of the applications of the constructions here are to subspaces of
Euclidean space, rather than to the whole product $\mathbb{R}^r\times\mathbb{R}^s$. It would not have been especially difficult
to write 251M out to deal with arbitrary subspaces, but I prefer to give a more general description of the
product of subspace measures, as I feel that it illuminates the method. I start with a straightforward result
on strictly localizable spaces.
Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be strictly localizable measure spaces. Then the c.l.d. product
measure on $X\times Y$ is strictly localizable; moreover, if $\langle X_i\rangle_{i\in I}$ and $\langle Y_j\rangle_{j\in J}$ are decompositions of $X$ and $Y$
respectively, $\langle X_i\times Y_j\rangle_{(i,j)\in I\times J}$ is a decomposition of $X\times Y$.
proof Let $\langle X_i\rangle_{i\in I}$ and $\langle Y_j\rangle_{j\in J}$ be decompositions of $X$, $Y$ respectively. Then $\langle X_i\times Y_j\rangle_{(i,j)\in I\times J}$ is a disjoint
cover of $X\times Y$ by measurable sets of finite measure. If $W\subseteq X\times Y$ and $\lambda W>0$, there are sets $E\in\Sigma$,
$F\in\mathrm{T}$ such that $\mu E<\infty$, $\nu F<\infty$ and $\lambda(W\cap(E\times F))>0$. We know that $\mu E=\sum_{i\in I}\mu(E\cap X_i)$ and
$\nu F=\sum_{j\in J}\nu(F\cap Y_j)$, so there must be finite sets $I_0\subseteq I$, $J_0\subseteq J$ such that
$$\mu E\cdot\nu F-\Bigl(\sum_{i\in I_0}\mu(E\cap X_i)\Bigr)\Bigl(\sum_{j\in J_0}\nu(F\cap Y_j)\Bigr)<\lambda(W\cap(E\times F)).$$
Setting $E'=\bigcup_{i\in I_0}X_i$, $F'=\bigcup_{j\in J_0}Y_j$ we have
$$\lambda((E\times F)\setminus(E'\times F'))=\lambda(E\times F)-\lambda((E\cap E')\times(F\cap F'))<\lambda(W\cap(E\times F)),$$
so that $\lambda(W\cap(E'\times F'))>0$. There must therefore be some $i\in I_0$, $j\in J_0$ such that $\lambda(W\cap(X_i\times Y_j))>0$.
This shows that $\{X_i\times Y_j:i\in I,\ j\in J\}$ satisfies the criterion of 213O, so that $\lambda$, being complete and
locally determined, must be strictly localizable. Because $\langle X_i\times Y_j\rangle_{(i,j)\in I\times J}$ covers $X\times Y$, it is actually a
decomposition of $X\times Y$ (213Ob).

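251O Lemma Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$.
Let $\lambda^*$ be the corresponding outer measure (132B). Then
$$\lambda^*C=\sup\{\theta(C\cap(E\times F)):E\in\Sigma,\ F\in\mathrm{T},\ \mu E<\infty,\ \nu F<\infty\}$$
for every $C\subseteq X\times Y$, where $\theta$ is the outer measure of 251A.
proof Write $\Lambda$ for the domain of $\lambda$, $\Sigma^f$ for $\{E:E\in\Sigma,\ \mu E<\infty\}$, $\mathrm{T}^f$ for $\{F:F\in\mathrm{T},\ \nu F<\infty\}$; set
$u=\sup\{\theta(C\cap(E\times F)):E\in\Sigma^f,\ F\in\mathrm{T}^f\}$.
(a) If $C\subseteq W\in\Lambda$, $E\in\Sigma^f$ and $F\in\mathrm{T}^f$, then
$$\theta(C\cap(E\times F))\le\theta(W\cap(E\times F))=\lambda_0(W\cap(E\times F))$$
(where $\lambda_0$ is the primitive product measure)
$$\le\lambda W.$$
As $E$ and $F$ are arbitrary, $u\le\lambda W$; as $W$ is arbitrary, $u\le\lambda^*C$.

(b) If $u=\infty$, then of course $\lambda^*C=u$. Otherwise, let $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle F_n\rangle_{n\in\mathbb{N}}$ be sequences in $\Sigma^f$, $\mathrm{T}^f$
respectively such that
$$u=\sup_{n\in\mathbb{N}}\theta(C\cap(E_n\times F_n)).$$
Consider $C'=C\setminus(\bigcup_{n\in\mathbb{N}}E_n\times\bigcup_{n\in\mathbb{N}}F_n)$. If $E\in\Sigma^f$ and $F\in\mathrm{T}^f$, then for every $n\in\mathbb{N}$ we have
$$u\ge\theta(C\cap((E\cup E_n)\times(F\cup F_n)))$$
$$=\theta(C\cap((E\cup E_n)\times(F\cup F_n))\cap(E_n\times F_n))+\theta(C\cap((E\cup E_n)\times(F\cup F_n))\setminus(E_n\times F_n))$$
(because $E_n\times F_n\in\Lambda$, by 251E)
$$\ge\theta(C\cap(E_n\times F_n))+\theta(C'\cap(E\times F)).$$
Taking the supremum of the right-hand expression as $n$ varies, we have $u\ge u+\theta(C'\cap(E\times F))$ so
$$\lambda^*(C'\cap(E\times F))=\theta(C'\cap(E\times F))=0.$$
As $E$ and $F$ are arbitrary, $\lambda^*C'=0$.
But this means that
$$\lambda^*C\le\lambda^*\Bigl(C\cap\bigl(\bigcup_{n\in\mathbb{N}}E_n\times\bigcup_{n\in\mathbb{N}}F_n\bigr)\Bigr)+\lambda^*C'
=\lim_{n\to\infty}\lambda^*\Bigl(C\cap\bigl(\bigcup_{i\le n}E_i\times\bigcup_{i\le n}F_i\bigr)\Bigr)$$
(using 132Ae)
$$\le u,$$
as required.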

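251P Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $A\subseteq X$, $B\subseteq Y$ subsets; write
$\mu_A$, $\nu_B$ for the subspace measures on $A$, $B$ respectively. Let $\lambda$ be the c.l.d. product measure on $X\times Y$, and
$\lambda^{\#}$ the subspace measure it induces on $A\times B$. Let $\tilde\lambda$ be the c.l.d. product measure of $\mu_A$ and $\nu_B$ on $A\times B$.
Then
(i) $\tilde\lambda$ extends $\lambda^{\#}$.
(ii) If
either ($\alpha$) $A\in\Sigma$ and $B\in\mathrm{T}$
or ($\beta$) $A$ and $B$ can both be covered by sequences of sets of finite measure
or ($\gamma$) $\mu$ and $\nu$ are both strictly localizable,
then $\tilde\lambda=\lambda^{\#}$.
proof Let $\theta$ be the outer measure on $X\times Y$ defined from $\mu$ and $\nu$ by the formula of 251A, and $\tilde\theta$ the outer
measure on $A\times B$ similarly defined from $\mu_A$ and $\nu_B$. Write $\Lambda$ for the domain of $\lambda$, $\tilde\Lambda$ for the domain of $\tilde\lambda$,
and $\Lambda^{\#}=\{W\cap(A\times B):W\in\Lambda\}$ for the domain of $\lambda^{\#}$. Set $\Sigma^f=\{E:\mu E<\infty\}$, $\mathrm{T}^f=\{F:\nu F<\infty\}$.
(a) The first point to observe is that $\tilde\theta C=\theta C$ for every $C\subseteq A\times B$. P P (i) If $\langle E_n\rangle_{n\in\mathbb{N}}$ and $\langle F_n\rangle_{n\in\mathbb{N}}$ are
sequences in $\Sigma$, $\mathrm{T}$ respectively such that $C\subseteq\bigcup_{n\in\mathbb{N}}E_n\times F_n$, then
$$C=C\cap(A\times B)\subseteq\bigcup_{n\in\mathbb{N}}(E_n\cap A)\times(F_n\cap B),$$
so
$$\tilde\theta C\le\sum_{n=0}^{\infty}\mu_A(E_n\cap A)\cdot\nu_B(F_n\cap B)
=\sum_{n=0}^{\infty}\mu^*(E_n\cap A)\cdot\nu^*(F_n\cap B)\le\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n.$$
As $\langle E_n\rangle_{n\in\mathbb{N}}$ and $\langle F_n\rangle_{n\in\mathbb{N}}$ are arbitrary, $\tilde\theta C\le\theta C$. (ii) If $\langle\tilde E_n\rangle_{n\in\mathbb{N}}$, $\langle\tilde F_n\rangle_{n\in\mathbb{N}}$ are sequences in $\Sigma_A=\operatorname{dom}\mu_A$,
$\mathrm{T}_B=\operatorname{dom}\nu_B$ respectively such that $C\subseteq\bigcup_{n\in\mathbb{N}}\tilde E_n\times\tilde F_n$, then for each $n\in\mathbb{N}$ we can choose $E_n\in\Sigma$, $F_n\in\mathrm{T}$
such that
$$\tilde E_n\subseteq E_n,\quad\mu E_n=\mu^*\tilde E_n=\mu_A\tilde E_n,$$
$$\tilde F_n\subseteq F_n,\quad\nu F_n=\nu^*\tilde F_n=\nu_B\tilde F_n,$$
and now
$$\theta C\le\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n=\sum_{n=0}^{\infty}\mu_A\tilde E_n\cdot\nu_B\tilde F_n.$$
As $\langle\tilde E_n\rangle_{n\in\mathbb{N}}$, $\langle\tilde F_n\rangle_{n\in\mathbb{N}}$ are arbitrary, $\theta C\le\tilde\theta C$. Q Q
(b) It follows that $\Lambda^{\#}\subseteq\tilde\Lambda$. P P Suppose that $V\in\Lambda^{\#}$ and that $C\subseteq A\times B$. In this case there is a $W\in\Lambda$
such that $V=W\cap(A\times B)$. So
$$\tilde\theta(C\cap V)+\tilde\theta(C\setminus V)=\theta(C\cap W)+\theta(C\setminus W)=\theta C=\tilde\theta C.$$
As $C$ is arbitrary, $V\in\tilde\Lambda$. Q Q
Accordingly, for $V\in\Lambda^{\#}$,
$$\lambda^{\#}V=\lambda^*V=\sup\{\theta(V\cap(E\times F)):E\in\Sigma^f,\ F\in\mathrm{T}^f\}$$
$$=\sup\{\theta(V\cap(\tilde E\times\tilde F)):\tilde E\in\Sigma_A,\ \tilde F\in\mathrm{T}_B,\ \mu_A\tilde E<\infty,\ \nu_B\tilde F<\infty\}$$
$$=\sup\{\tilde\theta(V\cap(\tilde E\times\tilde F)):\tilde E\in\Sigma_A,\ \tilde F\in\mathrm{T}_B,\ \mu_A\tilde E<\infty,\ \nu_B\tilde F<\infty\}=\tilde\lambda V,$$
using 251O twice.
This proves part (i) of the proposition.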
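(c) The next thing to observe is that if $V\in\tilde\Lambda$ and $V\subseteq E\times F$ where $E\in\Sigma^f$ and $F\in\mathrm{T}^f$, then $V\in\Lambda^{\#}$.
P P Let $W\subseteq E\times F$ be a measurable envelope of $V$ with respect to $\lambda$ (132Ee). Then
$$\theta(W\cap(A\times B)\setminus V)=\tilde\theta(W\cap(A\times B)\setminus V)=\tilde\lambda(W\cap(A\times B)\setminus V)$$
(because $W\cap(A\times B)\in\Lambda^{\#}\subseteq\tilde\Lambda$, $V\in\tilde\Lambda$)
$$=\tilde\lambda(W\cap(A\times B))-\tilde\lambda V=\tilde\theta(W\cap(A\times B))-\tilde\theta V$$
(because $V\subseteq(E\cap A)\times(F\cap B)$, and $\mu_A(E\cap A)$ and $\nu_B(F\cap B)$ are both finite, so $\tilde\lambda V=\tilde\theta V=\theta V$)
$$=\theta(W\cap(A\times B))-\theta V\le\theta W-\theta V=0.$$
But this means that $W'=W\cap(A\times B)\setminus V\in\Lambda$ and $V=(A\times B)\cap(W\setminus W')$ belongs to $\Lambda^{\#}$. Q Q
(d) Now fix any $V\in\tilde\Lambda$, and look at the conditions ($\alpha$)-($\gamma$) of part (ii) of the proposition.
($\alpha$) If $A\in\Sigma$ and $B\in\mathrm{T}$, and $C\subseteq X\times Y$, then $A\times B\in\Lambda$ (251E), so
$$\theta(C\cap V)+\theta(C\setminus V)=\theta(C\cap V)+\theta((C\setminus V)\cap(A\times B))+\theta((C\setminus V)\setminus(A\times B))$$
$$=\tilde\theta(C\cap V)+\tilde\theta(C\cap(A\times B)\setminus V)+\theta(C\setminus(A\times B))$$
$$=\tilde\theta(C\cap(A\times B))+\theta(C\setminus(A\times B))$$
$$=\theta(C\cap(A\times B))+\theta(C\setminus(A\times B))=\theta C.$$
As $C$ is arbitrary, $V\in\Lambda$, so $V=V\cap(A\times B)\in\Lambda^{\#}$.
($\beta$) If $A\subseteq\bigcup_{n\in\mathbb{N}}E_n$ and $B\subseteq\bigcup_{n\in\mathbb{N}}F_n$ where all the $E_n$, $F_n$ are of finite measure, then $V=\bigcup_{m,n\in\mathbb{N}}V\cap
(E_m\times F_n)\in\Lambda^{\#}$, by (c).
($\gamma$) If $\langle X_i\rangle_{i\in I}$, $\langle Y_j\rangle_{j\in J}$ are decompositions of $X$, $Y$ respectively, then for each $i\in I$, $j\in J$ we have
$V\cap(X_i\times Y_j)\in\Lambda^{\#}$, that is, there is a $W_{ij}\in\Lambda$ such that $V\cap(X_i\times Y_j)=W_{ij}\cap(A\times B)$. Now $\langle X_i\times Y_j\rangle_{(i,j)\in I\times J}$
is a decomposition of $X\times Y$ for $\lambda$ (251N), so that
$$W=\bigcup_{i\in I,\,j\in J}W_{ij}\cap(X_i\times Y_j)\in\Lambda,$$
and $V=W\cap(A\times B)\in\Lambda^{\#}$.
(e) Thus any of the three conditions is sufficient to ensure that $\tilde\Lambda=\Lambda^{\#}$, in which case (i) tells us that
$\tilde\lambda=\lambda^{\#}$.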

251Q Corollary Let $r,s\ge1$ be integers, and $\phi:\mathbb{R}^r\times\mathbb{R}^s\to\mathbb{R}^{r+s}$ the natural bijection. If $A\subseteq\mathbb{R}^r$ and
$B\subseteq\mathbb{R}^s$, then the restriction of $\phi$ to $A\times B$ identifies the product of Lebesgue measure on $A$ and Lebesgue
measure on $B$ with Lebesgue measure on $\phi[A\times B]\subseteq\mathbb{R}^{r+s}$.
Remark Note that by 'Lebesgue measure on $A$' I mean the subspace measure $\mu_{rA}$ on $A$ induced by $r$-dimensional
Lebesgue measure $\mu_r$ on $\mathbb{R}^r$, whether or not $A$ is itself a measurable set.
proof By 251P, using either of the conditions (ii-$\beta$) or (ii-$\gamma$), the product measure $\tilde\lambda$ on $A\times B$ is just the
subspace measure $\lambda^{\#}$ on $A\times B$ induced by the product measure $\lambda$ on $\mathbb{R}^r\times\mathbb{R}^s$. But by 251M we know that
$\phi$ is an isomorphism between $(\mathbb{R}^r\times\mathbb{R}^s,\lambda)$ and $(\mathbb{R}^{r+s},\mu_{r+s})$; so it must also identify $\lambda^{\#}$ with the subspace
measure on $\phi[A\times B]$.

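251R Corollary Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on
$X\times Y$. If $A\subseteq X$ and $B\subseteq Y$ can be covered by sequences of sets of finite measure, then $\lambda^*(A\times B)=\mu^*A\cdot\nu^*B$.
proof In the language of 251P,
$$\lambda^*(A\times B)=\lambda^{\#}(A\times B)=\tilde\lambda(A\times B)=\mu_AA\cdot\nu_BB$$
(by 251K and 251E)
$$=\mu^*A\cdot\nu^*B.$$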

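251S The next proposition gives an idea of how the technical definitions here fit together.
Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces. Write $(X,\hat\Sigma,\hat\mu)$ and $(X,\tilde\Sigma,\tilde\mu)$ for the completion
and c.l.d. version of $(X,\Sigma,\mu)$ (212C, 213E). Let $\lambda$, $\hat\lambda$ and $\tilde\lambda$ be the three c.l.d. product measures on $X\times Y$
obtained from the pairs $(\mu,\nu)$, $(\hat\mu,\nu)$ and $(\tilde\mu,\nu)$ of factor measures. Then $\lambda=\hat\lambda=\tilde\lambda$.
proof Write $\Lambda$, $\hat\Lambda$ and $\tilde\Lambda$ for the domains of $\lambda$, $\hat\lambda$, $\tilde\lambda$ respectively; and $\theta$, $\hat\theta$, $\tilde\theta$ for the outer measures on
$X\times Y$ obtained by the formula of 251A from the three pairs of factor measures.
(a) If $E\in\Sigma$ and $\mu E<\infty$, then $\theta$, $\hat\theta$ and $\tilde\theta$ agree on subsets of $E\times Y$. P P Take $A\subseteq E\times Y$ and $\epsilon>0$.
(i) There are sequences $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$, $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that $A\subseteq\bigcup_{n\in\mathbb{N}}E_n\times F_n$ and $\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n\le
\theta A+\epsilon$. Now $\tilde\mu E_n\le\mu E_n$ for every $n$ (213F), so
$$\tilde\theta A\le\sum_{n=0}^{\infty}\tilde\mu E_n\cdot\nu F_n\le\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n\le\theta A+\epsilon.$$
(ii) There are sequences $\langle\hat E_n\rangle_{n\in\mathbb{N}}$ in $\hat\Sigma$, $\langle\hat F_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that $A\subseteq\bigcup_{n\in\mathbb{N}}\hat E_n\times\hat F_n$ and $\sum_{n=0}^{\infty}\hat\mu\hat E_n\cdot\nu\hat F_n\le
\hat\theta A+\epsilon$. Now for each $n$ there is an $E'_n\in\Sigma$ such that $\hat E_n\subseteq E'_n$ and $\mu E'_n=\hat\mu\hat E_n$, so that
$$\theta A\le\sum_{n=0}^{\infty}\mu E'_n\cdot\nu\hat F_n=\sum_{n=0}^{\infty}\hat\mu\hat E_n\cdot\nu\hat F_n\le\hat\theta A+\epsilon.$$
(iii) There are sequences $\langle\tilde E_n\rangle_{n\in\mathbb{N}}$ in $\tilde\Sigma$, $\langle\tilde F_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that $A\subseteq\bigcup_{n\in\mathbb{N}}\tilde E_n\times\tilde F_n$ and $\sum_{n=0}^{\infty}\tilde\mu\tilde E_n\cdot
\nu\tilde F_n\le\tilde\theta A+\epsilon$. Now for each $n$, $\tilde E_n\cap E\in\hat\Sigma$, so
$$\hat\theta A\le\sum_{n=0}^{\infty}\hat\mu(\tilde E_n\cap E)\cdot\nu\tilde F_n\le\sum_{n=0}^{\infty}\tilde\mu\tilde E_n\cdot\nu\tilde F_n\le\tilde\theta A+\epsilon.$$
(iv) Since $A$ and $\epsilon$ are arbitrary, $\theta=\hat\theta=\tilde\theta$ on $\mathcal{P}(E\times Y)$. Q Q
(b) Consequently, the outer measures $\lambda^*$, $\hat\lambda^*$ and $\tilde\lambda^*$ are identical. P P Use 251O. Take $A\subseteq X\times Y$, $E\in\Sigma$,
$\hat E\in\hat\Sigma$, $\tilde E\in\tilde\Sigma$, $F\in\mathrm{T}$ such that $\mu E$, $\hat\mu\hat E$, $\tilde\mu\tilde E$ and $\nu F$ are all finite. Then
(i)
$$\theta(A\cap(E\times F))=\hat\theta(A\cap(E\times F))\le\hat\lambda^*A,\qquad\theta(A\cap(E\times F))=\tilde\theta(A\cap(E\times F))\le\tilde\lambda^*A$$
because $\hat\mu E$ and $\tilde\mu E$ are both finite.
(ii) There is an $E'\in\Sigma$ such that $\hat E\subseteq E'$ and $\mu E'<\infty$, so that
$$\hat\theta(A\cap(\hat E\times F))\le\hat\theta(A\cap(E'\times F))=\theta(A\cap(E'\times F))\le\lambda^*A.$$
(iii) There is an $E''\in\Sigma$ such that $E''\subseteq\tilde E$ and $\tilde\mu(\tilde E\setminus E'')=0$ (213Fc), so that $\tilde\theta((\tilde E\setminus E'')\times Y)=0$
and $\mu E''<\infty$; accordingly
$$\tilde\theta(A\cap(\tilde E\times F))=\tilde\theta(A\cap(E''\times F))=\theta(A\cap(E''\times F))\le\lambda^*A.$$
(iv) Taking the supremum over $E$, $\hat E$, $\tilde E$ and $F$, we get
$$\lambda^*A\le\hat\lambda^*A,\quad\lambda^*A\le\tilde\lambda^*A,\quad\hat\lambda^*A\le\lambda^*A,\quad\tilde\lambda^*A\le\lambda^*A.$$
As $A$ is arbitrary, $\lambda^*=\hat\lambda^*=\tilde\lambda^*$. Q Q
(c) Now $\lambda$, $\hat\lambda$ and $\tilde\lambda$ are all complete and locally determined, so by 213C are the measures defined by
Carathéodory's method from their own outer measures, and are therefore identical.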

251T It is 'obvious', and an easy consequence of theorems so far proved, that the set $\{(x,x):x\in\mathbb{R}\}$
is negligible for Lebesgue measure on $\mathbb{R}^2$. The corresponding result is true in the square of any atomless
measure space.
Proposition Let $(X,\Sigma,\mu)$ be an atomless measure space, and let $\lambda$ be the c.l.d. product measure on $X\times X$. Then
$\Delta=\{(x,x):x\in X\}$ is $\lambda$-negligible.
proof Let $E,F\in\Sigma$ be sets of finite measure, and $n\in\mathbb{N}$. Applying 215D repeatedly, we can find a disjoint
family $\langle F_i\rangle_{i<n}$ of measurable subsets of $F$ such that $\mu F_i=\dfrac{\mu F}{n+1}$ for each $i$; setting $F_n=F\setminus\bigcup_{i<n}F_i$, we
also have $\mu F_n=\dfrac{\mu F}{n+1}$. Now
$$\Delta\cap(E\times F)\subseteq\bigcup_{i\le n}(E\cap F_i)\times F_i,$$
so
$$\lambda^*(\Delta\cap(E\times F))\le\sum_{i=0}^{n}\mu(E\cap F_i)\cdot\mu F_i=\frac{\mu F}{n+1}\sum_{i=0}^{n}\mu(E\cap F_i)\le\frac{1}{n+1}\,\mu E\cdot\mu F.$$
As $n$ is arbitrary, $\lambda(\Delta\cap(E\times F))=0$; as $E$ and $F$ are arbitrary, $\lambda\Delta=0$.
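
For orientation, here is the same covering argument carried out in the simplest concrete case. The choice of $X=[0,1[$ with Lebesgue measure and of equal subintervals is an illustrative assumption, used only to make the estimate explicit.

```latex
% Illustrative assumption: X = [0,1[ with Lebesgue measure \mu, E = F = X.
% Split F into n+1 half-open intervals of equal length:
\[
  F_i = \Bigl[\tfrac{i}{n+1}, \tfrac{i+1}{n+1}\Bigr[ , \qquad
  \mu F_i = \tfrac{1}{n+1} \quad (0 \le i \le n).
\]
% A point (x,x) of the diagonal with x \in F_i lies in F_i \times F_i, so
\[
  \Delta \;\subseteq\; \bigcup_{i=0}^{n} F_i \times F_i ,
  \qquad
  \lambda^{*}\Delta \;\le\; \sum_{i=0}^{n} \mu F_i \cdot \mu F_i
  \;=\; (n+1)\cdot\frac{1}{(n+1)^{2}} \;=\; \frac{1}{n+1}.
\]
% Letting n \to \infty gives \lambda^{*}\Delta = 0.
```

The atomless hypothesis is what allows the analogous subdivision of $F$ into pieces of equal measure in a general space.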

*251W Products of more than two spaces The whole of this section can be repeated for arbitrary
finite products. The labour is substantial but no new ideas are required. By the time we need the general
construction in any formal way, it should come very naturally, and I do not think it is necessary to work
through the next couple of pages before proceeding, especially as products of probability spaces are dealt
with in §254. However, for completeness, and to help locate results when applications do appear, I list them
here. They do of course constitute a very instructive set of exercises. The most important fragments are
repeated in 251Xd-251Xe.
Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a finite family of measure spaces, and set $X=\prod_{i\in I}X_i$. Write $\Sigma_i^f=\{E:E\in
\Sigma_i,\ \mu_iE<\infty\}$ for each $i\in I$.

(a) For $A\subseteq X$ set
$$\theta(A)=\inf\Bigl\{\sum_{n=0}^{\infty}\prod_{i\in I}\mu_iE_{ni}:E_{ni}\in\Sigma_i\ \forall\,i\in I,\ n\in\mathbb{N},\ A\subseteq\bigcup_{n\in\mathbb{N}}\prod_{i\in I}E_{ni}\Bigr\}.$$
Then $\theta$ is an outer measure on $X$. Let $\lambda_0$ be the measure on $X$ derived by Carathéodory's method from $\theta$,
and $\Lambda$ its domain.
(b) If $\langle X_i\rangle_{i\in I}$ is a finite family of sets and $\Sigma_i$ is a $\sigma$-algebra of subsets of $X_i$ for each $i\in I$, then $\widehat{\bigotimes}_{i\in I}\Sigma_i$
is the $\sigma$-algebra of subsets of $X=\prod_{i\in I}X_i$ generated by $\{\prod_{i\in I}E_i:E_i\in\Sigma_i$ for every $i\in I\}$. (For the
corresponding construction when $I$ is infinite, see 254E.)
(c) $\lambda_0(\prod_{i\in I}E_i)$ is defined and equal to $\prod_{i\in I}\mu_iE_i$ whenever $E_i\in\Sigma_i$ for each $i\in I$.

(d) The c.l.d. product measure on $X$ is the measure $\lambda$ defined by setting
$$\lambda W=\sup\Bigl\{\lambda_0\bigl(W\cap\prod_{i\in I}E_i\bigr):E_i\in\Sigma_i^f\text{ for each }i\in I\Bigr\}$$
for $W\in\Lambda$.
(e) If $H\subseteq X$, then $H\in\Lambda$ iff $H\cap\prod_{i\in I}E_i\in\Lambda$ whenever $E_i\in\Sigma_i^f$ for each $i\in I$.
(f) (i) $\widehat{\bigotimes}_{i\in I}\Sigma_i\subseteq\Lambda$ and $\lambda(\prod_{i\in I}E_i)=\prod_{i\in I}\mu_iE_i$ whenever $E_i\in\Sigma_i^f$ for each $i$.
(ii) For every $W\in\Lambda$ there is a $V\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $V\subseteq W$ and $\lambda V=\lambda W$.
(iii) $\lambda$ is complete and locally determined, and is the c.l.d. version of $\lambda_0$.
(iv) If $W\in\Lambda$ and $\lambda W>0$ then there are $E_i\in\Sigma_i^f$, for $i\in I$, such that $\lambda(W\cap\prod_{i\in I}E_i)>0$.
(v) If $W\in\Lambda$ and $\lambda W<\infty$, then for every $\epsilon>0$ there are $n\in\mathbb{N}$ and $E_{0i},\ldots,E_{ni}\in\Sigma_i^f$, for each $i\in I$,
such that $\lambda(W\triangle\bigcup_{k\le n}\prod_{i\in I}E_{ki})\le\epsilon$.

(g) If each $\mu_i$ is $\sigma$-finite, so is $\lambda$, and $\lambda=\lambda_0$ is the completion of its restriction to $\widehat{\bigotimes}_{i\in I}\Sigma_i$.
(h) If $\langle I_j\rangle_{j\in J}$ is any partition of $I$, then $\lambda$ can be identified with the c.l.d. product of $\langle\lambda_j\rangle_{j\in J}$, where $\lambda_j$
is the c.l.d. product of $\langle\mu_i\rangle_{i\in I_j}$. (See the arguments in 251M and also in 254N below.)
(i) If $I=\{1,\ldots,n\}$ and each $\mu_i$ is Lebesgue measure on $\mathbb{R}$, then $\lambda$ can be identified with Lebesgue
measure on $\mathbb{R}^n$.
(j) If, for each $i\in I$, we have a decomposition $\langle X_{ij}\rangle_{j\in J_i}$ of $X_i$, then $\langle\prod_{i\in I}X_{i,f(i)}\rangle_{f\in\prod_{i\in I}J_i}$ is a decomposition
of $X$.
(k) For any $C\subseteq X$,
$$\lambda^*C=\sup\Bigl\{\theta\bigl(C\cap\prod_{i\in I}E_i\bigr):E_i\in\Sigma_i^f\text{ for every }i\in I\Bigr\}.$$
(l) Suppose that $A_i\subseteq X_i$ for each $i\in I$. Write $\lambda^{\#}$ for the subspace measure on $A=\prod_{i\in I}A_i$, and $\tilde\lambda$ for
the c.l.d. product of the subspace measures on the $A_i$. Then $\tilde\lambda$ extends $\lambda^{\#}$, and if
either $A_i\in\Sigma_i$ for every $i$
or every $A_i$ can be covered by a sequence of sets of finite measure
or every $\mu_i$ is strictly localizable,
then $\tilde\lambda=\lambda^{\#}$.

(m) If $A_i\subseteq X_i$ can be covered by a sequence of sets of finite measure for each $i\in I$, then $\lambda^*(\prod_{i\in I}A_i)=
\prod_{i\in I}\mu_i^*A_i$.

(n) Writing $\hat\mu_i$, $\tilde\mu_i$ for the completion and c.l.d. version of each $\mu_i$, $\lambda$ is the c.l.d. product of $\langle\hat\mu_i\rangle_{i\in I}$ and
also of $\langle\tilde\mu_i\rangle_{i\in I}$.
(o) If all the $(X_i,\Sigma_i,\mu_i)$ are the same atomless measure space, then $\{x:x\in X,\ i\mapsto x(i)$ is injective$\}$ is
$\lambda$-conegligible.

251X Basic exercises (a) Let (X, , ) and (Y, T, ) be measure spaces; let 0 be the primitive product
measure on X Y , and the c.l.d. product measure. Show that 0 W < iff W < and W is included
in a set of the form
S
(E Y ) (X F ) nN En Fn
where E = F = 0 and En < , Fn < for every n.
> (b) Show that if X and Y are any sets, with their respective counting measures, then the primitive and
c.l.d. product measures on X Y are both counting measure on X Y .
(c) Let (X, , ) and (Y, T, ) be measure spaces; let 0 be the primitive product measure on X Y , and
the c.l.d. product measure. Show that

0 is locally determined
0 is semi-finite
0 =
0 , have the same negligible sets.

> (d) (See


Q 251W.) Let h(Xi , i , i )iiI be a family of measure spaces, where I is a non-empty finite set.
Set X = iI Xi . For A X, set
P Q S Q
(A) = inf{ n=0 iI i Eni : Eni i n N, i I, A nN iI Eni }.
Show that is an outer measure on X. Let 0 be the measure defined from by Caratheodorys method,
and for W dom 0 set
Q
W = sup{0 (W iI Ei ) : Ei i , i Ei < for every i I}.
Show that is a measure on X, and is the c.l.d. version of 0 .
251Xr Finite products 213

> (e) (See 251W.) Let I be a non-empty finite set and h(Xi , i , i )iiI a family of measure spaces. For
Q (K)
non-empty K I set X (K) = iK Xi and let 0 , (K) be the measures on X (K) constructed as in
251Xd. Show that if K is a non-empty proper subset of I, then the natural bijection between X (I) and
(I) (K) (I\K)
X (K) X (I\K) identifies 0 with the primitive product measure of 0 and 0 , and (I) with the
(K) (I\K)
c.l.d. product measure of and .

> (f ) Using 251Xd-251Xe above, or otherwise, show that if (X1 , 1 , 1 ), (X2 , 2 , 2 ), (X3 , 3 , 3 ) are
measure spaces then the primitive and c.l.d. product measures 0 , of (X1 X2 ) X3 , constructed by first
taking the appropriate product measure on X1 X2 and then taking the product of this with the measure
of X3 , are identified with the corresponding product measures on X1 (X2 X3 ) by the canonical bijection
between the sets (X1 X2 ) X3 and X1 (X2 X3 ).

(g) (i) What happens in 251Xd when I is a singleton? (ii) Devise an appropriate convention to make
251Xd-251Xe remain valid when one or more of the sets I, K, I \ K there is empty.

> (h) Let $(X, \Sigma, \mu)$ be a complete locally determined measure space, and $I$ any non-empty set; let $\nu$ be counting measure on $I$. Show that the c.l.d. product measure on $X \times I$ is equal to (or at any rate identifiable with) the direct sum measure of the family $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$, if we set $(X_i, \Sigma_i, \mu_i) = (X, \Sigma, \mu)$ for every $i$.

> (i) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of measure spaces, with direct sum $(X, \Sigma, \mu)$ (214K). Let $(Y, \mathrm{T}, \nu)$ be any measure space, and give $X \times Y$, $X_i \times Y$ their c.l.d. product measures. Show that the natural bijection between $X \times Y$ and $Z = \bigcup_{i \in I} ((X_i \times Y) \times \{i\})$ is an isomorphism between the measure of $X \times Y$ and the direct sum measure on $Z$.

> (j) Let $(X, \Sigma, \mu)$ be any measure space, and $Y$ a singleton set $\{y\}$; let $\nu$ be the measure on $Y$ such that $\nu Y = 1$. Show that the natural bijection between $X \times \{y\}$ and $X$ identifies the primitive product measure on $X \times \{y\}$ with $\check\mu$ as defined in 213Xa, and the c.l.d. product measure with the c.l.d. version of $\mu$. Explain how to put this together with 251Xf and 251Ic to prove 251S.

> (k) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. Show that $\lambda$ is the c.l.d. version of its restriction to $\Sigma \widehat{\otimes} \mathrm{T}$.

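(l) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, with primitive and c.l.d. product measures $\lambda_0$, $\lambda$. Let $\lambda_1$ be any measure with domain $\Sigma \widehat{\otimes} \mathrm{T}$ such that $\lambda_1(E \times F) = \mu E \cdot \nu F$ for every $E \in \Sigma$, $F \in \mathrm{T}$. Show that $\lambda W \le \lambda_1 W \le \lambda_0 W$ for every $W \in \Sigma \widehat{\otimes} \mathrm{T}$.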

(m) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be two measure spaces, and $\lambda_0$ the primitive product measure on $X \times Y$. Show that the corresponding outer measure $\lambda_0^*$ is just the outer measure $\theta$ of 251A.

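(n) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $A \subseteq X$, $B \subseteq Y$ subsets; write $\mu_A$, $\nu_B$ for the subspace measures. Let $\lambda_0$ be the primitive product measure on $X \times Y$, and $\lambda_0^{\#}$ the subspace measure it induces on $A \times B$. Let $\tilde\lambda_0$ be the primitive product measure of $\mu_A$ and $\nu_B$ on $A \times B$. Show that $\tilde\lambda_0$ extends $\lambda_0^{\#}$. Show that if either ($\alpha$) $A \in \Sigma$ and $B \in \mathrm{T}$ or ($\beta$) $A$ and $B$ can both be covered by sequences of sets of finite measure or ($\gamma$) $\mu$ and $\nu$ are both strictly localizable, then $\tilde\lambda_0 = \lambda_0^{\#}$.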

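(o) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be any measure spaces, and $\lambda_0$ the primitive product measure on $X \times Y$. Show that $\lambda_0^*(A \times B) = \mu^* A \cdot \nu^* B$ for any $A \subseteq X$, $B \subseteq Y$.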

(p) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\hat\mu$ the completion of $\mu$. Show that $\mu$, $\nu$ and $\hat\mu$, $\nu$ have the same primitive product measures.

(q) Let $(X, \Sigma, \mu)$ be an atomless measure space, and $(Y, \mathrm{T}, \nu)$ any measure space. Show that the c.l.d. product measure on $X \times Y$ is atomless.

(r) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. (i) Show that if $\mu$ and $\nu$ are purely atomic, so is $\lambda$. (ii) Show that if $\mu$ and $\nu$ are point-supported, so is $\lambda$.

(s) Let $(X, \Sigma, \mu)$ be a semi-finite measure space. Show that $\mu$ is atomless iff the diagonal $\{(x, x) : x \in X\}$ is negligible for the c.l.d. product measure on $X \times X$.

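251Y Further exercises (a) Let $X$, $Y$ be sets with $\sigma$-algebras of subsets $\Sigma$, $\mathrm{T}$. Suppose that $h : X \times Y \to \mathbb{R}$ is $\Sigma \widehat{\otimes} \mathrm{T}$-measurable and $\phi : X \to Y$ is $(\Sigma, \mathrm{T})$-measurable (121Yb). Show that $x \mapsto h(x, \phi(x)) : X \to \mathbb{R}$ is $\Sigma$-measurable.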

(b) Let $(X, \Sigma, \mu)$ be a complete locally determined measure space with a subspace $A$ whose measure $\mu_A$ is not locally determined (see 216Xb). Set $Y = \{0\}$, $\nu Y = 1$ and consider the c.l.d. product measures on $X \times Y$ and $A \times Y$; write $\Lambda$, $\Lambda_A$ for their domains. Show that $\Lambda_A$ properly includes $\{W \cap (A \times Y) : W \in \Lambda\}$.

(c) Let $(X, \Sigma, \mu)$ be any measure space, $(Y, \mathrm{T}, \nu)$ an atomless measure space, and $f : X \to Y$ a $(\Sigma, \mathrm{T})$-measurable function. Show that $\{(x, f(x)) : x \in X\}$ is negligible for the c.l.d. product measure on $X \times Y$.

251 Notes and comments There are real difficulties in deciding which construction to declare as the product of two arbitrary measures. My phrase 'primitive product measure', and notation $\lambda_0$, betray a bias; my own preference is for the c.l.d. product $\lambda$, for two principal reasons. The first is that $\lambda_0$ is likely to be 'bad', in particular, not semi-finite, even if $\mu$ and $\nu$ are 'good' (251Xc, 252Yf), while $\lambda$ inherits some of the most important properties of $\mu$ and $\nu$ (see 251N); the second is that in the case of topological measure spaces $X$ and $Y$, there is often a canonical topological measure on $X \times Y$, which is likely to be more closely related to $\lambda$ than to $\lambda_0$. But for elucidation of this point I must ask you to wait until §417 in Volume 4.
It would be possible to remove the primitive product measure entirely from the exposition, or at least
to relegate it to the exercises. This is indeed what I expect to do in the rest of this treatise, since (in my
view) all significant features of product measures on finitely many factors can be expressed in terms of the
c.l.d. product measure. For the first introduction to product measures, however, a direct approach to the
c.l.d. product measure (through the description of $\lambda$ in 251O, for instance) is an uncomfortably large bite,
and I have some sort of duty to present the most natural rival to the c.l.d. product measure prominently
enough for you to judge for yourself whether I am right to dismiss it. There certainly are results associated
with the primitive product measure (251Xm, 251Xo, 252Yc) which have an agreeable simplicity.
The clash is avoided altogether, of course, if we specialize immediately to $\sigma$-finite spaces, in which the two constructions coincide (251K). But even this does not solve all problems. There is a popular alternative measure often called 'the' product measure: the restriction $\lambda_{0B}$ of $\lambda_0$ to the $\sigma$-algebra $\Sigma \widehat{\otimes} \mathrm{T}$. (See, for instance, Halmos 50.) The advantage of this is that if a function $f$ on $X \times Y$ is $\Sigma \widehat{\otimes} \mathrm{T}$-measurable, then $x \mapsto f(x, y)$ is $\Sigma$-measurable for every $y \in Y$. (This is because
\[ \{W : W \subseteq X \times Y,\ \{x : (x, y) \in W\} \in \Sigma \ \forall\ y \in Y\} \]
is a $\sigma$-algebra of subsets of $X \times Y$ containing $E \times F$ for every $E \in \Sigma$, $F \in \mathrm{T}$, and therefore including $\Sigma \widehat{\otimes} \mathrm{T}$.) The primary objection, to my mind, is that Lebesgue measure on $\mathbb{R}^2$ is no longer 'the' product of Lebesgue measure on $\mathbb{R}$ with itself. Generally, it is right to seek measures which measure as many sets as possible, and I prefer to face up to the technical problems (which I acknowledge are off-putting) by seeking appropriate definitions on the approach to major theorems, rather than rely on ad hoc fixes when the time comes to apply them.
I omit further examples of product measures for the moment, because the investigation of particular
examples will be much easier with the aid of results from the next section. Of course the leading example,
and the one which should come always to mind in response to the words 'product measure', is Lebesgue measure on $\mathbb{R}^2$, the case $r = s = 1$ of 251M and 251Q. For an indication of what can happen when one of the factors is not $\sigma$-finite, you could look ahead to 252K.
I hope that you will see that the definition of the outer measure $\theta$ in 251A corresponds to the standard definition of Lebesgue outer measure, with 'measurable rectangles' $E \times F$ taking the place of intervals, and the functional $E \times F \mapsto \mu E \cdot \nu F$ taking the place of 'length' or 'volume' of an interval; moreover, thinking of $E$ and $F$ as intervals, there is an obvious relation between Lebesgue measure on $\mathbb{R}^2$ and the product measure on $\mathbb{R} \times \mathbb{R}$. Of course an 'obvious relationship' is not the same thing as a proper theorem with exact hypotheses and conclusions, but Theorem 251M is clearly central. Long before that, however, there is

another parallel between the construction of 251A and that of Lebesgue measure. In both cases, the proof
that we have an outer measure comes directly from the defining formula (in 113Yd I gave as an exercise
a general result covering 251B), and consequently a very general construction can lead us to a measure.
But the measure would be of far less interest and value if it did not measure, and measure correctly, the
basic sets, in this case the measurable rectangles. Thus 251E corresponds to the theorem that intervals are
Lebesgue measurable, with the right measure (114Db, 114F). This is the real key to the construction, and
is one of the fundamental ideas of measure theory.
Yet another parallel is in 251Xm; the outer measure $\theta$ defining the primitive product measure $\lambda_0$ is exactly equal to the outer measure defined from $\lambda_0$. I described the corresponding phenomenon for Lebesgue measure in 132C.
Any construction which claims the title 'canonical' must satisfy a variety of natural requirements; for instance, one expects the canonical bijection between $X \times Y$ and $Y \times X$ to be an isomorphism between the corresponding product measure spaces. 'Commutativity' of the product in this sense is I think obvious from the definitions in 251A-251C. It is obviously desirable (not, I think, obviously true) that the product should be 'associative' in that the canonical bijection between $(X \times Y) \times Z$ and $X \times (Y \times Z)$ should also be an isomorphism between the corresponding products of product measures. This is in fact valid for both the primitive and c.l.d. product measures (251W, 251Xd-251Xf).
Working through the classification of measure spaces presented in §211, we find that the primitive product measure $\lambda_0$ of arbitrary factor measures $\mu$, $\nu$ is complete, while the c.l.d. product measure $\lambda$ is always complete and locally determined. $\lambda_0$ may not be semi-finite, even if $\mu$ and $\nu$ are strictly localizable (252Yf); but $\lambda$ will be strictly localizable if $\mu$ and $\nu$ are (251N). Of course this is associated with the fact that the c.l.d. product measure is distributive over direct sums (251Xi). If either $\mu$ or $\nu$ is atomless, so is $\lambda$ (251Xq). Both $\lambda$ and $\lambda_0$ are $\sigma$-finite if $\mu$ and $\nu$ are (251K). It is possible for both $\mu$ and $\nu$ to be localizable but $\lambda$ not (254U).
At least if you have worked through Chapter 21, you have now done enough pure measure theory for this
kind of investigation, however straightforward, to raise a good many questions. Apart from direct sums, we
also have the constructions of completion, subspace, outer measure and (in particular) c.l.d. version to
integrate into the new ideas; I offer some results in 251S and 251Xj. Concerning subspaces, some possibly
surprising difficulties arise. The problem is that the product measure on the product of two subspaces can
have a larger domain than one might expect. I give a simple example in 251Yb and a more elaborate one in
254Ye. For strictly localizable spaces, there is no problem (251P); but no other criterion drawn from the list of properties considered in §251 seems adequate to remove the possibility of a disconcerting phenomenon.

252 Fubini's theorem

Perhaps the most important feature of the concept of 'product measure' is the fact that we can use it to discuss repeated integrals. In this section I give versions of Fubini's theorem and Tonelli's theorem (252B, 252G) with a variety of corollaries, the most useful ones being versions for $\sigma$-finite spaces (252C, 252H). As applications I describe the relationship between integration and measuring ordinate sets (252N) and calculate the $r$-dimensional volume of a ball in $\mathbb{R}^r$ (252Q, 252Xh). I mention counter-examples showing the difficulties which can arise with non-$\sigma$-finite measures and non-integrable functions (252K-252L, 252Xf-252Xg).

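252A Repeated integrals Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $f$ a real-valued function defined on a set $\operatorname{dom} f \subseteq X \times Y$. We can seek to form the repeated integral
\[ \iint f(x, y)\,\nu(dy)\,\mu(dx) = \int \Bigl( \int f(x, y)\,\nu(dy) \Bigr) \mu(dx), \]
which should be interpreted as follows: set
\[ D = \Bigl\{ x : x \in X,\ \int f(x, y)\,\nu(dy) \text{ is defined in } [-\infty, \infty] \Bigr\}, \]
\[ g(x) = \int f(x, y)\,\nu(dy) \text{ for } x \in D, \]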
and then write $\iint f(x, y)\,\nu(dy)\,\mu(dx) = \int g(x)\,\mu(dx)$ if this is defined. Of course the subset of $Y$ on which $y \mapsto f(x, y)$ is defined may vary with $x$, but it must always be conegligible, as must $D$.
Similarly, exchanging the roles of $X$ and $Y$, we can seek a repeated integral
\[ \iint f(x, y)\,\mu(dx)\,\nu(dy) = \int \Bigl( \int f(x, y)\,\mu(dx) \Bigr) \nu(dy). \]
The point is that, under appropriate conditions on $\mu$ and $\nu$, we can relate these repeated integrals to each other by connecting them both with the integral of $f$ itself with respect to the product measure on $X \times Y$.
As will become apparent shortly, it is essential here to allow oneself to discuss the integral of a function which is not everywhere defined. It is of less importance whether one allows integrands and integrals to take infinite values, but for definiteness let me say that I shall be following the rules of 135F; that is, $\int f = \int f^+ - \int f^-$ provided that $f$ is defined almost everywhere, takes values in $[-\infty, \infty]$ and is virtually measurable, and at most one of $\int f^+$, $\int f^-$ is infinite.
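As an illustrative aside, the interpretation of the repeated integral described in 252A can be sketched in Python in the simplest possible setting: finite sets carrying point masses. The names `repeated_integral`, `mu`, `nu` and the dictionary representation are assumptions made for this sketch only; a point of weight zero stands in for a negligible set, so $f$ is allowed to be undefined there.

```python
from fractions import Fraction


def repeated_integral(f, mu, nu):
    """Integrate f first over y against nu, then over x against mu.

    mu, nu: dicts mapping points to non-negative weights (point masses).
    f: dict mapping (x, y) to a value; it may be undefined at some pairs.
    Returns None when the inner integral is undefined on a set of x
    which is not mu-negligible, i.e. when D is not conegligible.
    """
    total = Fraction(0)
    for x, weight_x in mu.items():
        if weight_x == 0:
            continue  # x is negligible, so g(x) need not be defined
        g_x = Fraction(0)
        for y, weight_y in nu.items():
            if weight_y == 0:
                continue  # f(x, .) need only be defined nu-almost everywhere
            if (x, y) not in f:
                return None  # x lies outside D and has positive measure
            g_x += f[(x, y)] * weight_y
        total += g_x * weight_x
    return total


mu = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(0)}
nu = {"a": Fraction(1), "b": Fraction(2)}
# f is deliberately left undefined on the negligible vertical line x = 2.
f = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 3, (1, "b"): 4}
value = repeated_integral(f, mu, nu)
```

Here `g(0) = 5` and `g(1) = 11`, so `value` is `8`; removing a value of `f` at a point of positive weight makes the function return `None`, mirroring the requirement that $D$ be conegligible.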

252B Theorem Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, with c.l.d. product $(X \times Y, \Lambda, \lambda)$ (251F). Suppose that $\nu$ is $\sigma$-finite and that $\mu$ is either strictly localizable or complete and locally determined. Let $f$ be a $[-\infty, \infty]$-valued function such that $\int f\,d\lambda$ is defined in $[-\infty, \infty]$. Then $\iint f(x, y)\,\nu(dy)\,\mu(dx)$ is defined and is equal to $\int f\,d\lambda$.
proof The proof of this result involves substantial technical difficulties. If you have not seen these ideas before, you should almost certainly not go straight to the full generality of the version announced above. I will therefore start by writing out a proof in the case in which both $\mu$ and $\nu$ are totally finite; this is already lengthy enough. I will present it in such a way that only the central section (part (b) below) needs to be amended in the general case, and then, after completing the proof of the special case, I will give the alternative version of (b) which is required for the full result.
(a) Write $L$ for the family of $[0, \infty]$-valued functions $f$ such that $\int f\,d\lambda$ and $\iint f(x, y)\,\nu(dy)\,\mu(dx)$ are defined and equal. My aim is to show first that $f \in L$ whenever $f$ is non-negative and $\int f\,d\lambda$ is defined, and then to look at differences of functions in $L$. To prove that enough functions belong to $L$, my strategy will be to start with 'elementary' functions and work outwards through progressively larger classes. It is most efficient to begin by describing ways of building new members of $L$ from old, as follows.
(i) $f_1 + f_2 \in L$ for all $f_1$, $f_2 \in L$, and $cf \in L$ for all $f \in L$, $c \in [0, \infty[$; this is because
\[ \int (f_1 + f_2)(x, y)\,\nu(dy) = \int f_1(x, y)\,\nu(dy) + \int f_2(x, y)\,\nu(dy), \]
\[ \int (cf)(x, y)\,\nu(dy) = c \int f(x, y)\,\nu(dy) \]
whenever the right-hand sides are defined, which we are supposing to be the case for almost every $x$, so that
\[ \begin{aligned} \iint (f_1 + f_2)(x, y)\,\nu(dy)\,\mu(dx) &= \iint f_1(x, y)\,\nu(dy)\,\mu(dx) + \iint f_2(x, y)\,\nu(dy)\,\mu(dx) \\ &= \int f_1\,d\lambda + \int f_2\,d\lambda = \int (f_1 + f_2)\,d\lambda, \end{aligned} \]
\[ \iint (cf)(x, y)\,\nu(dy)\,\mu(dx) = c \iint f(x, y)\,\nu(dy)\,\mu(dx) = c \int f\,d\lambda = \int (cf)\,d\lambda. \]

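(ii) If $\langle f_n \rangle_{n \in \mathbb{N}}$ is a sequence in $L$ such that $f_n(x, y) \le f_{n+1}(x, y)$ whenever $n \in \mathbb{N}$ and $(x, y) \in \operatorname{dom} f_n \cap \operatorname{dom} f_{n+1}$, then $\sup_{n \in \mathbb{N}} f_n \in L$. P Set $f = \sup_{n \in \mathbb{N}} f_n$; for $x \in X$, $n \in \mathbb{N}$ set $g_n(x) = \int f_n(x, y)\,\nu(dy)$ when the integral is defined in $[0, \infty]$. Since here I am allowing $\infty$ as a value of a function, it is natural to regard $f$ as defined on $\bigcap_{n \in \mathbb{N}} \operatorname{dom} f_n$. By B.Levi's theorem, $\int f\,d\lambda = \sup_{n \in \mathbb{N}} \int f_n\,d\lambda$; write $u$ for this common value in $[0, \infty]$. Next, because $f_n \le f_{n+1}$ wherever both are defined, $g_n \le g_{n+1}$ wherever both are defined, for each $n$; we are supposing that $f_n \in L$, so $g_n$ is defined $\mu$-almost everywhere for each $n$, and
\[ \sup_{n \in \mathbb{N}} \int g_n\,d\mu = \sup_{n \in \mathbb{N}} \int f_n\,d\lambda = u. \]
By B.Levi's theorem again, $\int g\,d\mu = u$, where $g = \sup_{n \in \mathbb{N}} g_n$. Now take any $x \in \bigcap_{n \in \mathbb{N}} \operatorname{dom} g_n$, and consider the functions $f_{xn}$ on $Y$, setting $f_{xn}(y) = f_n(x, y)$ whenever this is defined. Each $f_{xn}$ has an integral in $[0, \infty]$, and $f_{xn}(y) \le f_{x,n+1}(y)$ whenever both are defined, and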
\[ \sup_{n \in \mathbb{N}} \int f_{xn}\,d\nu = g(x); \]
so, using B.Levi's theorem for a third time, $\int (\sup_{n \in \mathbb{N}} f_{xn})\,d\nu$ is defined and equal to $g(x)$, that is,
\[ \int f(x, y)\,\nu(dy) = g(x). \]
This is true for almost every $x$, so
\[ \iint f(x, y)\,\nu(dy)\,\mu(dx) = \int g\,d\mu = u = \int f\,d\lambda. \]
Thus $f \in L$, as claimed. Q
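(iii) The expression of the ideas in the next section of the proof will go more smoothly if I introduce another term. Write $\mathcal{W}$ for $\{W : W \subseteq X \times Y,\ \chi W \in L\}$. Then
($\alpha$) if $W$, $W' \in \mathcal{W}$ and $W \cap W' = \emptyset$, $W \cup W' \in \mathcal{W}$
by (i), because $\chi(W \cup W') = \chi W + \chi W'$,
($\beta$) $\bigcup_{n \in \mathbb{N}} W_n \in \mathcal{W}$ whenever $\langle W_n \rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence in $\mathcal{W}$ and $\sup_{n \in \mathbb{N}} \lambda W_n$ is finite
because $\langle \chi W_n \rangle_{n \in \mathbb{N}} \uparrow \chi(\bigcup_{n \in \mathbb{N}} W_n)$, and we can use (ii).
It is also helpful to note that, for any $W \subseteq X \times Y$ and any $x \in X$, $\int \chi W(x, y)\,\nu(dy) = \nu W[\{x\}]$, at least whenever $W[\{x\}] = \{y : (x, y) \in W\}$ is measured by $\nu$. Moreover, because $\lambda$ is complete, a set $W \subseteq X \times Y$ belongs to $\Lambda$ iff $\chi W$ is $\lambda$-virtually measurable iff $\int \chi W\,d\lambda$ is defined in $[0, \infty]$, and in this case $\lambda W = \int \chi W\,d\lambda$.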
(iv) Finally, we need to observe that, in appropriate circumstances, the difference of two members of $\mathcal{W}$ will belong to $\mathcal{W}$: if $W$, $W' \in \mathcal{W}$ and $W \subseteq W'$ and $\lambda W' < \infty$, then $W' \setminus W \in \mathcal{W}$. P We are supposing that $g(x) = \int \chi W(x, y)\,\nu(dy)$ and $g'(x) = \int \chi W'(x, y)\,\nu(dy)$ are defined for almost every $x$, and that $\int g\,d\mu = \lambda W$, $\int g'\,d\mu = \lambda W'$. Because $\lambda W'$ is finite, $g'$ must be finite almost everywhere, and $D = \{x : x \in \operatorname{dom} g \cap \operatorname{dom} g',\ g'(x) < \infty\}$ is conegligible. Now, for any $x \in D$, both $g(x)$ and $g'(x)$ are finite, so
\[ y \mapsto \chi(W' \setminus W)(x, y) = \chi W'(x, y) - \chi W(x, y) \]
is the difference of two integrable functions, and
\[ \begin{aligned} \int \chi(W' \setminus W)(x, y)\,\nu(dy) &= \int \bigl( \chi W'(x, y) - \chi W(x, y) \bigr)\,\nu(dy) \\ &= \int \chi W'(x, y)\,\nu(dy) - \int \chi W(x, y)\,\nu(dy) = g'(x) - g(x). \end{aligned} \]
Accordingly
\[ \iint \chi(W' \setminus W)(x, y)\,\nu(dy)\,\mu(dx) = \int \bigl( g'(x) - g(x) \bigr)\,\mu(dx) = \lambda W' - \lambda W = \lambda(W' \setminus W), \]
and $W' \setminus W$ belongs to $\mathcal{W}$. Q
(Of course the argument just above can be shortened by a few words if we allow ourselves to assume that $\mu$ and $\nu$ are totally finite, since then $g(x)$ and $g'(x)$ will be finite whenever they are defined; but the key idea, that the difference of integrable functions is integrable, is unchanged.)
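(b) Now let us examine the class $\mathcal{W}$, assuming that $\mu$ and $\nu$ are totally finite.
(i) $E \times F \in \mathcal{W}$ for all $E \in \Sigma$, $F \in \mathrm{T}$. P $\lambda(E \times F) = \mu E \cdot \nu F$ (251J), and
\[ \int \chi(E \times F)(x, y)\,\nu(dy) = \nu F\,\chi E(x) \]
for each $x$, so
\[ \iint \chi(E \times F)(x, y)\,\nu(dy)\,\mu(dx) = \int \bigl( \nu F\,\chi E(x) \bigr)\,\mu(dx) = \mu E \cdot \nu F = \lambda(E \times F) = \int \chi(E \times F)\,d\lambda. \text{ Q} \]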

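(ii) Let $\mathcal{E}$ be $\{E \times F : E \in \Sigma,\ F \in \mathrm{T}\}$. Then $\mathcal{E}$ is closed under finite intersections (because $(E \times F) \cap (E' \times F') = (E \cap E') \times (F \cap F')$) and is included in $\mathcal{W}$. In particular, $X \times Y \in \mathcal{W}$. But this, together with (a-iv) and (a-iii-$\beta$) above, means that $\mathcal{W}$ is a Dynkin class (definition: 136A), so includes the $\sigma$-algebra of subsets of $X \times Y$ generated by $\mathcal{E}$, by the Monotone Class Theorem (136B); that is, $\mathcal{W} \supseteq \Sigma \widehat{\otimes} \mathrm{T}$ (definition: 251D).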
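(iii) Next, $W \in \mathcal{W}$ whenever $W \subseteq X \times Y$ is $\lambda$-negligible. P By 251Ib, there is a $V \in \Sigma \widehat{\otimes} \mathrm{T}$ such that $V \subseteq (X \times Y) \setminus W$ and $\lambda V = \lambda((X \times Y) \setminus W)$. Because $\lambda(X \times Y) = \mu X \cdot \nu Y$ is finite, $V' = (X \times Y) \setminus V$ is $\lambda$-negligible, and we have $W \subseteq V' \in \Sigma \widehat{\otimes} \mathrm{T}$. Consequently
\[ 0 = \lambda V' = \iint \chi V'(x, y)\,\nu(dy)\,\mu(dx). \]
But this means that
\[ D = \Bigl\{ x : \int \chi V'(x, y)\,\nu(dy) \text{ is defined and equal to } 0 \Bigr\} \]
is conegligible. If $x \in D$, then we must have $\chi V'(x, y) = 0$ for $\nu$-almost every $y$, that is, $V'[\{x\}]$ is negligible; in which case $W[\{x\}] \subseteq V'[\{x\}]$ is also negligible, and $\int \chi W(x, y)\,\nu(dy) = 0$. And this is true for every $x \in D$, so $\int \chi W(x, y)\,\nu(dy)$ is defined and equal to $0$ for almost every $x$, and
\[ \iint \chi W(x, y)\,\nu(dy)\,\mu(dx) = 0 = \lambda W, \]
as required. Q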
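(iv) It follows that $\Lambda \subseteq \mathcal{W}$. P If $W \in \Lambda$, then, by 251Ib again, there is a $V \in \Sigma \widehat{\otimes} \mathrm{T}$ such that $V \subseteq W$ and $\lambda V = \lambda W$, so that $\lambda(W \setminus V) = 0$. Now $V \in \mathcal{W}$ by (ii) and $W \setminus V \in \mathcal{W}$ by (iii), so $W \in \mathcal{W}$ by (a-iii-$\alpha$). Q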
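(c) I return to the class $L$.
(i) If $f \in L$ and $g$ is a $[0, \infty]$-valued function defined and equal to $f$ $\lambda$-a.e., then $g \in L$. P Set
\[ W = (X \times Y) \setminus \{(x, y) : (x, y) \in \operatorname{dom} f \cap \operatorname{dom} g,\ f(x, y) = g(x, y)\}, \]
so that $\lambda W = 0$. (Remember that $\lambda$ is complete.) By (b), $\iint \chi W(x, y)\,\nu(dy)\,\mu(dx) = 0$, that is, $W[\{x\}]$ is $\nu$-negligible for $\mu$-almost every $x$. Let $D$ be $\{x : x \in X,\ W[\{x\}] \text{ is } \nu\text{-negligible}\}$. Then $D$ is $\mu$-conegligible. If $x \in D$, then
\[ W[\{x\}] = Y \setminus \{y : (x, y) \in \operatorname{dom} f \cap \operatorname{dom} g,\ f(x, y) = g(x, y)\} \]
is negligible, so that $\int f(x, y)\,\nu(dy) = \int g(x, y)\,\nu(dy)$ if either is defined. Thus the functions
\[ x \mapsto \int f(x, y)\,\nu(dy), \quad x \mapsto \int g(x, y)\,\nu(dy) \]
are equal almost everywhere, and
\[ \iint g(x, y)\,\nu(dy)\,\mu(dx) = \iint f(x, y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda = \int g\,d\lambda, \]
so that $g \in L$. Q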
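(ii) Now let $f$ be any non-negative function such that $\int f\,d\lambda$ is defined in $[0, \infty]$. Then $f \in L$. P For $k$, $n \in \mathbb{N}$ set
\[ W_{nk} = \{(x, y) : (x, y) \in \operatorname{dom} f,\ f(x, y) \ge 2^{-n} k\}. \]
Because $\lambda$ is complete and $f$ is $\lambda$-virtually measurable and $\operatorname{dom} f$ is conegligible, every $W_{nk}$ belongs to $\Lambda$, so $\chi W_{nk} \in L$, by (b). Set $f_n = \sum_{k=1}^{4^n} 2^{-n} \chi W_{nk}$, so that
\[ f_n(x, y) = \begin{cases} 2^{-n} k & \text{if } k \le 4^n \text{ and } 2^{-n} k \le f(x, y) < 2^{-n}(k + 1), \\ 2^n & \text{if } f(x, y) \ge 2^n, \\ 0 & \text{if } (x, y) \in (X \times Y) \setminus \operatorname{dom} f. \end{cases} \]
By (a-i), $f_n \in L$ for every $n \in \mathbb{N}$, while $\langle f_n \rangle_{n \in \mathbb{N}}$ is non-decreasing, so $f' = \sup_{n \in \mathbb{N}} f_n \in L$, by (a-ii). But $f =_{\text{a.e.}} f'$, so $f \in L$, by (i) just above. Q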
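The dyadic approximations $f_n$ used in (c-ii) can be checked numerically at a single point. The following Python sketch is illustrative only; the function names `dyadic_approx` and `dyadic_approx_by_sum` are assumptions of the sketch, and floating-point values stand in for values of $f$ in $[0, \infty]$. It evaluates $f_n$ both directly from the defining sum and from the case formula, so the two can be compared.

```python
import math


def dyadic_approx_by_sum(value, n):
    """f_n at a point where f takes `value`, computed from the definition:
    the sum over k = 1..4**n of 2**-n times the indicator of value >= 2**-n * k."""
    step = 2.0 ** -n
    return sum(step for k in range(1, 4 ** n + 1) if value >= step * k)


def dyadic_approx(value, n):
    """The same quantity from the case formula: 2**n when value >= 2**n,
    otherwise value rounded down to a multiple of 2**-n."""
    if value >= 2 ** n:
        return float(2 ** n)
    return math.floor(value * 2 ** n) / 2 ** n


# Successive approximations to the value 2.75 (for n = 0, 1, 2, 3).
approximations = [dyadic_approx(2.75, n) for n in range(4)]
```

For the value 2.75 the list `approximations` is `[1.0, 2.0, 2.75, 2.75]`: the sequence is non-decreasing and reaches the value once $2^n$ exceeds it and the grid $2^{-n}\mathbb{Z}$ contains it, which is the monotone convergence that (a-ii) exploits.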
(iii) Finally, let $f$ be any $[-\infty, \infty]$-valued function such that $\int f\,d\lambda$ is defined in $[-\infty, \infty]$. Then $\int f^+ d\lambda$, $\int f^- d\lambda$ are both defined and at most one is infinite. By (ii), both $f^+$ and $f^-$ belong to $L$. Set
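$g(x) = \int f^+(x, y)\,\nu(dy)$, $h(x) = \int f^-(x, y)\,\nu(dy)$ whenever these are defined; then $\int g\,d\mu = \int f^+ d\lambda$ and $\int h\,d\mu = \int f^- d\lambda$ are both defined in $[0, \infty]$.
Suppose first that $\int f^- d\lambda$ is finite. Then $\int h\,d\mu$ is finite, so $h$ must be finite $\mu$-almost everywhere; set
\[ D = \{x : x \in \operatorname{dom} g \cap \operatorname{dom} h,\ h(x) < \infty\}. \]
For any $x \in D$, $\int f^+(x, y)\,\nu(dy)$ and $\int f^-(x, y)\,\nu(dy)$ are defined in $[0, \infty]$, and the latter is finite; so
\[ \int f(x, y)\,\nu(dy) = \int f^+(x, y)\,\nu(dy) - \int f^-(x, y)\,\nu(dy) = g(x) - h(x) \]
is defined in $]-\infty, \infty]$. Because $D$ is conegligible,
\[ \begin{aligned} \iint f(x, y)\,\nu(dy)\,\mu(dx) &= \int \bigl( g(x) - h(x) \bigr)\,\mu(dx) = \int g\,d\mu - \int h\,d\mu \\ &= \int f^+ d\lambda - \int f^- d\lambda = \int f\,d\lambda, \end{aligned} \]
as required.
Thus we have the result when $\int f^- d\lambda$ is finite. Similarly, or by applying the argument above to $-f$, $\iint f(x, y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda$ if $\int f^+ d\lambda$ is finite.
Thus the theorem is proved, at least when $\mu$ and $\nu$ are totally finite.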
(b*) The only point in the argument above where we needed to know anything special about the measures $\mu$ and $\nu$ was in part (b), when showing that $\Lambda \subseteq \mathcal{W}$. I now return to this point under the hypotheses of the theorem as stated, that $\nu$ is $\sigma$-finite and $\mu$ is either strictly localizable or complete and locally determined.
(i) It will be helpful to note that the completion $\hat\mu$ of $\mu$ (212C) is identical with its c.l.d. version $\tilde\mu$ (213E). P If $\mu$ is strictly localizable, then $\hat\mu = \tilde\mu$ by 213Ha. If $\mu$ is complete and locally determined, then $\hat\mu = \mu = \tilde\mu$ (212D, 213Hf). Q
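(ii) Write $\Sigma^f = \{G : G \in \Sigma,\ \mu G < \infty\}$, $\mathrm{T}^f = \{H : H \in \mathrm{T},\ \nu H < \infty\}$. For $G \in \Sigma^f$, $H \in \mathrm{T}^f$ let $\mu_G$, $\nu_H$ and $\lambda_{G \times H}$ be the subspace measures on $G$, $H$ and $G \times H$ respectively; then $\lambda_{G \times H}$ is the c.l.d. product measure of $\mu_G$ and $\nu_H$ (251P(ii-$\alpha$)). Now $W \cap (G \times H) \in \mathcal{W}$ for every $W \in \Lambda$. P $W \cap (G \times H)$ belongs to the domain of $\lambda_{G \times H}$, so by (b) of this proof, applied to the totally finite measures $\mu_G$ and $\nu_H$,
\[ \begin{aligned} \lambda(W \cap (G \times H)) &= \lambda_{G \times H}(W \cap (G \times H)) \\ &= \int_G \int_H \chi(W \cap (G \times H))(x, y)\,\nu_H(dy)\,\mu_G(dx) \\ &= \int_G \int_Y \chi(W \cap (G \times H))(x, y)\,\nu(dy)\,\mu_G(dx) \end{aligned} \]
(because $\chi(W \cap (G \times H))(x, y) = 0$ if $y \in Y \setminus H$, so we can use 131E)
\[ = \int_X \int_Y \chi(W \cap (G \times H))(x, y)\,\nu(dy)\,\mu(dx) \]
by 131E again, because $\int_Y \chi(W \cap (G \times H))(x, y)\,\nu(dy) = 0$ if $x \in X \setminus G$. So $W \cap (G \times H) \in \mathcal{W}$. Q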
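(iii) In fact, $W \in \mathcal{W}$ for every $W \in \Lambda$. P Remember that we are supposing that $\nu$ is $\sigma$-finite. Let $\langle Y_n \rangle_{n \in \mathbb{N}}$ be a non-decreasing sequence in $\mathrm{T}^f$ covering $Y$, and for each $n \in \mathbb{N}$ set $W_n = W \cap (X \times Y_n)$, $g_n(x) = \int \chi W_n(x, y)\,\nu(dy)$ whenever this is defined. For any $G \in \Sigma^f$,
\[ \int_G g_n\,d\mu = \iint \chi(W \cap (G \times Y_n))(x, y)\,\nu(dy)\,\mu(dx) \]
is defined and equal to $\lambda(W \cap (G \times Y_n))$, by (ii). But this means, first, that $G \setminus \operatorname{dom} g_n$ is negligible, that is, that $\hat\mu(G \setminus \operatorname{dom} g_n) = 0$. Since this is so whenever $\mu G$ is finite, $\tilde\mu(X \setminus \operatorname{dom} g_n) = 0$, and $g_n$ is defined $\tilde\mu$-a.e.; but $\tilde\mu = \hat\mu$, so $g_n$ is defined $\hat\mu$-a.e., that is, $\mu$-a.e. (212Eb). Next, if we set $E_{na} = \{x : x \in \operatorname{dom} g_n,\ g_n(x) \ge a\}$ for $a \in \mathbb{R}$, then $E_{na} \cap G \in \hat\Sigma$ whenever $G \in \Sigma^f$, where $\hat\Sigma$ is the domain of $\hat\mu$; by the definition in 213D, $E_{na}$ is measured by $\tilde\mu = \hat\mu$. As $a$ is arbitrary, $g_n$ is $\mu$-virtually measurable (212Fa).
We can therefore speak of $\int g_n\,d\mu$. Now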

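\[ \iint \chi W_n(x, y)\,\nu(dy)\,\mu(dx) = \int g_n\,d\mu = \sup_{G \in \Sigma^f} \int_G g_n \]
(213B, because $\mu$ is semi-finite)
\[ = \sup_{G \in \Sigma^f} \lambda(W \cap (G \times Y_n)) = \lambda(W \cap (X \times Y_n)) \]
by the definition in 251F. Thus $W \cap (X \times Y_n) \in \mathcal{W}$.
This is true for every $n \in \mathbb{N}$. Because $\langle Y_n \rangle_{n \in \mathbb{N}} \uparrow Y$, $W \in \mathcal{W}$, by (a-iii-$\beta$). Q
(iv) We can therefore return to part (c) of the argument above and conclude as before.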

252C The theorem above is of course asymmetric, in that different hypotheses are imposed on the two factor measures $\mu$ and $\nu$. If we want a 'symmetric' theorem we have to suppose that they are both $\sigma$-finite, as follows.
Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be two $\sigma$-finite measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. If $f$ is $\lambda$-integrable, then $\iint f(x, y)\,\nu(dy)\,\mu(dx)$ and $\iint f(x, y)\,\mu(dx)\,\nu(dy)$ are defined, finite and equal.
proof Since $\mu$ and $\nu$ are surely strictly localizable (211Lc), we can apply 252B from either side to conclude that
\[ \iint f(x, y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda = \iint f(x, y)\,\mu(dx)\,\nu(dy). \]

252D So many applications of Fubini's theorem are to characteristic functions that I take a few lines to spell out the form which 252B takes in this case, as in parts (b)-(b*) of the proof there.
Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces and $\lambda$ the c.l.d. product measure on $X \times Y$. Suppose that $\nu$ is $\sigma$-finite and that $\mu$ is either strictly localizable or complete and locally determined.
(i) If $W \in \operatorname{dom} \lambda$, then $\int \nu^* W[\{x\}]\,\mu(dx)$ is defined in $[0, \infty]$ and equal to $\lambda W$.
(ii) If $\nu$ is complete, we can write $\int \nu W[\{x\}]\,\mu(dx)$ in place of $\int \nu^* W[\{x\}]\,\mu(dx)$.
proof The point is just that $\int \chi W(x, y)\,\nu(dy) = \hat\nu W[\{x\}]$ whenever either is defined, where $\hat\nu$ is the completion of $\nu$ (212Fb). Now 252B tells us that
\[ \lambda W = \iint \chi W(x, y)\,\nu(dy)\,\mu(dx) = \int \hat\nu W[\{x\}]\,\mu(dx). \]
We always have $\hat\nu W[\{x\}] = \nu^* W[\{x\}]$, by the definition of $\hat\nu$ (212C); and if $\nu$ is complete, then $\hat\nu = \nu$ so $\lambda W = \int \nu W[\{x\}]\,\mu(dx)$.
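The formula of 252D, measuring a set by integrating the measures of its vertical sections, can be tried out on finite sets with point masses, where every set is measurable and the product measure gives the point $(x, y)$ the weight $\mu\{x\} \cdot \nu\{y\}$. The Python sketch below is illustrative only; the names `vertical_section`, `measure` and `measure_by_sections` are assumptions of the sketch.

```python
from fractions import Fraction


def vertical_section(W, x):
    """W[{x}] = {y : (x, y) in W}."""
    return {y for (a, y) in W if a == x}


def measure(weights, subset):
    """Measure of a finite set under the point masses given by `weights`."""
    return sum((weights[p] for p in subset), Fraction(0))


def measure_by_sections(W, mu, nu):
    """Integrate x -> nu(W[{x}]) against mu, as in 252D(ii)."""
    return sum((mu[x] * measure(nu, vertical_section(W, x)) for x in mu),
               Fraction(0))


mu = {0: Fraction(1, 2), 1: Fraction(3, 2)}
nu = {"a": Fraction(2), "b": Fraction(1, 3)}

# A measurable rectangle E x F with E = {1}, F = {"a", "b"}.
rectangle = {(1, "a"), (1, "b")}
# A set which is not a rectangle.
W = {(0, "a"), (1, "b")}

rectangle_measure = measure_by_sections(rectangle, mu, nu)
W_measure = measure_by_sections(W, mu, nu)
```

For the rectangle the result is $\mu E \cdot \nu F = \tfrac{3}{2} \cdot \tfrac{7}{3} = \tfrac{7}{2}$, and for `W` it is $\tfrac{1}{2} \cdot 2 + \tfrac{3}{2} \cdot \tfrac{1}{3} = \tfrac{3}{2}$, the sum of the weights of its two points.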

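252E Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, with c.l.d. product $(X \times Y, \Lambda, \lambda)$. Suppose that $\nu$ is $\sigma$-finite and that $\mu$ is either strictly localizable or complete and locally determined. Then if $f$ is a $\Lambda$-measurable real-valued function defined on a subset of $X \times Y$, $y \mapsto f(x, y)$ is $\nu$-virtually measurable for $\mu$-almost every $x \in X$.
proof Let $\tilde f$ be a $\Lambda$-measurable extension of $f$ to a real-valued function defined everywhere on $X \times Y$ (121I), and set $\tilde f_x(y) = \tilde f(x, y)$ for all $x \in X$, $y \in Y$,
\[ D = \{x : x \in X,\ \tilde f_x \text{ is } \nu\text{-virtually measurable}\}. \]
If $G \in \Sigma$ and $\mu G < \infty$, then $G \setminus D$ is negligible. P Let $\langle Y_n \rangle_{n \in \mathbb{N}}$ be a non-decreasing sequence of sets of finite measure covering $Y$, and set
\[ \tilde f_n(x, y) = \begin{cases} \tilde f(x, y) & \text{if } x \in G,\ y \in Y_n \text{ and } |\tilde f(x, y)| \le n, \\ 0 & \text{for other } (x, y) \in X \times Y. \end{cases} \]
Then each $\tilde f_n$ is $\lambda$-integrable, being bounded and $\Lambda$-measurable and zero off $G \times Y_n$. Consequently, setting $\tilde f_{nx}(y) = \tilde f_n(x, y)$,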
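\[ \int \Bigl( \int \tilde f_{nx}\,d\nu \Bigr) \mu(dx) \text{ exists} = \int \tilde f_n\,d\lambda. \]
But this surely means that $\tilde f_{nx}$ is $\nu$-integrable, therefore $\nu$-virtually measurable, for almost every $x \in X$. Set
\[ D_n = \{x : x \in X,\ \tilde f_{nx} \text{ is } \nu\text{-virtually measurable}\}; \]
then every $D_n$ is $\mu$-conegligible, so $\bigcap_{n \in \mathbb{N}} D_n$ is conegligible. But for any $x \in G \cap \bigcap_{n \in \mathbb{N}} D_n$, $\tilde f_x = \lim_{n \to \infty} \tilde f_{nx}$ is $\nu$-virtually measurable. Thus $G \setminus D \subseteq X \setminus \bigcap_{n \in \mathbb{N}} D_n$ is negligible. Q
This is true whenever $\mu G < \infty$. By 213J, because $\mu$ is either strictly localizable or complete and locally determined, $X \setminus D$ is negligible and $D$ is conegligible. But, for any $x \in D$, $y \mapsto f(x, y)$ is a restriction of $\tilde f_x$ and must be $\nu$-virtually measurable.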

252F As a further corollary we can get some useful information about the c.l.d. product measure for arbitrary measure spaces.
Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be two measure spaces, $\lambda$ the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Let $W \in \Lambda$ be such that the vertical section $W[\{x\}]$ is $\nu$-negligible for $\mu$-almost every $x \in X$. Then $\lambda W = 0$.
proof Take $E \in \Sigma$, $F \in \mathrm{T}$ of finite measure. Let $\lambda_{E \times F}$ be the subspace measure on $E \times F$. By 251P(ii-$\alpha$), this is just the product of the subspace measures $\mu_E$ and $\nu_F$. We know that $W \cap (E \times F)$ is measured by $\lambda_{E \times F}$. At the same time, the vertical section $(W \cap (E \times F))[\{x\}] = W[\{x\}] \cap F$ is $\nu_F$-negligible for $\mu_E$-almost every $x \in E$. Applying 252B to $\mu_E$ and $\nu_F$ and $\chi(W \cap (E \times F))$,
\[ \lambda(W \cap (E \times F)) = \lambda_{E \times F}(W \cap (E \times F)) = \int \nu_F(W[\{x\}] \cap F)\,\mu_E(dx) = 0. \]
But looking at the definition in 251F, we see that this means that $\lambda W = 0$, as claimed.

252G Theorem 252B and its corollaries depend on the factor measures $\mu$ and $\nu$ belonging to restricted classes. There is a partial result which applies to all c.l.d. product measures, as follows.
Tonelli's theorem Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $(X \times Y, \Lambda, \lambda)$ their c.l.d. product. Let $f$ be a $\Lambda$-measurable $[-\infty, \infty]$-valued function defined on a member of $\Lambda$, and suppose that either $\iint |f(x, y)|\,\mu(dx)\,\nu(dy)$ or $\iint |f(x, y)|\,\nu(dy)\,\mu(dx)$ exists in $\mathbb{R}$. Then $f$ is $\lambda$-integrable.
proof Because the construction of the product measure is symmetric in the two factors, it is enough to consider the case in which $\iint |f(x, y)|\,\nu(dy)\,\mu(dx)$ is defined and finite, as the same ideas will surely deal with the other case also.
(a) The first step is to check that $f$ is defined and finite $\lambda$-a.e. P Set $W = \{(x, y) : (x, y) \in \operatorname{dom} f,\ f(x, y) \text{ is finite}\}$. Then $W \in \Lambda$. The hypothesis
    $\iint |f(x, y)|\,\nu(dy)\,\mu(dx)$ is defined and finite
includes the assertion
    $\int |f(x, y)|\,\nu(dy)$ is defined and finite for $\mu$-almost every $x$,
which implies that
    for $\mu$-almost every $x$, $f(x, y)$ is defined and finite for $\nu$-almost every $y$;
that is, that
    for $\mu$-almost every $x$, $W[\{x\}]$ is $\nu$-conegligible.
But by 252F this implies that $(X \times Y) \setminus W$ is $\lambda$-negligible, as required. Q
(b) Let $h$ be any non-negative $\lambda$-simple function such that $h \le |f|$ $\lambda$-a.e. Then $\int h\,d\lambda$ cannot be greater than $\iint |f(x, y)|\,\nu(dy)\,\mu(dx)$. P Set
\[ W = \{(x, y) : (x, y) \in \operatorname{dom} f,\ h(x, y) \le |f(x, y)|\}, \quad h' = h \times \chi W; \]
then $h'$ is a simple function and $h' =_{\text{a.e.}} h$. Express $h'$ as $\sum_{i=0}^n a_i \chi W_i$ where $a_i \ge 0$ and $\lambda W_i < \infty$ for each $i$. Let $\epsilon > 0$. For each $i \le n$ there are $E_i \in \Sigma$, $F_i \in \mathrm{T}$ such that $\mu E_i < \infty$, $\nu F_i < \infty$ and

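$\lambda(W_i \cap (E_i \times F_i)) \ge \lambda W_i - \epsilon$. Set $E = \bigcup_{i \le n} E_i$ and $F = \bigcup_{i \le n} F_i$. Consider the subspace measures $\mu_E$ and $\nu_F$ and their product $\lambda_{E \times F}$ on $E \times F$; then $\lambda_{E \times F}$ is the subspace measure on $E \times F$ defined from $\lambda$ (251P). Accordingly, applying 252B to the product $\mu_E \times \nu_F$,
\[ \int_{E \times F} h'\,d\lambda = \int_{E \times F} h'\,d\lambda_{E \times F} = \int_E \int_F h'(x, y)\,\nu_F(dy)\,\mu_E(dx). \]
For any $x$, we know that $h'(x, y) \le |f(x, y)|$ whenever $f(x, y)$ is defined. So we can be sure that
\[ \int_F h'(x, y)\,\nu_F(dy) = \int h'(x, y)\,\chi F(y)\,\nu(dy) \le \int |f(x, y)|\,\nu(dy) \]
at least whenever $\int_F h'(x, y)\,\nu_F(dy)$ and $\int |f(x, y)|\,\nu(dy)$ are both defined, which is the case for almost every $x \in E$. Consequently
\[ \begin{aligned} \int_{E \times F} h'\,d\lambda &= \int_E \int_F h'(x, y)\,\nu_F(dy)\,\mu_E(dx) \\ &\le \int_E \int |f(x, y)|\,\nu(dy)\,\mu(dx) \le \iint |f(x, y)|\,\nu(dy)\,\mu(dx). \end{aligned} \]
On the other hand,
\[ \int h'\,d\lambda - \int_{E \times F} h'\,d\lambda = \sum_{i=0}^n a_i \lambda(W_i \setminus (E \times F)) \le \sum_{i=0}^n a_i \lambda(W_i \setminus (E_i \times F_i)) \le \epsilon \sum_{i=0}^n a_i. \]
So
\[ \int h\,d\lambda = \int h'\,d\lambda \le \iint |f(x, y)|\,\nu(dy)\,\mu(dx) + \epsilon \sum_{i=0}^n a_i. \]
As $\epsilon$ is arbitrary, $\int h\,d\lambda \le \iint |f(x, y)|\,\nu(dy)\,\mu(dx)$, as claimed. Q
(c) This is true whenever $h$ is a $\lambda$-simple function less than or equal to $|f|$ $\lambda$-a.e. But $|f|$ is $\Lambda$-measurable and $\lambda$ is semi-finite (251Ic), so this is enough to ensure that $|f|$ is $\lambda$-integrable (213B), which (because $f$ is supposed to be $\Lambda$-measurable) in turn implies that $f$ is $\lambda$-integrable.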

252H Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be $\sigma$-finite measure spaces, $\lambda$ the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Let $f$ be a $\Lambda$-measurable real-valued function defined on a member of $\Lambda$. Then if one of
\[ \int_{X \times Y} |f(x, y)|\,\lambda(d(x, y)), \quad \int_Y \int_X |f(x, y)|\,\mu(dx)\,\nu(dy), \quad \int_X \int_Y |f(x, y)|\,\nu(dy)\,\mu(dx) \]
exists in $\mathbb{R}$, so do the other two, and in this case
\[ \int_{X \times Y} f(x, y)\,\lambda(d(x, y)) = \int_Y \int_X f(x, y)\,\mu(dx)\,\nu(dy) = \int_X \int_Y f(x, y)\,\nu(dy)\,\mu(dx). \]
proof (a) Suppose that $\int |f|\,d\lambda$ is finite. Because both $\mu$ and $\nu$ are $\sigma$-finite, 252B tells us that
\[ \iint |f(x, y)|\,\mu(dx)\,\nu(dy), \quad \iint |f(x, y)|\,\nu(dy)\,\mu(dx) \]
both exist and are equal to $\int |f|\,d\lambda$, while
\[ \iint f(x, y)\,\mu(dx)\,\nu(dy), \quad \iint f(x, y)\,\nu(dy)\,\mu(dx) \]
both exist and are equal to $\int f\,d\lambda$.
(b) Now suppose that $\iint |f(x, y)|\,\nu(dy)\,\mu(dx)$ exists in $\mathbb{R}$. Then 252G tells us that $|f|$ is $\lambda$-integrable, so we can use (a) to complete the argument. Exchanging the coordinates, the same argument applies if $\iint |f(x, y)|\,\mu(dx)\,\nu(dy)$ exists in $\mathbb{R}$.
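To see why 252H asks for one of the integrals of $|f|$ to be finite, it may help to look at a standard double-series illustration, which is not taken from the text above: take $X = Y = \mathbb{N}$ with counting measure (a $\sigma$-finite space) and $f(i, j) = 1$ if $j = i$, $-1$ if $j = i + 1$, and $0$ otherwise. The Python sketch below uses hypothetical names (`a`, `row_sum`, `col_sum` and so on); each row and column has at most two non-zero entries, so the inner sums are computed exactly.

```python
def a(i, j):
    """The integrand: 1 on the diagonal, -1 just above it, 0 elsewhere."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0


def row_sum(i):
    """Inner integral over j for fixed i; only j = i and j = i + 1 contribute."""
    return a(i, i) + a(i, i + 1)


def col_sum(j):
    """Inner integral over i for fixed j; only i = j - 1 and i = j contribute."""
    return a(j, j) + (a(j - 1, j) if j >= 1 else 0)


def rows_then_total(n):
    """Partial sum over i < n of the exact row sums."""
    return sum(row_sum(i) for i in range(n))


def cols_then_total(n):
    """Partial sum over j < n of the exact column sums."""
    return sum(col_sum(j) for j in range(n))


def abs_block_sum(n):
    """Sum of |a(i, j)| over the n-by-n block; it grows without bound."""
    return sum(abs(a(i, j)) for i in range(n) for j in range(n))
```

Every row sum is $0$, so one repeated sum is $0$; the column sums are $1, 0, 0, \dots$, so the other repeated sum is $1$. The block sums of $|f|$ equal $2n - 1$, so $|f|$ is not integrable and 252H does not apply.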

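252I Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, $\lambda$ the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Take $W \in \Lambda$. If either of the integrals
\[ \int \mu W^{-1}[\{y\}]\,\nu(dy), \quad \int \nu W[\{x\}]\,\mu(dx) \]
exists and is finite, then $\lambda W < \infty$.
proof Apply 252G with $f = \chi W$, remembering that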
\[ \mu W^{-1}[\{y\}] = \int \chi W(x, y)\,\mu(dx), \quad \nu W[\{x\}] = \int \chi W(x, y)\,\nu(dy) \]
whenever the integrals are defined, as in the proof of 252D.

252J Remarks 252H is the basic form of Fubini's theorem; it is not a coincidence that most authors avoid non-$\sigma$-finite spaces in this context. The next two examples exhibit some of the difficulties which can arise if we leave the familiar territory of more-or-less Borel measurable functions on $\sigma$-finite spaces. The first is a classic.

252K Example Let $(X, \Sigma, \mu)$ be $[0, 1]$ with Lebesgue measure, and let $(Y, \mathrm{T}, \nu)$ be $[0, 1]$ with counting measure.

(a) Consider the set
\[ W = \{(t, t) : t \in [0, 1]\} \subseteq X \times Y. \]
We observe that $W$ is expressible as
\[ \bigcap_{n \in \mathbb{N}} \bigcup_{k=0}^{n} \Bigl[ \tfrac{k}{n+1}, \tfrac{k+1}{n+1} \Bigr] \times \Bigl[ \tfrac{k}{n+1}, \tfrac{k+1}{n+1} \Bigr] \in \Sigma \widehat{\otimes} \mathrm{T}. \]
If we look at the sections
\[ W^{-1}[\{t\}] = W[\{t\}] = \{t\} \]
for $t \in [0, 1]$, we have
\[ \iint \chi W(x, y)\,\mu(dx)\,\nu(dy) = \int \mu W^{-1}[\{y\}]\,\nu(dy) = \int 0\,\nu(dy) = 0, \]
\[ \iint \chi W(x, y)\,\nu(dy)\,\mu(dx) = \int \nu W[\{x\}]\,\mu(dx) = \int 1\,\mu(dx) = 1, \]
so the two repeated integrals differ. It is therefore not generally possible to reverse the order of repeated integration, even for a non-negative measurable function in which both repeated integrals exist and are finite.
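The expression of the diagonal $W$ as a countable intersection of finite unions of closed squares can be checked pointwise with exact rational arithmetic. The Python sketch below is illustrative only (the name `in_nth_cover` is an assumption of the sketch): it tests whether a point lies in the $n$th union of squares.

```python
from fractions import Fraction


def in_nth_cover(x, y, n):
    """True if (x, y) lies in the union over k = 0..n of the closed
    squares [k/(n+1), (k+1)/(n+1)] x [k/(n+1), (k+1)/(n+1)]."""
    for k in range(n + 1):
        lo, hi = Fraction(k, n + 1), Fraction(k + 1, n + 1)
        if lo <= x <= hi and lo <= y <= hi:
            return True
    return False


# Diagonal points belong to every cover ...
diagonal_points = [Fraction(0), Fraction(2, 7), Fraction(1, 3),
                   Fraction(1, 2), Fraction(1)]
always_covered = all(in_nth_cover(t, t, n)
                     for t in diagonal_points for n in range(50))

# ... while an off-diagonal point is eventually excluded.
off_diagonal = (Fraction(1, 3), Fraction(1, 2))
covered_at_5 = in_nth_cover(*off_diagonal, 5)
covered_at_6 = in_nth_cover(*off_diagonal, 6)
```

The point $(\tfrac13, \tfrac12)$ still lies in the square $[\tfrac26, \tfrac36]^2$ of the cover for $n = 5$, but no square of side $\tfrac17$ contains both coordinates, so it drops out at $n = 6$; only diagonal points survive in the intersection.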

(b) Because the set $W$ of part (a) actually belongs to $\Sigma \widehat{\otimes} \mathrm{T}$, we know that it is measured by the c.l.d. product measure $\lambda$, and 252F (applied with the coordinates reversed) tells us that $\lambda W = 0$.

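(c) It is in fact easy to give a full description of $\lambda$.

(i) The point is that a set $W \subseteq [0, 1] \times [0, 1]$ belongs to the domain $\Lambda$ of $\lambda$ iff every horizontal section $W^{-1}[\{y\}]$ is Lebesgue measurable. P ($\alpha$) If $W \in \Lambda$, then, for every $b \in [0, 1]$, $\lambda([0, 1] \times \{b\})$ is finite, so $W \cap ([0, 1] \times \{b\})$ is a set of finite measure, and
\[ \lambda(W \cap ([0, 1] \times \{b\})) = \int \mu (W \cap ([0, 1] \times \{b\}))^{-1}[\{y\}]\,\nu(dy) = \mu W^{-1}[\{b\}] \]
by 252D, because $\mu$ is $\sigma$-finite, $\nu$ is both strictly localizable and complete and locally determined, and
\[ (W \cap ([0, 1] \times \{b\}))^{-1}[\{y\}] = \begin{cases} W^{-1}[\{b\}] & \text{if } y = b, \\ \emptyset & \text{otherwise.} \end{cases} \]
As $b$ is arbitrary, every horizontal section of $W$ is measurable. ($\beta$) If every horizontal section of $W$ is measurable, let $F \subseteq [0, 1]$ be any set of finite measure for $\nu$; then $F$ is finite, so
\[ W \cap ([0, 1] \times F) = \bigcup_{y \in F} W^{-1}[\{y\}] \times \{y\} \in \Sigma \widehat{\otimes} \mathrm{T} \subseteq \Lambda. \]
But it follows that $W$ itself belongs to $\Lambda$, by 251H. Q
(ii) Now some of the same calculations show that for every $W \in \Lambda$,
\[ \lambda W = \sum_{y \in [0, 1]} \mu W^{-1}[\{y\}]. \]
P For any finite $F \subseteq [0, 1]$,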
224 Product measures 252Kc

Z
(W ([0, 1] F )) = (W ([0, 1] F ))1 [{y}](dy)
Z X
= W 1 [{y}](dy) = W 1 [{y}].
F yF

So
P P
W = supF [0,1] is finite yF W 1 [{y}] = y[0,1] W 1 [{y}]. Q
Q

252L Example For the second example, I turn to a problem that can arise if we neglect to check that a function is measurable as a function of two variables.
Let $(X,\Sigma,\mu)=(Y,\mathrm{T},\nu)$ be $\omega_1$, the first uncountable ordinal (2A1Fc), with the countable-cocountable measure (211R). Set
$$W = \{(\xi,\eta) : \xi\le\eta<\omega_1\} \subseteq X\times Y.$$
Then all the horizontal sections $W^{-1}[\{\eta\}] = \{\xi : \xi\le\eta\}$ are countable, so
$$\int \mu W^{-1}[\{\eta\}]\,\nu(d\eta) = \int 0\,\nu(d\eta) = 0,$$
while all the vertical sections $W[\{\xi\}] = \{\eta : \xi\le\eta<\omega_1\}$ are cocountable, so
$$\int \nu W[\{\xi\}]\,\mu(d\xi) = \int 1\,\mu(d\xi) = 1.$$
Because the two repeated integrals are different, they cannot both be equal to the measure of $W$, and the sole resolution is to say that $W$ is not measurable for the product measure.

252M Remark A third kind of difficulty in the formula
$$\iint f(x,y)\,dx\,dy = \iint f(x,y)\,dy\,dx$$
can arise even on probability spaces with $\Sigma\widehat{\otimes}\mathrm{T}$-measurable real-valued functions defined everywhere if we neglect to check that $f$ is integrable with respect to the product measure. In 252H, we do need the hypothesis that one of
$$\int_{X\times Y}|f(x,y)|\,\lambda(d(x,y)), \quad \int_Y\int_X |f(x,y)|\,\mu(dx)\,\nu(dy), \quad \int_X\int_Y |f(x,y)|\,\nu(dy)\,\mu(dx)$$
is finite. For examples to show this, see 252Xf and 252Xg.
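
The phenomenon of 252M can be seen in miniature with sums in place of integrals. The following Python sketch is not one of the examples of 252Xf or 252Xg; it is a discrete analogue, assuming counting measure on $\mathbb{N}$ in both coordinates (a $\sigma$-finite space, not a probability space), and the array `a` and the helper names are illustrative choices. Each row and each column has at most two non-zero entries, so the inner sums are computed exactly; the double array is not absolutely summable, and the two repeated sums come out different.

```python
def a(i, j):
    """Entry of the double array: +1 on the diagonal, -1 just to its right, 0 elsewhere."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0


def row_sum(i):
    # Row i is supported on columns i and i+1, so this finite sum is the whole inner sum over j.
    return sum(a(i, j) for j in (i, i + 1))


def col_sum(j):
    # Column j is supported on rows j-1 and j (only row 0 when j == 0).
    return sum(a(i, j) for i in (j - 1, j) if i >= 0)


N = 1000
# Partial sums of the two repeated sums; both are constant in N, so they equal the limits.
rows_then_cols = sum(row_sum(i) for i in range(N))  # sum over i of (sum over j of a_ij)
cols_then_rows = sum(col_sum(j) for j in range(N))  # sum over j of (sum over i of a_ij)
print(rows_then_cols, cols_then_rows)
```

Every row sums to zero, while column 0 sums to one and every other column to zero, so the order of summation matters exactly because $\sum_{i,j}|a_{ij}|$ is infinite.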

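252N Integration through ordinate sets I: Proposition Let $(X,\Sigma,\mu)$ be a complete locally determined measure space, and $\lambda$ the c.l.d. product measure on $X\times\mathbb{R}$, where $\mathbb{R}$ is given Lebesgue measure; write $\Lambda$ for the domain of $\lambda$. For any $[0,\infty]$-valued function $f$ defined on a conegligible subset of $X$, write $\Omega_f$, $\Omega'_f$ for the ordinate sets
$$\Omega_f = \{(x,a) : x\in\operatorname{dom}f,\ 0\le a\le f(x)\} \subseteq X\times\mathbb{R},$$
$$\Omega'_f = \{(x,a) : x\in\operatorname{dom}f,\ 0\le a< f(x)\} \subseteq X\times\mathbb{R}.$$
Then
$$\lambda\Omega_f = \lambda\Omega'_f = \int f\,d\mu$$
in the sense that if one of these is defined in $[0,\infty]$, so are the other two, and they are equal.
proof (a) If $\Omega_f\in\Lambda$, then
$$\int f(x)\,\mu(dx) = \int \nu\{y : (x,y)\in\Omega_f\}\,\mu(dx) = \lambda\Omega_f$$
by 252D, writing $\nu$ for Lebesgue measure, because $f$ is defined almost everywhere. Similarly, if $\Omega'_f\in\Lambda$,
$$\int f(x)\,\mu(dx) = \int \nu\{y : (x,y)\in\Omega'_f\}\,\mu(dx) = \lambda\Omega'_f.$$
(b) If $\int f\,d\mu$ is defined, then $f$ is $\mu$-virtually measurable, therefore measurable (because $\mu$ is complete); again because $\mu$ is complete, $\operatorname{dom}f\in\Sigma$. So
$$\Omega'_f = \bigcup_{q\in\mathbb{Q},\,q>0} \{x : x\in\operatorname{dom}f,\ f(x)>q\}\times[0,q],$$
$$\Omega_f = \bigcap_{n\ge1}\ \bigcup_{q\in\mathbb{Q},\,q>0} \{x : x\in\operatorname{dom}f,\ f(x)\ge q-\tfrac1n\}\times[0,q]$$
belong to $\Lambda$, so that $\lambda\Omega_f$ and $\lambda\Omega'_f$ are defined. Now both are equal to $\int f\,d\mu$, by (a).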

252O Integration through ordinate sets II: Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $f$ a $\Sigma$-measurable $[0,\infty]$-valued function defined on a measurable conegligible subset of $X$. Then
$$\int f\,d\mu = \int_0^\infty \mu\{x : x\in\operatorname{dom}f,\ f(x)\ge t\}\,dt = \int_0^\infty \mu\{x : x\in\operatorname{dom}f,\ f(x)>t\}\,dt$$
in $[0,\infty]$, where the integrals $\int\ldots dt$ are taken with respect to Lebesgue measure.
proof For $n,k\in\mathbb{N}$ set $E_{nk} = \{x : x\in\operatorname{dom}f,\ f(x)>2^{-n}k\}$, $g_n = 2^{-n}\sum_{k=1}^{4^n}\chi E_{nk}$. Then $\langle g_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of measurable functions converging to $f$ at every point of $\operatorname{dom}f$, so $\int f\,d\mu = \lim_{n\to\infty}\int g_n\,d\mu$ and $\mu\{x : f(x)>t\} = \lim_{n\to\infty}\mu\{x : g_n(x)>t\}$ for every $t\ge0$; consequently
$$\int_0^\infty \mu\{x : f(x)>t\}\,dt = \lim_{n\to\infty}\int_0^\infty \mu\{x : g_n(x)>t\}\,dt.$$
On the other hand, $\mu\{x : g_n(x)>t\} = \mu E_{nk}$ if $1\le k\le4^n$ and $2^{-n}(k-1)\le t<2^{-n}k$, $0$ if $t\ge2^n$, so that
$$\int_0^\infty \mu\{x : g_n(x)>t\}\,dt = \sum_{k=1}^{4^n}2^{-n}\mu E_{nk} = \int g_n\,d\mu$$
for every $n\in\mathbb{N}$. So $\int_0^\infty\mu\{x : f(x)>t\}\,dt = \int f\,d\mu$.
Now $\mu\{x : f(x)\ge t\} = \mu\{x : f(x)>t\}$ for almost all $t$. P Set $C = \{t : \mu\{x : f(x)>t\}<\infty\}$, $h(t) = \mu\{x : f(x)>t\}$ for $t\in C$. Then $h:C\to[0,\infty[$ is monotonic, so is continuous almost everywhere (222A). But at any point of $C$ at which $h$ is continuous,
$$\mu\{x : f(x)\ge t\} = \lim_{s\uparrow t}\mu\{x : f(x)>s\} = \mu\{x : f(x)>t\}.$$
So we have the result, since $\mu\{x : f(x)\ge t\} = \mu\{x : f(x)>t\} = \infty$ for any $t\in[0,\infty[\,\setminus C$. Q
Accordingly $\int_0^\infty\mu\{x : f(x)\ge t\}\,dt$ is also equal to $\int f\,d\mu$.
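
The formula of 252O can be checked by hand on a finite measure space, where the distribution function $t\mapsto\mu\{x : f(x)>t\}$ is a step function. The Python sketch below does this in exact rational arithmetic. It assumes a measure space with finitely many atoms, described by (weight, value) pairs with finite non-negative values; the helper names `integral` and `layer_cake` are illustrative and are not taken from the text.

```python
from fractions import Fraction as Fr


def integral(points):
    """Integral of f over a finite measure space given as (weight, value) pairs."""
    return sum(w * v for w, v in points)


def layer_cake(points, strict=True):
    """Integrate t -> mu{f > t} (or mu{f >= t} if strict is False) over [0, oo[.

    The distribution function only jumps at values of f, so it is constant on
    each open interval between consecutive cut points, and zero beyond max f.
    """
    cuts = sorted({Fr(0)} | {v for _, v in points})
    total = Fr(0)
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2  # any interior point gives the height on ]lo, hi[
        if strict:
            height = sum(w for w, v in points if v > mid)
        else:
            height = sum(w for w, v in points if v >= mid)
        total += (hi - lo) * height
    return total


# Four atoms with weights 1/2, 1/3, 2, 1/6 on which f takes the values 3, 1/2, 0, 3.
space = [(Fr(1, 2), Fr(3)), (Fr(1, 3), Fr(1, 2)), (Fr(2), Fr(0)), (Fr(1, 6), Fr(3))]
print(integral(space), layer_cake(space, strict=True), layer_cake(space, strict=False))
```

The strict and non-strict versions agree here for the reason given in the proof: the two distribution functions differ only at the finitely many values of $f$, which form a Lebesgue negligible set.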

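*252P If we work through the ideas of 252B for $\Sigma\widehat{\otimes}\mathrm{T}$-measurable functions, we get the following, which is sometimes useful.
Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $(Y,\mathrm{T},\nu)$ a $\sigma$-finite measure space. Then for any $\Sigma\widehat{\otimes}\mathrm{T}$-measurable function $f:X\times Y\to[0,\infty]$, $x\mapsto\int f(x,y)\,\nu(dy):X\to[0,\infty]$ is $\Sigma$-measurable; and if $\mu$ is semi-finite, $\iint f(x,y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda$, where $\lambda$ is the c.l.d. product measure on $X\times Y$.
proof (a) Let $\langle Y_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of subsets of $Y$ of finite measure with union $Y$. Set
$$\mathcal{A} = \{W : W\subseteq X\times Y,\ W[\{x\}]\in\mathrm{T}\text{ for every }x\in X,\ x\mapsto\nu(Y_n\cap W[\{x\}])\text{ is }\Sigma\text{-measurable for every }n\in\mathbb{N}\}.$$
Then $\mathcal{A}$ is a Dynkin class of subsets of $X\times Y$ including $\{E\times F : E\in\Sigma,\ F\in\mathrm{T}\}$, so includes $\Sigma\widehat{\otimes}\mathrm{T}$, by the Monotone Class Theorem (136B).
This means that if $W\in\Sigma\widehat{\otimes}\mathrm{T}$, then
$$\nu W[\{x\}] = \sup_{n\in\mathbb{N}}\nu(Y_n\cap W[\{x\}])$$
is defined for every $x\in X$ and is a $\Sigma$-measurable function of $x$.
(b) Now, for $n,k\in\mathbb{N}$, set
$$W_{nk} = \{(x,y) : f(x,y)\ge2^{-n}k\}, \quad g_n = \sum_{k=1}^{4^n}2^{-n}\chi W_{nk}.$$
Then if we set
$$h_n(x) = \int g_n(x,y)\,\nu(dy) = \sum_{k=1}^{4^n}2^{-n}\nu W_{nk}[\{x\}]$$
for $n\in\mathbb{N}$ and $x\in X$, $h_n:X\to[0,\infty]$ is $\Sigma$-measurable, and
$$\lim_{n\to\infty}h_n(x) = \lim_{n\to\infty}\int g_n(x,y)\,\nu(dy) = \int f(x,y)\,\nu(dy)$$
for every $x$, because $\langle g_n(x,y)\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence with limit $f(x,y)$ for all $x\in X$, $y\in Y$. So $x\mapsto\int f(x,y)\,\nu(dy)$ is also defined everywhere in $X$ and is $\Sigma$-measurable.
(c) If $E\subseteq X$ is measurable and has finite measure, then $\int_E\int f(x,y)\,\nu(dy)\,\mu(dx) = \int_{E\times Y}f\,d\lambda$, applying 252B to the product of the subspace measure $\mu_E$ and $\nu$ (and using 251P to check that the product of $\mu_E$ and $\nu$ is the subspace measure on $E\times Y$). Now if $\lambda W$ is defined and finite, there must be a non-decreasing sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of subsets of $X$ of finite measure such that $\lambda W = \sup_{n\in\mathbb{N}}\lambda(W\cap(E_n\times Y))$, so that $W\setminus\bigcup_{n\in\mathbb{N}}(E_n\times Y)$ is negligible, and
$$\int_W f\,d\lambda = \lim_{n\to\infty}\int_{W\cap(E_n\times Y)}f\,d\lambda$$
(by B.Levi's theorem applied to $\langle f\times\chi(W\cap(E_n\times Y))\rangle_{n\in\mathbb{N}}$)
$$\le \lim_{n\to\infty}\int_{E_n\times Y}f\,d\lambda = \lim_{n\to\infty}\int_{E_n}\int f(x,y)\,\nu(dy)\,\mu(dx) \le \iint f(x,y)\,\nu(dy)\,\mu(dx).$$
By 213B,
$$\int f\,d\lambda = \sup_{\lambda W<\infty}\int_W f\,d\lambda \le \iint f(x,y)\,\nu(dy)\,\mu(dx).$$
But also, if $\mu$ is semi-finite,
$$\iint f(x,y)\,\nu(dy)\,\mu(dx) = \sup_{\mu E<\infty}\int_E\int f(x,y)\,\nu(dy)\,\mu(dx) \le \int f\,d\lambda,$$
so $\int f\,d\lambda = \iint f(x,y)\,\nu(dy)\,\mu(dx)$, as claimed.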

252Q The volume of a ball We now have all the essential machinery to perform a little calculation which is, I suppose, desirable simply as general knowledge: the volume of the unit ball $\{x : \|x\|\le1\} = \{(\xi_1,\ldots,\xi_r) : \sum_{i=1}^r\xi_i^2\le1\}$ in $\mathbb{R}^r$. In fact, from a theoretical point of view, I think we could very nearly just call it $\beta_r$ and leave it at that; but since there is a general formula in terms of $\beta_2=\pi$ and factorials, it seems shameful not to present it. The calculation has nothing to do with Lebesgue integration, and I could dismiss it as mere advanced calculus; but since only a minority of mathematicians are now taught calculus to this level with reasonable rigour before being introduced to the Lebesgue integral, I do not doubt that many readers, like myself, missed some of the subtleties involved. I therefore take the space to spell the details out in the style used elsewhere in this volume, recognising that the machinery employed is a good deal more elaborate than is really necessary for this result.

(a) The first basic fact we need is that, for any $n\ge1$,
$$I_n = \int_{-\pi/2}^{\pi/2}\cos^n t\,dt = \begin{cases}\dfrac{(2k)!}{(2^kk!)^2}\,\pi & \text{if } n=2k \text{ is even},\\[2ex] 2\,\dfrac{(2^kk!)^2}{(2k+1)!} & \text{if } n=2k+1 \text{ is odd}.\end{cases}$$
P For $n=0$, of course,
$$I_0 = \int_{-\pi/2}^{\pi/2}1\,dt = \pi = \frac{0!}{(2^00!)^2}\,\pi,$$
while for $n=1$ we have
$$I_1 = \sin\frac\pi2 - \sin\Bigl(-\frac\pi2\Bigr) = 2 = 2\,\frac{(2^00!)^2}{1!},$$
using the Fundamental Theorem of Calculus (225L) and the fact that $\sin'=\cos$ is bounded. For the inductive step to $n+1\ge2$, we can use integration by parts (225F):
$$I_{n+1} = \int_{-\pi/2}^{\pi/2}\cos t\,\cos^n t\,dt$$
$$= \sin\frac\pi2\cos^n\frac\pi2 - \sin\Bigl(-\frac\pi2\Bigr)\cos^n\Bigl(-\frac\pi2\Bigr) + \int_{-\pi/2}^{\pi/2}\sin t\cdot n\cos^{n-1}t\,\sin t\,dt$$
$$= n\int_{-\pi/2}^{\pi/2}(1-\cos^2t)\cos^{n-1}t\,dt = n(I_{n-1}-I_{n+1}),$$
so that $I_{n+1} = \dfrac{n}{n+1}I_{n-1}$. Now the given formulae follow by an easy induction. Q
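
The closed formulae and the recurrence of (a) are easy to compare numerically. The following Python sketch is offered only as a check; it assumes nothing beyond the standard `math` module, and the function names are illustrative. It evaluates $I_n$ three ways: from the closed formulae, from the recurrence $I_{n+1}=\frac{n}{n+1}I_{n-1}$ started at $I_0=\pi$, $I_1=2$, and by a crude midpoint rule.

```python
import math


def wallis_closed_form(n):
    """I_n from the closed formulae of 252Q(a)."""
    k, odd = divmod(n, 2)
    d = (2 ** k * math.factorial(k)) ** 2
    if odd:
        return 2 * d / math.factorial(2 * k + 1)
    return math.factorial(2 * k) / d * math.pi


def wallis_recurrence(n):
    """I_n from I_0 = pi, I_1 = 2 and I_{m+1} = m/(m+1) * I_{m-1}."""
    prev, cur = math.pi, 2.0  # I_0, I_1
    if n == 0:
        return prev
    for m in range(1, n):
        prev, cur = cur, m / (m + 1) * prev
    return cur


def wallis_numeric(n, steps=20000):
    """Midpoint-rule approximation to the integral of cos^n over [-pi/2, pi/2]."""
    h = math.pi / steps
    return h * sum(math.cos(-math.pi / 2 + (i + 0.5) * h) ** n for i in range(steps))


for n in range(8):
    print(n, wallis_closed_form(n), wallis_recurrence(n), wallis_numeric(n))
```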

(b) The next result is that, for any $n\in\mathbb{N}$ and any $a\ge0$,
$$\int_{-a}^{a}(a^2-s^2)^{n/2}\,ds = I_{n+1}a^{n+1}.$$
P Of course this is an integration by substitution; but the singularity of the integrand at $s=\pm a$ complicates the issue slightly. I offer the following argument. If $a=0$ the result is trivial; take $a>0$. For $-a\le b\le a$, set $F(b) = \int_{-a}^{b}(a^2-s^2)^{n/2}\,ds$. Because the integrand is continuous, $F'(b)$ exists and is equal to $(a^2-b^2)^{n/2}$ for $-a<b<a$ (222H). Set $G(t) = F(a\sin t)$; then $G$ is continuous and
$$G'(t) = aF'(a\sin t)\cos t = a^{n+1}\cos^{n+1}t$$
for $-\frac\pi2<t<\frac\pi2$. Consequently
$$\int_{-a}^{a}(a^2-s^2)^{n/2}\,ds = F(a)-F(-a) = G\Bigl(\frac\pi2\Bigr)-G\Bigl(-\frac\pi2\Bigr) = \int_{-\pi/2}^{\pi/2}G'(t)\,dt$$
(by 225L, as before)
$$= a^{n+1}I_{n+1},$$
as required. Q
(c) Now at last we are ready for the balls $B_r = \{x : x\in\mathbb{R}^r,\ \|x\|\le1\}$. Let $\mu_r$ be Lebesgue measure on $\mathbb{R}^r$, and set $\beta_r = I_1I_2\ldots I_r$ for $r\ge1$. I claim that, writing
$$B_r(a) = \{x : x\in\mathbb{R}^r,\ \|x\|\le a\},$$
we have $\mu_r(B_r(a)) = \beta_ra^r$ for every $a\ge0$. P Induce on $r$. For $r=1$ we have $\beta_1=2$, $B_1(a)=[-a,a]$, so the result is trivial. For the inductive step to $r+1$, we have
$$\mu_{r+1}B_{r+1}(a) = \int\mu_r\{x : (x,t)\in B_{r+1}(a)\}\,dt$$
(putting 251M and 252D together, and using the fact that $B_{r+1}(a)$ is closed, therefore measurable)
$$= \int_{-a}^{a}\mu_rB_r\bigl(\sqrt{a^2-t^2}\bigr)\,dt$$
(because $(x,t)\in B_{r+1}(a)$ iff $|t|\le a$ and $\|x\|\le\sqrt{a^2-t^2}$)
$$= \beta_r\int_{-a}^{a}(a^2-t^2)^{r/2}\,dt$$
(by the inductive hypothesis)
$$= \beta_ra^{r+1}I_{r+1}$$
(by (b) above)
$$= \beta_{r+1}a^{r+1}$$
(by the definition of $\beta_{r+1}$). Thus the induction continues. Q

(d) In particular, the $r$-dimensional Lebesgue measure of the $r$-dimensional ball $B_r = B_r(1)$ is just $\beta_r = I_1\ldots I_r$. Now an easy induction on $k$ shows that
$$\beta_r = \begin{cases}\dfrac{1}{k!}\,\pi^k & \text{if } r=2k \text{ is even},\\[2ex] \dfrac{2^{2k+1}k!}{(2k+1)!}\,\pi^k & \text{if } r=2k+1 \text{ is odd}.\end{cases}$$

(e) Note that in part (c) of the proof we saw that $\{x : x\in\mathbb{R}^r,\ \|x\|\le a\}$ has measure $\beta_ra^r$ for every $a\ge0$.
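
As a numerical cross-check of (c) and (d), the Python sketch below computes $\beta_r$ as the product $I_1\ldots I_r$, using the recurrence of (a), and compares it with the factorial formulae of (d) and with the expression $\pi^{r/2}/\Gamma(1+\frac r2)$ quoted in 252Xh(iii). The function names are illustrative, and `math.prod` assumes Python 3.8 or later.

```python
import math


def wallis_integrals(r):
    """Return [I_1, ..., I_r], using I_0 = pi, I_1 = 2, I_{n+1} = n/(n+1) * I_{n-1}."""
    vals = [math.pi, 2.0]  # I_0, I_1
    for n in range(1, r):
        vals.append(n / (n + 1) * vals[n - 1])
    return vals[1:r + 1]


def beta_product(r):
    """beta_r = I_1 * ... * I_r, as defined in 252Q(c)."""
    return math.prod(wallis_integrals(r))


def beta_closed(r):
    """beta_r from the factorial formulae of 252Q(d)."""
    k, odd = divmod(r, 2)
    if odd:
        return 2 ** (2 * k + 1) * math.factorial(k) * math.pi ** k / math.factorial(2 * k + 1)
    return math.pi ** k / math.factorial(k)


def beta_gamma(r):
    """beta_r in the Gamma-function form of 252Xh(iii)."""
    return math.pi ** (r / 2) / math.gamma(1 + r / 2)


for r in range(1, 9):
    print(r, beta_product(r), beta_closed(r), beta_gamma(r))
```

For $r=1,2,3$ the three columns give the familiar values $2$, $\pi$ and $\frac43\pi$.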

252R Complex-valued functions It is easy to apply the results of 252B-252I above to complex-valued functions, by considering their real and imaginary parts. Specifically:
(a) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. Suppose that $\nu$ is $\sigma$-finite and that $\mu$ is either strictly localizable or complete and locally determined. Let $f$ be a $\lambda$-integrable complex-valued function. Then $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ is defined and equal to $\int f\,d\lambda$.

(b) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, $\lambda$ the c.l.d. product measure on $X\times Y$, and $\Lambda$ its domain. Let $f$ be a $\Lambda$-measurable complex-valued function defined on a member of $\Lambda$, and suppose that either $\iint|f(x,y)|\,\mu(dx)\,\nu(dy)$ or $\iint|f(x,y)|\,\nu(dy)\,\mu(dx)$ is defined and finite. Then $f$ is $\lambda$-integrable.

(c) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be $\sigma$-finite measure spaces, $\lambda$ the c.l.d. product measure on $X\times Y$, and $\Lambda$ its domain. Let $f$ be a $\Lambda$-measurable complex-valued function defined on a member of $\Lambda$. Then if one of
$$\int_{X\times Y}|f(x,y)|\,\lambda(d(x,y)), \quad \int_Y\int_X|f(x,y)|\,\mu(dx)\,\nu(dy), \quad \int_X\int_Y|f(x,y)|\,\nu(dy)\,\mu(dx)$$
exists in $\mathbb{R}$, so do the other two, and in this case
$$\int_{X\times Y}f(x,y)\,\lambda(d(x,y)) = \int_Y\int_X f(x,y)\,\mu(dx)\,\nu(dy) = \int_X\int_Y f(x,y)\,\nu(dy)\,\mu(dx).$$

252X Basic exercises (a) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. Let $f$ be a $\lambda$-integrable real-valued function such that $\int_{E\times F}f=0$ whenever $E\in\Sigma$, $F\in\mathrm{T}$, $\mu E<\infty$ and $\nu F<\infty$. Show that $f=0$ $\lambda$-a.e. (Hint: use 251Ie to show that $\int_W f=0$ whenever $\lambda W<\infty$.)

> (b) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be complete locally determined measure spaces, $\lambda$ the c.l.d. product measure on $X\times Y$, and $\Lambda$ its domain. Suppose that $A\subseteq X$ and $B\subseteq Y$. Show that $A\times B\in\Lambda$ iff either $\mu A=0$ or $\nu B=0$ or $A\in\Sigma$ and $B\in\mathrm{T}$. (Hint: if $B$ is not negligible and $A\times B\in\Lambda$, take $H$ such that $\nu H<\infty$ and $B\cap H$ is not negligible. By 251P, $W=A\times(B\cap H)$ is measured by $\mu\times\nu_H$, where $\nu_H$ is the subspace measure on $H$. Now apply 252D to $\mu$, $\nu_H$ and $W$ to see that $A\in\Sigma$.)

> (c) Let $(X_1,\Sigma_1,\mu_1)$, $(X_2,\Sigma_2,\mu_2)$, $(X_3,\Sigma_3,\mu_3)$ be three $\sigma$-finite measure spaces, and $f$ a real-valued function defined almost everywhere on $X_1\times X_2\times X_3$ and measurable for the product measure described in 251Xf. Show that if $\iiint|f(x_1,x_2,x_3)|\,dx_1dx_2dx_3$ is defined in $\mathbb{R}$, then $\iiint f(x_1,x_2,x_3)\,dx_2dx_3dx_1$ and $\iiint f(x_1,x_2,x_3)\,dx_3dx_1dx_2$ exist and are equal.

(d) Give an example of strictly localizable measure spaces $(X,\Sigma,\mu)$, $(Y,\mathrm{T},\nu)$ and a $W\in\Sigma\widehat{\otimes}\mathrm{T}$ such that $x\mapsto\nu W[\{x\}]$ is not $\Sigma$-measurable. (Hint: in 252Kb, try $Y$ a proper subset of $[0,1]$.)

> (e) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $f$ a $\Sigma\widehat{\otimes}\mathrm{T}$-measurable function defined on a subset of $X\times Y$. Show that $y\mapsto f(x,y)$ is $\mathrm{T}$-measurable for every $x\in X$.

> (f) Set $f(x,y)=\sin(x-y)$ if $0\le y\le x\le y+2\pi$, $0$ for other $(x,y)\in\mathbb{R}^2$. Show that $\iint f(x,y)\,dx\,dy=0$, $\iint f(x,y)\,dy\,dx=2\pi$, taking all integrals with respect to Lebesgue measure.

> (g) Set $f(x,y)=\dfrac{x^2-y^2}{(x^2+y^2)^2}$ for $x,y\in\,]0,1]$. Show that $\int_0^1\int_0^1f(x,y)\,dy\,dx=\dfrac\pi4$, $\int_0^1\int_0^1f(x,y)\,dx\,dy=-\dfrac\pi4$.

(h) Let $r\ge1$ be an integer, and write $\beta_r$ for the Lebesgue measure of the unit ball in $\mathbb{R}^r$. Set $g_r(t)=r\beta_rt^{r-1}$ for $t\ge0$. Set $\phi(x)=\|x\|$ for $x\in\mathbb{R}^r$. (i) Writing $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$, show that $\mu_r\phi^{-1}[E]=\int_E r\beta_rt^{r-1}\,\mu_1(dt)$ for every Lebesgue measurable set $E\subseteq[0,\infty[$. (Hint: start with intervals $E$, noting from 115Xe that $\mu_r\{x : \|x\|\le a\}=\beta_ra^r$ for $a\ge0$, and progress to open sets, negligible sets and general measurable sets.) (ii) Using 235T, show that
$$\int e^{-\|x\|^2/2}\,\mu_r(dx) = r\beta_r\int_0^\infty t^{r-1}e^{-t^2/2}\,\mu_1(dt) = 2^{(r-2)/2}r\beta_r\Gamma\Bigl(\frac r2\Bigr) = 2^{r/2}\beta_r\Gamma\Bigl(1+\frac r2\Bigr) = \Bigl(\sqrt2\,\Gamma\Bigl(\frac12\Bigr)\Bigr)^r$$
where $\Gamma$ is the $\Gamma$-function (225Xj). (iii) Show that
$$2\Gamma\Bigl(\frac12\Bigr)^2 = 2\beta_2\int_0^\infty te^{-t^2/2}\,dt = 2\pi,$$
and hence that $\beta_r=\dfrac{\pi^{r/2}}{\Gamma(1+\frac r2)}$ and $\int_{-\infty}^{\infty}e^{-t^2/2}\,dt=\sqrt{2\pi}$.

(i) Let $f,g:\mathbb{R}\to\mathbb{R}$ be two non-decreasing functions, and $\mu_f$, $\mu_g$ the associated Lebesgue-Stieltjes measures (see 114Xa). Set
$$f(x^+)=\lim_{t\downarrow x}f(t), \quad f(x^-)=\lim_{t\uparrow x}f(t)$$
for each $x\in\mathbb{R}$, and define $g(x^+)$, $g(x^-)$ similarly. Show that whenever $a\le b$ in $\mathbb{R}$,
$$\int_{[a,b]}f(x^-)\,\mu_g(dx)+\int_{[a,b]}g(x^+)\,\mu_f(dx) = g(b^+)f(b^+)-g(a^-)f(a^-)$$
$$= \int_{[a,b]}\tfrac12\bigl(f(x^-)+f(x^+)\bigr)\,\mu_g(dx)+\int_{[a,b]}\tfrac12\bigl(g(x^-)+g(x^+)\bigr)\,\mu_f(dx).$$
(Hint: find two expressions for $(\mu_f\times\mu_g)\{(x,y) : a\le x<y\le b\}$.)

252Y Further exercises (a) Let $(X,\Sigma,\mu)$ be a measure space. Show that the following are equiveridical: (i) the completion of $\mu$ is locally determined; (ii) the completion of $\mu$ coincides with the c.l.d. version of $\mu$; (iii) whenever $(Y,\mathrm{T},\nu)$ is a $\sigma$-finite measure space and $\lambda$ the c.l.d. product measure on $X\times Y$ and $f$ is a function such that $\int f\,d\lambda$ is defined in $[-\infty,\infty]$, then $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ is defined and equal to $\int f\,d\lambda$.

(b) Let $(X,\Sigma,\mu)$ be a measure space. Show that the following are equiveridical: (i) $\mu$ has locally determined negligible sets (213I); (ii) whenever $(Y,\mathrm{T},\nu)$ is a $\sigma$-finite measure space and $\lambda$ the c.l.d. product measure on $X\times Y$, then $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ is defined and equal to $\int f\,d\lambda$ for any $\lambda$-integrable function $f$.

(c) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda_0$ the primitive product measure on $X\times Y$ (251C). Let $f$ be any $\lambda_0$-integrable real-valued function. Show that $\iint f(x,y)\,\nu(dy)\,\mu(dx)=\int f\,d\lambda_0$. (Hint: show that there are sequences $\langle G_n\rangle_{n\in\mathbb{N}}$, $\langle H_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure such that $f(x,y)$ is defined and equal to $0$ for every $(x,y)\in(X\times Y)\setminus\bigcup_{n\in\mathbb{N}}G_n\times H_n$.)

(d) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces; let $\lambda_0$ be the primitive product measure on $X\times Y$, and $\lambda$ the c.l.d. product measure. Show that if $f$ is a $\lambda_0$-integrable real-valued function, it is $\lambda$-integrable, and $\int f\,d\lambda=\int f\,d\lambda_0$.

(e) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, with c.l.d. product $(X\times Y,\Lambda,\lambda)$. Let $f$ be a non-negative $\Lambda$-measurable real-valued function defined on a $\lambda$-conegligible set, and suppose that
$$\overline{\int}\Bigl(\overline{\int}f(x,y)\,\mu(dx)\Bigr)\nu(dy)$$
is finite. Show that $f$ is $\lambda$-integrable.

(f) Let $(X,\Sigma,\mu)$ be the unit interval $[0,1]$ with Lebesgue measure, and $(Y,\mathrm{T},\nu)$ the interval with counting measure, as in 252K; let $\lambda_0$ be the primitive product measure on $[0,1]^2$. (i) Setting $\Delta=\{(t,t) : t\in[0,1]\}$, show that $\lambda_0\Delta=\infty$. (ii) Show that $\lambda_0$ is not semi-finite. (iii) Show that if $W\in\operatorname{dom}\lambda_0$, then $\lambda_0W=\sum_{y\in[0,1]}\mu W^{-1}[\{y\}]$ if there are a countable set $A\subseteq[0,1]$ and a Lebesgue negligible set $E\subseteq[0,1]$ such that $W\subseteq([0,1]\times A)\cup(E\times[0,1])$, $\infty$ otherwise.
(g) Let $(X,\Sigma,\mu)$ be a measure space, and $\lambda_0$ the primitive product measure on $X\times\mathbb{R}$, where $\mathbb{R}$ is given Lebesgue measure; write $\Lambda$ for its domain. For any $[0,\infty]$-valued function $f$ defined on a conegligible subset of $X$, write $\Omega_f$, $\Omega'_f$ for the corresponding ordinate sets, as in 252N. Show that if any of $\lambda_0\Omega_f$, $\lambda_0\Omega'_f$, $\int f\,d\mu$ is defined and finite, so are the others, and all three are equal.
(h) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space, and $f$ a non-negative function defined on a conegligible subset of $X$. Write $\Omega_f$, $\Omega'_f$ for the corresponding ordinate sets, as in 252N. Let $\lambda$ be the c.l.d. product measure on $X\times\mathbb{R}$, where $\mathbb{R}$ is given Lebesgue measure. Show that $\overline{\int}f\,d\mu=\lambda^*\Omega_f=\lambda^*\Omega'_f$.
(i) Let $(X,\Sigma,\mu)$ be a measure space and $f:X\to[0,\infty[$ a function. Show that $\overline{\int}f\,d\mu=\int_0^\infty\mu^*\{x : f(x)\ge t\}\,dt$.
(j) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space and $a<b$ in $\mathbb{R}$, endowed with Lebesgue measure; let $\Lambda$ be the domain of the c.l.d. product measure $\lambda$ on $X\times[a,b]$. Let $f:X\times[a,b]\to\mathbb{R}$ be a $\Lambda$-measurable function such that $t\mapsto f(x,t):[a,b]\to\mathbb{R}$ is continuous on $[a,b]$ and differentiable on $]a,b[$ for every $x\in X$. (i) Show that the partial derivative $\dfrac{\partial f}{\partial t}$ with respect to the second variable is $\Lambda$-measurable. (ii) Now suppose that $\dfrac{\partial f}{\partial t}$ is $\lambda$-integrable and that $\int f(x,t_0)\,\mu(dx)$ is defined and finite for some $t_0\in\,]a,b[$. Show that $F(t)=\int f(x,t)\,\mu(dx)$ is defined in $\mathbb{R}$ for every $t\in[a,b]$, that $F$ is absolutely continuous, and that $F'(t)=\int\dfrac{\partial f}{\partial t}(x,t)\,\mu(dx)$ for almost every $t\in\,]a,b[$. (Hint: $F(c)=F(a)+\int_{X\times[a,c]}\dfrac{\partial f}{\partial t}\,d\lambda$ for every $c\in[a,b]$.)

(k) Show that $\dfrac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}=\int_0^1t^{a-1}(1-t)^{b-1}\,dt$ for all $a,b>0$. (Hint: show that
$$\int_0^\infty t^{a-1}\int_t^\infty e^{-x}(x-t)^{b-1}\,dx\,dt = \int_0^\infty e^{-x}\int_0^x t^{a-1}(x-t)^{b-1}\,dt\,dx.)$$

(l) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be $\sigma$-finite measure spaces and $\lambda$ the c.l.d. product measure on $X\times Y$. Suppose that $f\in\mathcal{L}^0(\lambda)$ and that $1<p<\infty$. Show that $\bigl(\int|\int f(x,y)\,dx|^p\,dy\bigr)^{1/p}\le\int\bigl(\int|f(x,y)|^p\,dy\bigr)^{1/p}\,dx$. (Hint: set $q=\dfrac{p}{p-1}$ and consider the integral $\int|f(x,y)g(y)|\,\lambda(d(x,y))$ for $g\in\mathcal{L}^q(\nu)$, using 244K.)
(m) Let $\nu$ be Lebesgue measure on $[0,\infty[$; suppose that $f\in\mathcal{L}^p(\nu)$ where $1<p<\infty$. Set $F(y)=\dfrac1y\int_0^yf$ for $y>0$. Show that $\|F\|_p\le\dfrac{p}{p-1}\|f\|_p$. (Hint: $F(y)=\int_0^1f(xy)\,dx$; use 252Yl with $X=[0,1]$, $Y=[0,\infty[$.)
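(n) Set $f(t)=t-\ln(t+1)$ for $t>-1$. (i) Show that $\Gamma(a+1)=a^{a+1}e^{-a}\int_{-1}^\infty e^{-af(u)}\,du$ for every $a>0$. (Hint: substitute $u=\frac ta-1$ in 225Xj(iii).) (ii) Show that there is a $\delta>0$ such that $f(t)\ge\frac13t^2$ for $-1\le t\le\delta$. (iii) Setting $\alpha=\frac12f(\delta)$, show that (for $a\ge1$)
$$\sqrt a\int_\delta^\infty e^{-af(t)}\,dt \le \sqrt a\,e^{-a\alpha}\int_0^\infty e^{-f(t)/2}\,dt \to 0$$
as $a\to\infty$. (iv) Set $g_a(t)=e^{-af(t/\sqrt a)}$ if $-\sqrt a<t\le\delta\sqrt a$, $0$ otherwise. Show that $g_a(t)\le e^{-t^2/3}$ for all $a$, $t$ and that $\lim_{a\to\infty}g_a(t)=e^{-t^2/2}$ for all $t$, so that
$$\lim_{a\to\infty}\frac{e^a\Gamma(a+1)}{a^{a+\frac12}} = \lim_{a\to\infty}\sqrt a\int_{-1}^\infty e^{-af(t)}\,dt = \lim_{a\to\infty}\sqrt a\int_{-1}^\delta e^{-af(t)}\,dt$$
$$= \lim_{a\to\infty}\int_{-\infty}^\infty g_a(t)\,dt = \int_{-\infty}^\infty e^{-t^2/2}\,dt = \sqrt{2\pi}.$$
(v) Show that $\lim_{n\to\infty}\dfrac{n!}{e^{-n}n^n\sqrt n}=\sqrt{2\pi}$. (This is Stirling's formula.)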

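(o) Let $(X,\Sigma,\mu)$ be a measure space and $f$ a $\mu$-integrable complex-valued function. For $\theta\in\,]-\pi,\pi]$ set $H_\theta=\{x : x\in\operatorname{dom}f,\ \operatorname{Re}(e^{-i\theta}f(x))>0\}$. Show that $\int_{-\pi}^{\pi}\operatorname{Re}\bigl(e^{-i\theta}\int_{H_\theta}f\bigr)\,d\theta=2\int|f|$, and hence that there is some $\theta$ such that $|\int_{H_\theta}f|\ge\frac1\pi\int|f|$. (Compare 246F.)

(p) Let $(X,\Sigma,\mu)$ be a complete measure space and write $\mathcal{M}^{0,\infty}$ for the set $\{f : f\in\mathcal{L}^0(\mu),\ \mu\{x : |f(x)|\ge a\}$ is finite for some $a\in[0,\infty[\,\}$. (i) Show that for each $f\in\mathcal{M}^{0,\infty}$ there is a non-increasing $f^*:\,]0,\infty[\,\to\mathbb{R}$ such that $\mu_L\{t : f^*(t)\ge\alpha\}=\mu\{x : |f(x)|\ge\alpha\}$ for every $\alpha>0$, writing $\mu_L$ for Lebesgue measure. (ii) Show that $\int_E|f|\,d\mu\le\int_0^{\mu E}f^*\,d\mu_L$ for every $E\in\Sigma$ (allowing $\infty$). (Hint: $(f\times\chi E)^*\le f^*$.) (iii) Show that $\|f^*\|_p=\|f\|_p$ for every $p\in[1,\infty]$, $f\in\mathcal{M}^{0,\infty}$. (Hint: $(|f|^p)^*=(f^*)^p$.) (iv) Show that if $f$, $g\in\mathcal{M}^{0,\infty}$ then $\int|f\times g|\,d\mu\le\int f^*\times g^*\,d\mu_L$. (Hint: look at simple functions first.) (v) Show that if $\mu$ is atomless then $\int_0^af^*\,d\mu_L=\sup_{E\in\Sigma,\,\mu E\le a}\int_E|f|$ for every $a\ge0$. (Hint: 215D.) (vi) Show that $A\subseteq\mathcal{L}^1(\mu)$ is uniformly integrable iff $\{f^* : f\in A\}$ is uniformly integrable in $\mathcal{L}^1(\mu_L)$. ($f^*$ is called the decreasing rearrangement of $f$.)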

(q) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space, and write $\nu$ for Lebesgue measure on $[0,1]$. Show that the c.l.d. product measure $\lambda$ on $X\times[0,1]$ is localizable iff $\mu$ is localizable. (Hints: (i) if $\mathcal{E}\subseteq\Sigma$, show that $F\in\Sigma$ is an essential supremum for $\mathcal{E}$ in $\Sigma$ iff $F\times[0,1]$ is an essential supremum for $\{E\times[0,1] : E\in\mathcal{E}\}$ in $\Lambda=\operatorname{dom}\lambda$. (ii) For $W\in\Lambda$, $n\in\mathbb{N}$, $k<2^n$ set
$$W_{nk}=\{x : x\in X,\ \nu\{t : (x,t)\in W,\ 2^{-n}k\le t\le2^{-n}(k+1)\}\ge2^{-n-1}\}.$$
Show that if $\mathcal{W}\subseteq\Lambda$ and $F_{nk}$ is an essential supremum for $\{W_{nk} : W\in\mathcal{W}\}$ in $\Sigma$ for all $n$, $k$, then
$$\bigcup_{n\in\mathbb{N}}\ \bigcap_{m\ge n}\ \bigcup_{k<2^m}F_{mk}\times[2^{-m}k,2^{-m}(k+1)]$$
is an essential supremum for $\mathcal{W}$ in $\Lambda$.)

(r) Let $(X,\Sigma,\mu)$ be the space of Example 216D, and give Lebesgue measure to $[0,1]$. Show that the c.l.d. product measure on $X\times[0,1]$ is complete, locally determined, atomless and not localizable.

(s) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space and $(Y,\mathrm{T},\nu)$ a semi-finite measure space with $\nu Y>0$. Show that if the c.l.d. product measure on $X\times Y$ is strictly localizable, then $\mu$ is strictly localizable. (Hint: take $F\in\mathrm{T}$, $0<\nu F<\infty$. Let $\langle W_i\rangle_{i\in I}$ be a decomposition of $X\times Y$. For $i\in I$, $n\in\mathbb{N}$ set $E_{in}=\{x : \nu\{y : y\in F,\ (x,y)\in W_i\}\ge2^{-n}\}$. Apply 213Ye to $\{E_{in} : i\in I,\ n\in\mathbb{N}\}$.)

(t) Let $(X,\Sigma,\mu)$ be the space of Example 216E, and give Lebesgue measure to $[0,1]$. Show that the c.l.d. product measure on $X\times[0,1]$ is complete, locally determined, atomless and localizable, but not strictly localizable.

(u) Show that if $p$ is any non-zero (real) polynomial in $r$ variables, then $\{x : x\in\mathbb{R}^r,\ p(x)=0\}$ is Lebesgue negligible.

252 Notes and comments For a volume and a half now I have asked you to accept the idea of integrating partially-defined functions, insisting that sooner or later they would appear at the core of the subject. The moment has now come. If we wish to apply Fubini's and Tonelli's theorems in the most fundamental of all cases, with both factors equal to Lebesgue measure on the unit interval, it is surely natural to look at all functions which are integrable on the square for two-dimensional Lebesgue measure. Now two-dimensional Lebesgue measure is a complete measure, so, in particular, assigns zero measure to any set of the form $\{(x,b) : x\in A\}$ or $\{(a,y) : y\in A\}$, whether or not the set $A$ is measurable for one-dimensional measure. Accordingly, if $f$ is a function of two variables which is integrable for two-dimensional Lebesgue measure, there is no reason why any particular section $x\mapsto f(x,b)$ or $y\mapsto f(a,y)$ should be measurable, let alone integrable. Consequently, even if $f$ itself is defined everywhere, the outer integral of $\iint f(x,y)\,dx\,dy$ is likely to be applied to a function which is not defined for every $y$. Let me remark that the problem does not concern '$\infty$'; the awkward functions are those with sections so irregular that they cannot be assigned an integral at all.
I have seen many approaches to this particular nettle, generally less whole-hearted than the one I have determined on for this treatise. Part of the difficulty is that Fubini's theorem really is at the centre of measure theory. Over large parts of the subject, it is possible to assert that a result is non-trivial if and only if it depends on Fubini's theorem. I am therefore unwilling to insert any local fix, saying that 'in this chapter, we shall integrate functions which are not defined everywhere'; before long, such a provision would have to be interpolated into the preambles to half the best theorems, or an explanation offered of why it wasn't necessary in their particular contexts. I suppose that one of the commonest responses is (like Halmos 50) to restrict attention to $\Sigma\widehat{\otimes}\mathrm{T}$-measurable functions, which eliminates measurability problems for the moment (252Xe, 252P); but unhappily (or rather, to my mind, happily) there are crucial applications in which the functions are not actually $\Sigma\widehat{\otimes}\mathrm{T}$-measurable, but belong to some wider class, and this restriction sooner or later leads to undignified contortions as we are forced to adapt limited results to unforeseen contexts. Besides, it leaves unsaid the really rather important information that if $f$ is a measurable function of two variables then (under appropriate conditions) almost all its sections are measurable (252E).
In 252B and its corollaries there is a clumsy restriction: we assume that one of the measures is $\sigma$-finite and the other is either strictly localizable or complete and locally determined. The obvious question is, whether we need these hypotheses. From 252K we see that the hypothesis '$\sigma$-finite' on the second factor can certainly not be abandoned, even when the first factor is a complete probability measure. The requirement '$\mu$ is either strictly localizable or complete and locally determined' is in fact fractionally stronger than what is needed, as well as disagreeably elaborate. The 'right' hypothesis is that the completion of $\mu$ should be locally determined (see (b*-i) of the proof of 252B). The point is that because the product of two measures is the same as the product of their c.l.d. versions (251S), no theorem which leads from the product measure to the factor measures can distinguish between a measure and its c.l.d. version; so that, in 252B, we must expect to need $\mu$ and its c.l.d. version to give rise to the same integrals. The proof of 252B would be better focused if the hypothesis was simplified to '$\nu$ is $\sigma$-finite and $\mu$ is complete and locally determined'. But this would just transfer part of the argument into the proof of 252C.
We also have to work a little harder in 252B in order to cover functions and integrals taking the values $\pm\infty$. Fubini's theorem is so central to measure theory that I believe it is worth taking a bit of extra trouble to state the results in maximal generality. This is especially important because we frequently apply it in multiply repeated integrals, as in 252Xc, in which we have even less control than usual over the intermediate functions to be integrated.
I have expressed all the main results of this section in terms of the 'c.l.d.' product measure. In the case of $\sigma$-finite spaces, of course, which is where the theory works best, we could just as well use the 'primitive' product measure. Indeed, Fubini's theorem itself has a version in terms of the primitive product measure which is rather more elegant than 252B as stated (252Yc), and covers the great majority of applications. (Integrals with respect to the primitive and c.l.d. product measures are of course very closely related; see 252Yd.) But we do sometimes need to look at non-$\sigma$-finite spaces, and in these cases the asymmetric form in 252B is close to the best we can do. Using the primitive product measure does not help at all with the most substantial obstacle, the phenomenon in 252K (see 252Yf).
The pre-calculus concept of an integral as 'the area under a curve' is given expression in 252N: the integral of a non-negative function is the measure of its ordinate set. This is unsatisfactory as a definition of the integral, not just because of the requirement that the base space should be complete and locally determined (which can be dealt with by using the primitive product measure, as in 252Yg), but because the construction of the product measure involves integration (part (c) of the proof of 251E). The idea of 252N is to relate the measure of an ordinate set to the integral of the measures of its vertical sections. Curiously, if instead we integrate the measures of its horizontal sections, as in 252O, we get a more versatile result. (Indeed this one does not involve the concept of 'product measure', and could have appeared at any point after §123.) Note that the integral $\int_0^\infty\ldots dt$ here is applied to a monotonic function, so may be interpreted as an improper Riemann integral. If you think you know enough about the Riemann integral to make this a tempting alternative to the construction in §122, the tricky bit now becomes the proof that the integral is additive.
A different line of argument is to use integration over sections to define a product measure. The difficulty with this approach is that unless we take great care we may find ourselves with an asymmetric construction. My own view is that such an asymmetry is acceptable only when there is no alternative. But in Chapter 43 of Volume 4 I will describe a couple of examples.
Of the two examples I give here, 252K is supposed to show that when I call for $\sigma$-finite spaces they are really necessary, while 252L is supposed to show that joint measurability is essential in Tonelli's theorem and its corollaries. The factor spaces in 252K, Lebesgue measure and counting measure, are chosen to show that it is only the lack of $\sigma$-finiteness that can be the problem; they are otherwise as regular as one can reasonably ask. In 252L I have used the countable-cocountable measure on $\omega_1$, which you may feel is fit only for counter-examples; and the question does arise, whether the same phenomenon occurs with Lebesgue measure. This leads into deep water, and I will return to it in Volume 5.
I ought perhaps to note explicitly that in Fubini's theorem, we really do need to have a function which is integrable for the product measure. I include 252Xf and 252Xg to remind you that even in the best-regulated circumstances, the repeated integrals $\iint f\,dx\,dy$, $\iint f\,dy\,dx$ may fail to be equal if $f$ is not integrable as a function of two variables.
There are many ways to calculate the volume $\beta_r$ of an $r$-dimensional ball; the one I have used in 252Q follows a line that would have been natural to me before I ever heard of measure theory. In 252Xh I suggest another method. The idea of integration-by-substitution, used in part (b) of the argument, is there supported by an ad hoc argument; I will present a different, more generally applicable, approach in Chapter 26. Elsewhere (252Xh, 252Yk, 252Yl) I find myself taking for granted substitutions of the form $t\mapsto at$, $t\mapsto a+t$; for a systematic justification, see §263. Of course an enormous number of other formulae of advanced calculus are also based on repeated integration of one kind or another, and I give a sample handful of such results (252Xi, 252Yj-252Yn).

253 Tensor products


The theorems of the last section show that the integrable functions on a product of two measure spaces can be effectively studied in terms of integration on each factor space separately. In this section I present a very striking relationship between the $L^1$ space of a product measure and the $L^1$ spaces of its factors, which actually determines the product $L^1$ up to isomorphism as Banach lattice. I start with a brief note on bilinear maps (253A) and a description of the canonical bilinear map from $L^1(\mu)\times L^1(\nu)$ to $L^1(\lambda)$ (253B-253E). The main theorem of the section is 253F, showing that this canonical map is universal for continuous bilinear maps from $L^1(\mu)\times L^1(\nu)$ to Banach spaces; it also determines the ordering of $L^1(\lambda)$ (253G). I end with a description of a fundamental type of conditional expectation operator (253H) and notes on products of indefinite-integral measures (253I) and upper integrals of special kinds of function (253J, 253K).

253A Bilinear maps Before looking at any of the measure theory in this section, I introduce a concept
from the theory of linear spaces.
(a) Let $U$, $V$ and $W$ be linear spaces over $\mathbb{R}$ (or, indeed, any other field). A map $\phi : U \times V \to W$ is
bilinear if it is linear in each variable separately, that is,
$$\phi(u_1 + u_2, v) = \phi(u_1, v) + \phi(u_2, v),$$
$$\phi(u, v_1 + v_2) = \phi(u, v_1) + \phi(u, v_2),$$
$$\phi(\alpha u, v) = \alpha\phi(u, v) = \phi(u, \alpha v)$$
for all $u, u_1, u_2 \in U$, $v, v_1, v_2 \in V$ and scalars $\alpha$. Observe that $\phi$ gives rise to, and in turn can be defined by,
a linear operator $T : U \to L(V; W)$, writing $L(V; W)$ for the space of linear operators from $V$ to $W$, where
$$(Tu)(v) = \phi(u, v)$$
for all $u \in U$, $v \in V$. Hence, or otherwise, we can see, for instance, that $\phi(0, v) = \phi(u, 0) = 0$ whenever
$u \in U$, $v \in V$.
If $W'$ is another linear space over the same field, and $S : W \to W'$ is a linear operator, then $S\phi : U \times V \to W'$
is bilinear.
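
A small concrete instance of the definition in (a) may be useful. The sketch below is an illustration only; the choice $U = V = \mathbb{R}^2$, $W = \mathbb{R}$ and the dot product as $\phi$ are assumptions made for the example, not part of the text.

```latex
% Illustrative example (assumed choice): U = V = \mathbb{R}^2, W = \mathbb{R},
% \phi the usual dot product.
\[
  \phi(u,v) = u_1 v_1 + u_2 v_2 .
\]
% Linearity in the first variable (the second variable is symmetric):
\[
  \phi(u+u',v) = (u_1+u_1')v_1 + (u_2+u_2')v_2 = \phi(u,v) + \phi(u',v),
  \qquad
  \phi(\alpha u, v) = \alpha\,\phi(u,v) = \phi(u,\alpha v).
\]
% The associated linear operator T : U \to L(V;W) sends u to the functional
\[
  (Tu)(v) = u_1 v_1 + u_2 v_2 .
\]
% \phi is bilinear but not linear on U \times V regarded as a single linear space:
\[
  \phi(2u, 2v) = 4\,\phi(u,v), \quad\text{which differs from } 2\,\phi(u,v) \text{ unless } \phi(u,v) = 0 .
\]
```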

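(b) Now suppose that $U$, $V$ and $W$ are normed spaces, and $\phi : U \times V \to W$ is a bilinear map. Then we
say that $\phi$ is bounded if $\sup\{\|\phi(u, v)\| : \|u\| \le 1, \|v\| \le 1\}$ is finite, and in this case we call this supremum
the norm $\|\phi\|$ of $\phi$. Note that $\|\phi(u, v)\| \le \|\phi\|\|u\|\|v\|$ for all $u \in U$, $v \in V$ (because
$$\|\phi(u, v)\| = \alpha\beta\|\phi(\alpha^{-1}u, \beta^{-1}v)\| \le \alpha\beta\|\phi\|$$
whenever $\alpha > \|u\|$, $\beta > \|v\|$).
If $W'$ is another normed space and $S : W \to W'$ is a bounded linear operator, then $S\phi : U \times V \to W'$ is
a bounded bilinear map, and $\|S\phi\| \le \|S\|\|\phi\|$.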
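
To see the norm of a bilinear map computed in a simple case, here is a hedged worked example. The spaces (Euclidean $\mathbb{R}^2$), the map $\phi$ (dot product) and the operator $S$ (multiplication by 3) are assumptions chosen for illustration.

```latex
% Assumed setting: U = V = \mathbb{R}^2 with the Euclidean norm, W = \mathbb{R},
% \phi(u,v) = u_1 v_1 + u_2 v_2.
% Cauchy--Schwarz gives the bound, and u = v = (1,0) attains it:
\[
  |\phi(u,v)| \le \|u\|\,\|v\|, \qquad \phi\bigl((1,0),(1,0)\bigr) = 1,
  \qquad\text{so}\qquad \|\phi\| = 1 .
\]
% With S : \mathbb{R} \to \mathbb{R}, S(t) = 3t, the composite S\phi satisfies
\[
  \|S\phi\| = \sup\{\,3|\phi(u,v)| : \|u\| \le 1,\ \|v\| \le 1\,\} = 3 = \|S\|\,\|\phi\| ,
\]
% so in this instance the inequality \|S\phi\| \le \|S\|\|\phi\| holds with equality.
```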

253B Definition The most important bilinear maps of this section are based on the following idea. Let
$f$ and $g$ be real-valued functions. I will write $f \otimes g$ for the function $(x, y) \mapsto f(x)g(y) : \operatorname{dom} f \times \operatorname{dom} g \to \mathbb{R}$.
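
On finite sets the function $f \otimes g$ of 253B is just an outer product. The following sketch illustrates this; the sets $X = \{0,1\}$, $Y = \{0,1,2\}$ and the values of $f$ and $g$ are hypothetical choices made for the example.

```latex
% Assumed data: X = \{0,1\}, Y = \{0,1,2\},
% f(0) = 2, f(1) = -1; g(0) = 1, g(1) = 0, g(2) = 3.
% Listing (f \otimes g)(x,y) = f(x) g(y) with rows indexed by x and columns by y:
\[
  f \otimes g \;=\;
  \begin{pmatrix}
    2 & 0 & 6 \\
    -1 & 0 & -3
  \end{pmatrix}.
\]
% For indicator functions of sets E \subseteq X, F \subseteq Y the same formula gives
\[
  \chi E \otimes \chi F = \chi(E \times F).
\]
```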

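253C Proposition (a) Let $X$ and $Y$ be sets, and $\Sigma$, $\mathrm{T}$ $\sigma$-algebras of subsets of $X$, $Y$ respectively. If $f$ is
a $\Sigma$-measurable real-valued function defined on a subset of $X$, and $g$ is a $\mathrm{T}$-measurable real-valued function
defined on a subset of $Y$, then $f \otimes g$, as defined in 253B, is $\Sigma \widehat{\otimes} \mathrm{T}$-measurable.
(b) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. If $f \in \mathcal{L}^0(\mu)$
and $g \in \mathcal{L}^0(\nu)$, then $f \otimes g \in \mathcal{L}^0(\lambda)$.
Remark Recall from 241A that $\mathcal{L}^0(\mu)$ is the space of $\mu$-virtually measurable real-valued functions defined
on $\mu$-conegligible subsets of $X$.
proof (a) The point is that $f \otimes \chi Y$ is $\Sigma \widehat{\otimes} \mathrm{T}$-measurable, because for any $\alpha \in \mathbb{R}$ there is an $E \in \Sigma$ such that
$$\{x : f(x) \ge \alpha\} = E \cap \operatorname{dom} f,$$
so that
$$\{(x, y) : (f \otimes \chi Y)(x, y) \ge \alpha\} = (E \cap \operatorname{dom} f) \times Y = (E \times Y) \cap \operatorname{dom}(f \otimes \chi Y),$$
and of course $E \times Y \in \Sigma \widehat{\otimes} \mathrm{T}$. Similarly, $\chi X \otimes g$ is $\Sigma \widehat{\otimes} \mathrm{T}$-measurable and $f \otimes g = (f \otimes \chi Y) \times (\chi X \otimes g)$ is
$\Sigma \widehat{\otimes} \mathrm{T}$-measurable.
(b) Let $E \in \Sigma$, $F \in \mathrm{T}$ be conegligible subsets of $X$, $Y$ respectively such that $E \subseteq \operatorname{dom} f$, $F \subseteq \operatorname{dom} g$,
$f \restriction E$ is $\Sigma$-measurable and $g \restriction F$ is $\mathrm{T}$-measurable. Write $\Lambda$ for the domain of $\lambda$. Then $\Sigma \widehat{\otimes} \mathrm{T} \subseteq \Lambda$ (251Ia). Also
$E \times F$ is $\lambda$-conegligible, because
$$\lambda((X \times Y) \setminus (E \times F)) \le \lambda((X \setminus E) \times Y) + \lambda(X \times (Y \setminus F)) = \mu(X \setminus E) \cdot \nu Y + \mu X \cdot \nu(Y \setminus F) = 0$$
(also from 251Ia). So $\operatorname{dom}(f \otimes g) \supseteq E \times F$ is conegligible. Also, by (a), $(f \otimes g) \restriction (E \times F) = (f \restriction E) \otimes (g \restriction F)$
is $\Sigma \widehat{\otimes} \mathrm{T}$-measurable, therefore $\Lambda$-measurable, and $f \otimes g$ is virtually measurable. Thus $f \otimes g \in \mathcal{L}^0(\lambda)$, as
claimed.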

253D Now we can apply the ideas of 253B-253C to integrable functions.


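Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and write $\lambda$ for the c.l.d. product measure on
$X \times Y$. If $f \in \mathcal{L}^1(\mu)$ and $g \in \mathcal{L}^1(\nu)$, then $f \otimes g \in \mathcal{L}^1(\lambda)$ and $\int f \otimes g \, d\lambda = \int f \, d\mu \int g \, d\nu$.
Remark I follow §242 in writing $\mathcal{L}^1(\mu)$ for the space of $\mu$-integrable real-valued functions.
proof (a) Consider first the case $f = \chi E$, $g = \chi F$ where $E \in \Sigma$, $F \in \mathrm{T}$ have finite measure; then
$f \otimes g = \chi(E \times F)$ is $\lambda$-integrable with integral
$$\lambda(E \times F) = \mu E \cdot \nu F = \int f \, d\mu \int g \, d\nu,$$
by 251Ia.
(b) It follows at once that $f \otimes g$ is $\lambda$-simple, with $\int f \otimes g \, d\lambda = \int f \, d\mu \int g \, d\nu$, whenever $f$ is a $\mu$-simple
function and $g$ is a $\nu$-simple function.
(c) If $f$ and $g$ are non-negative integrable functions, there are non-decreasing sequences $\langle f_n \rangle_{n \in \mathbb{N}}$, $\langle g_n \rangle_{n \in \mathbb{N}}$
of non-negative simple functions converging almost everywhere to $f$, $g$ respectively; now note that if $E \subseteq X$,
$F \subseteq Y$ are conegligible, $E \times F$ is conegligible in $X \times Y$, as remarked in the proof of 253C, so the non-decreasing
sequence $\langle f_n \otimes g_n \rangle_{n \in \mathbb{N}}$ of $\lambda$-simple functions converges almost everywhere to $f \otimes g$, and
$$\int f \otimes g \, d\lambda = \lim_{n \to \infty} \int f_n \otimes g_n \, d\lambda = \lim_{n \to \infty} \int f_n \, d\mu \int g_n \, d\nu = \int f \, d\mu \int g \, d\nu$$
by B.Levi's theorem.
(d) Finally, for general $f$ and $g$, we can express them as the differences $f^+ - f^-$, $g^+ - g^-$ of non-negative
integrable functions, and see that
$$\int f \otimes g \, d\lambda = \int \bigl(f^+ \otimes g^+ - f^+ \otimes g^- - f^- \otimes g^+ + f^- \otimes g^-\bigr) d\lambda = \int f \, d\mu \int g \, d\nu.$$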
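
The identity of 253D can be checked by hand in a familiar case. The following is an illustrative sketch only: it assumes that $\mu$ and $\nu$ are both Lebesgue measure on $[0,1]$ (so that $\lambda$ is Lebesgue measure on the unit square), and the functions $f(x) = x$, $g(y) = y^2$ are chosen purely for convenience.

```latex
% Assumed setting: \mu = \nu = Lebesgue measure on [0,1]; f(x) = x, g(y) = y^2.
\begin{align*}
  \int f \otimes g \, d\lambda
    &= \int_0^1\!\!\int_0^1 x\,y^2 \, dy \, dx
     = \int_0^1 \frac{x}{3} \, dx
     = \frac{1}{6}, \\
  \int f \, d\mu \cdot \int g \, d\nu
    &= \frac{1}{2} \cdot \frac{1}{3}
     = \frac{1}{6} .
\end{align*}
```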

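253E The canonical map $L^1 \times L^1 \to L^1$ I continue the argument from 253D. Because $E \times F$ is
conegligible in $X \times Y$ whenever $E$ and $F$ are conegligible subsets of $X$ and $Y$, $f_1 \otimes g_1 = f \otimes g$ $\lambda$-a.e. whenever
$f = f_1$ $\mu$-a.e. and $g = g_1$ $\nu$-a.e. We may therefore define $u \otimes v \in L^1(\lambda)$, for $u \in L^1(\mu)$ and $v \in L^1(\nu)$, by
saying that $u \otimes v = (f \otimes g)^\bullet$ whenever $u = f^\bullet$ and $v = g^\bullet$.
Now if $f, f_1, f_2 \in \mathcal{L}^1(\mu)$, $g, g_1, g_2 \in \mathcal{L}^1(\nu)$ and $a \in \mathbb{R}$,
$$(f_1 + f_2) \otimes g = (f_1 \otimes g) + (f_2 \otimes g),$$
$$f \otimes (g_1 + g_2) = (f \otimes g_1) + (f \otimes g_2),$$
$$(af) \otimes g = a(f \otimes g) = f \otimes (ag).$$
It follows at once that the map $(u, v) \mapsto u \otimes v$ is bilinear.
Moreover, if $f \in \mathcal{L}^1(\mu)$ and $g \in \mathcal{L}^1(\nu)$, $|f| \otimes |g| = |f \otimes g|$, so $\int |f \otimes g| \, d\lambda = \int |f| \, d\mu \int |g| \, d\nu$. Accordingly
$$\|u \otimes v\|_1 = \|u\|_1 \|v\|_1$$
for all $u \in L^1(\mu)$, $v \in L^1(\nu)$. In particular, the bilinear map $\otimes$ is bounded, with norm 1 (except in the
trivial case in which one of $L^1(\mu)$, $L^1(\nu)$ is 0-dimensional).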
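
The difference between the integral identity of 253D and the norm identity of 253E shows up as soon as a factor changes sign. The example below is a sketch under assumed data: $\mu = \nu$ is Lebesgue measure on $[0,1]$, $f(x) = 2x - 1$ and $g(y) = y$.

```latex
% Assumed setting: \mu = \nu = Lebesgue measure on [0,1]; f(x) = 2x - 1, g(y) = y.
% Integrals multiply, and here the product vanishes:
\[
  \int f \, d\mu = 0, \qquad \int g \, d\nu = \tfrac12, \qquad
  \int f \otimes g \, d\lambda = 0 \cdot \tfrac12 = 0 .
\]
% Norms also multiply, but use |f| and |g|:
\[
  \|f^\bullet\|_1 = \int_0^1 |2x - 1| \, dx = \tfrac12, \qquad
  \|g^\bullet\|_1 = \tfrac12, \qquad
  \|f^\bullet \otimes g^\bullet\|_1
    = \int_0^1\!\!\int_0^1 |2x-1|\, y \, dy \, dx = \tfrac14 .
\]
```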

253F We are now ready for the main theorem of this section.
Theorem Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and let $\lambda$ be the c.l.d. product measure on $X \times Y$.
Let $W$ be any Banach space and $\phi : L^1(\mu) \times L^1(\nu) \to W$ a bounded bilinear map. Then there is a unique
bounded linear operator $T : L^1(\lambda) \to W$ such that $T(u \otimes v) = \phi(u, v)$ for all $u \in L^1(\mu)$, $v \in L^1(\nu)$, and
$\|T\| = \|\phi\|$.
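proof (a) The centre of the argument is the following fact: if $E_0, \ldots, E_n$ are measurable sets of finite measure
in $X$, $F_0, \ldots, F_n$ are measurable sets of finite measure in $Y$, $a_0, \ldots, a_n \in \mathbb{R}$ and $\sum_{i=0}^n a_i \chi(E_i \times F_i) = 0$ $\lambda$-a.e.,
then $\sum_{i=0}^n a_i \phi(\chi E_i^\bullet, \chi F_i^\bullet) = 0$ in $W$. $\mathbf{P}$ We can find a disjoint family $\langle G_j \rangle_{j \le m}$ of measurable sets of
finite measure in $X$ such that each $E_i$ is expressible as a union of some subfamily of the $G_j$; so that $\chi E_i$
is expressible in the form $\sum_{j=0}^m b_{ij} \chi G_j$ (see 122Ca). Similarly, we can find a disjoint family $\langle H_k \rangle_{k \le l}$ of
measurable sets of finite measure in $Y$ such that each $\chi F_i$ is expressible as $\sum_{k=0}^l c_{ik} \chi H_k$. Now
$$\sum_{j=0}^m \sum_{k=0}^l \Bigl(\sum_{i=0}^n a_i b_{ij} c_{ik}\Bigr) \chi(G_j \times H_k) = \sum_{i=0}^n a_i \chi(E_i \times F_i) = 0 \ \ \lambda\text{-a.e.}$$
Because the $G_j \times H_k$ are disjoint, and $\lambda(G_j \times H_k) = \mu G_j \cdot \nu H_k$ for all $j$, $k$, it follows that for every
$j \le m$, $k \le l$ we have either $\mu G_j = 0$ or $\nu H_k = 0$ or $\sum_{i=0}^n a_i b_{ij} c_{ik} = 0$. In any of these three cases,
$\sum_{i=0}^n a_i b_{ij} c_{ik} \phi(\chi G_j^\bullet, \chi H_k^\bullet) = 0$ in $W$. But this means that
$$0 = \sum_{j=0}^m \sum_{k=0}^l \sum_{i=0}^n a_i b_{ij} c_{ik} \phi(\chi G_j^\bullet, \chi H_k^\bullet) = \sum_{i=0}^n a_i \phi(\chi E_i^\bullet, \chi F_i^\bullet),$$
as claimed. $\mathbf{Q}$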
(b) It follows that if $E_0, \ldots, E_n$, $E'_0, \ldots, E'_m$ are measurable sets of finite measure in $X$, $F_0, \ldots, F_n$,
$F'_0, \ldots, F'_m$ are measurable sets of finite measure in $Y$, $a_0, \ldots, a_n, a'_0, \ldots, a'_m \in \mathbb{R}$ and
$\sum_{i=0}^n a_i \chi(E_i \times F_i) = \sum_{i=0}^m a'_i \chi(E'_i \times F'_i)$ $\lambda$-a.e., then
$$\sum_{i=0}^n a_i \phi(\chi E_i^\bullet, \chi F_i^\bullet) = \sum_{i=0}^m a'_i \phi(\chi E_i'^\bullet, \chi F_i'^\bullet)$$
in $W$. Let $M$ be the linear subspace of $L^1(\lambda)$ generated by
$$\{\chi(E \times F)^\bullet : E \in \Sigma, \ \mu E < \infty, \ F \in \mathrm{T}, \ \nu F < \infty\};$$
then we have a unique map $T_0 : M \to W$ such that
$$T_0\Bigl(\sum_{i=0}^n a_i \chi(E_i \times F_i)^\bullet\Bigr) = \sum_{i=0}^n a_i \phi(\chi E_i^\bullet, \chi F_i^\bullet)$$
whenever $E_0, \ldots, E_n$ are measurable sets of finite measure in $X$, $F_0, \ldots, F_n$ are measurable sets of finite
measure in $Y$ and $a_0, \ldots, a_n \in \mathbb{R}$. Of course $T_0$ is linear.
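(c) Some of the same calculations show that $\|T_0 u\| \le \|\phi\| \|u\|_1$ for every $u \in M$. $\mathbf{P}$ If $u \in M$, then, by
the arguments of (a), we can express $u$ as $\sum_{j=0}^m \sum_{k=0}^l a_{jk} \chi(G_j \times H_k)^\bullet$, where $\langle G_j \rangle_{j \le m}$ and $\langle H_k \rangle_{k \le l}$ are
disjoint families of sets of finite measure. Now
$$\begin{aligned}
\|T_0 u\| &= \Bigl\|\sum_{j=0}^m \sum_{k=0}^l a_{jk} \phi(\chi G_j^\bullet, \chi H_k^\bullet)\Bigr\| \le \sum_{j=0}^m \sum_{k=0}^l |a_{jk}| \, \|\phi(\chi G_j^\bullet, \chi H_k^\bullet)\| \\
&\le \sum_{j=0}^m \sum_{k=0}^l |a_{jk}| \, \|\phi\| \, \|\chi G_j^\bullet\|_1 \|\chi H_k^\bullet\|_1 = \|\phi\| \sum_{j=0}^m \sum_{k=0}^l |a_{jk}| \, \mu G_j \cdot \nu H_k \\
&= \|\phi\| \sum_{j=0}^m \sum_{k=0}^l |a_{jk}| \, \lambda(G_j \times H_k) = \|\phi\| \|u\|_1,
\end{aligned}$$
as claimed. $\mathbf{Q}$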
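(d) The next point is to observe that $M$ is dense in $L^1(\lambda)$ for $\| \ \|_1$. $\mathbf{P}$ Repeating the ideas above once
again, we observe that if $E_0, \ldots, E_n$ are sets of finite measure in $X$ and $F_0, \ldots, F_n$ are sets of finite measure
in $Y$, then $\chi(\bigcup_{i \le n} E_i \times F_i)^\bullet \in M$; this is because, expressing each $E_i$ as a union of $G_j$, where the $G_j$ are
disjoint, we have
$$\bigcup_{i \le n} E_i \times F_i = \bigcup_{j \le m} G_j \times F'_j,$$
where $F'_j = \bigcup\{F_i : G_j \subseteq E_i\}$ for each $j$; now $\langle G_j \times F'_j \rangle_{j \le m}$ is disjoint, so
$$\chi\Bigl(\bigcup_{j \le m} G_j \times F'_j\Bigr)^\bullet = \sum_{j=0}^m \chi(G_j \times F'_j)^\bullet \in M.$$
So 251Ie tells us that whenever $\lambda H < \infty$ and $\epsilon > 0$ there is a $G$ such that $\lambda(H \triangle G) \le \epsilon$ and $\chi G^\bullet \in M$; now
$$\|\chi H^\bullet - \chi G^\bullet\|_1 = \lambda(G \triangle H) \le \epsilon,$$
so $\chi H^\bullet$ is approximated arbitrarily closely by members of $M$, and belongs to the closure $\overline{M}$ of $M$ in $L^1(\lambda)$.
Because $M$ is a linear subspace of $L^1(\lambda)$, so is $\overline{M}$ (2A4Cb); accordingly $\overline{M}$ contains the equivalence classes
of all $\lambda$-simple functions; but these are dense in $L^1(\lambda)$ (242M), so $\overline{M} = L^1(\lambda)$, as claimed. $\mathbf{Q}$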
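(e) Because $W$ is a Banach space, it follows that there is a bounded linear operator $T : L^1(\lambda) \to W$
extending $T_0$, with $\|T\| = \|T_0\| \le \|\phi\|$ (2A4I). Now $T(u \otimes v) = \phi(u, v)$ for all $u \in L^1(\mu)$, $v \in L^1(\nu)$. $\mathbf{P}$ If
$u = \chi E^\bullet$, $v = \chi F^\bullet$, where $E$ and $F$ are measurable sets of finite measure, then
$$T(u \otimes v) = T(\chi(E \times F)^\bullet) = T_0(\chi(E \times F)^\bullet) = \phi(\chi E^\bullet, \chi F^\bullet) = \phi(u, v).$$
Because $\phi$ and $\otimes$ are bilinear and $T$ is linear,
$$T(f^\bullet \otimes g^\bullet) = \phi(f^\bullet, g^\bullet)$$
whenever $f$ and $g$ are simple functions. Now whenever $u \in L^1(\mu)$, $v \in L^1(\nu)$ and $\epsilon > 0$, there are simple
functions $f$, $g$ such that $\|u - f^\bullet\|_1 \le \epsilon$, $\|v - g^\bullet\|_1 \le \epsilon$ (242M again); so that
$$\|\phi(u, v) - \phi(f^\bullet, g^\bullet)\| \le \|\phi(u - f^\bullet, v - g^\bullet)\| + \|\phi(u, g^\bullet - v)\| + \|\phi(f^\bullet - u, v)\| \le \|\phi\|(\epsilon^2 + \epsilon\|u\|_1 + \epsilon\|v\|_1).$$
Similarly
$$\|u \otimes v - f^\bullet \otimes g^\bullet\|_1 \le \epsilon(\epsilon + \|u\|_1 + \|v\|_1),$$
so
$$\|T(u \otimes v) - T(f^\bullet \otimes g^\bullet)\| \le \epsilon\|T\|(\epsilon + \|u\|_1 + \|v\|_1);$$
because $T(f^\bullet \otimes g^\bullet) = \phi(f^\bullet, g^\bullet)$,
$$\|T(u \otimes v) - \phi(u, v)\| \le \epsilon(\|T\| + \|\phi\|)(\epsilon + \|u\|_1 + \|v\|_1).$$
As $\epsilon$ is arbitrary, $T(u \otimes v) = \phi(u, v)$, as required. $\mathbf{Q}$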
(f) The argument of (e) ensured that $\|T\| \le \|\phi\|$. Because $\|u \otimes v\|_1 \le \|u\|_1 \|v\|_1$ for all $u \in L^1(\mu)$,
$v \in L^1(\nu)$, $\|\phi(u, v)\| \le \|T\| \|u\|_1 \|v\|_1$ for all $u$, $v$, and $\|\phi\| \le \|T\|$; so $\|T\| = \|\phi\|$.
(g) Thus $T$ has the required properties. To see that it is unique, we have only to observe that any
bounded linear operator $S : L^1(\lambda) \to W$ such that $S(u \otimes v) = \phi(u, v)$ for all $u \in L^1(\mu)$, $v \in L^1(\nu)$ must
agree with $T$ on objects of the form $\chi(E \times F)^\bullet$ where $E$ and $F$ are of finite measure, and therefore on every
member of $M$; because $M$ is dense and both $S$ and $T$ are continuous, they agree everywhere in $L^1(\lambda)$.
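
A simple instance of the factorization in 253F may help. The sketch below takes $W = \mathbb{R}$ and a particular bounded bilinear map $\phi$ chosen for illustration (an assumption of the example, not something singled out in the text); it also assumes that both spaces have sets of finite non-zero measure, so that the norms are not trivially zero.

```latex
% Assumed example: W = \mathbb{R} and, for u = f^\bullet \in L^1(\mu), v = g^\bullet \in L^1(\nu),
\[
  \phi(u,v) = \int f \, d\mu \cdot \int g \, d\nu .
\]
% Candidate operator on L^1(\lambda): for w = h^\bullet,
\[
  T w = \int h \, d\lambda .
\]
% By 253D, T factors \phi through the tensor product map:
\[
  T(u \otimes v) = \int f \otimes g \, d\lambda
                 = \int f \, d\mu \cdot \int g \, d\nu = \phi(u,v).
\]
% Norms: |Tw| \le \|w\|_1 and |\phi(u,v)| \le \|u\|_1 \|v\|_1, while for
% u = \chi E^\bullet, v = \chi F^\bullet with 0 < \mu E < \infty, 0 < \nu F < \infty,
\[
  \phi(u,v) = \mu E \cdot \nu F = \|u\|_1 \|v\|_1 ,
  \qquad\text{so}\qquad \|T\| = \|\phi\| = 1 .
\]
```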

253G The order structure of $L^1$ In 253F I have treated the $L^1$ spaces exclusively as normed linear
spaces. In general, however, the order structure of an $L^1$ space (see 242C) is as important as its norm. The
map $\otimes : L^1(\mu) \times L^1(\nu) \to L^1(\lambda)$ respects the order structures of the three spaces in the following strong
sense.
Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$.
Then
(a) $u \otimes v \ge 0$ in $L^1(\lambda)$ whenever $u \ge 0$ in $L^1(\mu)$ and $v \ge 0$ in $L^1(\nu)$.
(b) The positive cone $\{w : w \ge 0\}$ of $L^1(\lambda)$ is precisely the closed convex hull $C$ of $\{u \otimes v : u \ge 0, v \ge 0\}$
in $L^1(\lambda)$.
*(c) Let $W$ be any Banach lattice, and $T : L^1(\lambda) \to W$ a bounded linear operator. Then the following
are equiveridical:
(i) $Tw \ge 0$ in $W$ whenever $w \ge 0$ in $L^1(\lambda)$;
(ii) $T(u \otimes v) \ge 0$ in $W$ whenever $u \ge 0$ in $L^1(\mu)$ and $v \ge 0$ in $L^1(\nu)$.
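proof (a) If $u, v \ge 0$ then they are expressible as $f^\bullet$, $g^\bullet$ where $f \in \mathcal{L}^1(\mu)$, $g \in \mathcal{L}^1(\nu)$ and $f \ge 0$, $g \ge 0$.
Now $f \otimes g \ge 0$ so $u \otimes v = (f \otimes g)^\bullet \ge 0$.
(b)(i) Write $L^1(\lambda)^+$ for $\{w : w \in L^1(\lambda), w \ge 0\}$. Then $L^1(\lambda)^+$ is a closed convex set in $L^1(\lambda)$ (242De);
by (a), it contains $u \otimes v$ whenever $u \in L^1(\mu)^+$, $v \in L^1(\nu)^+$, so it must include $C$.
(ii)($\alpha$) Of course $0 = 0 \otimes 0 \in C$. ($\beta$) If $u \in M$, as defined in the proof of 253F, and $u > 0$, then $u$ is
expressible as $\sum_{j \le m, k \le l} a_{jk} \chi(G_j \times H_k)^\bullet$, where $G_0, \ldots, G_m$ and $H_0, \ldots, H_l$ are disjoint sequences of sets of
finite measure, as in (a) of the proof of 253F. Now $a_{jk}$ can be negative only if $\lambda(G_j \times H_k) = 0$, so replacing
every $a_{jk}$ by $\max(0, a_{jk})$ if necessary, we can suppose that $a_{jk} \ge 0$ for all $j$, $k$. Not all the $a_{jk}$ can be zero,
so $a = \sum_{j \le m, k \le l} a_{jk} > 0$, and
$$u = \sum_{j \le m, k \le l} \frac{a_{jk}}{a} \cdot a\chi(G_j \times H_k)^\bullet = \sum_{j \le m, k \le l} \frac{a_{jk}}{a} \cdot (a\chi G_j^\bullet) \otimes \chi H_k^\bullet \in C.$$
($\gamma$) If $w \in L^1(\lambda)^+$ and $\epsilon > 0$, express $w$ as $h^\bullet$ where $h \ge 0$ in $\mathcal{L}^1(\lambda)$. There is a simple function $h_1 \ge 0$ such
that $h_1 \le_{\text{a.e.}} h$ and $\int h \le \int h_1 + \epsilon$. Express $h_1$ as $\sum_{i=0}^n a_i \chi H_i$ where $\lambda H_i < \infty$, $a_i \ge 0$ for each $i$, and for
each $i \le n$ choose sets $G_{i0}, \ldots, G_{im_i} \in \Sigma$, $F_{i0}, \ldots, F_{im_i} \in \mathrm{T}$, all of finite measure, such that $G_{i0}, \ldots, G_{im_i}$
are disjoint and $\lambda(H_i \triangle \bigcup_{j \le m_i} G_{ij} \times F_{ij}) \le \epsilon/(n + 1)(a_i + 1)$, as in (d) of the proof of 253F. Set
$$w_0 = \sum_{i=0}^n a_i \sum_{j=0}^{m_i} \chi(G_{ij} \times F_{ij})^\bullet.$$
Then $w_0 \in C$ because $w_0 \in M$ and $w_0 \ge 0$. Also
$$\begin{aligned}
\|w - w_0\|_1 &\le \|w - h_1^\bullet\|_1 + \|h_1^\bullet - w_0\|_1 \\
&\le \int (h - h_1) \, d\lambda + \sum_{i=0}^n a_i \int \Bigl|\chi H_i - \sum_{j=0}^{m_i} \chi(G_{ij} \times F_{ij})\Bigr| \, d\lambda \\
&\le \epsilon + \sum_{i=0}^n a_i \lambda\Bigl(H_i \triangle \bigcup_{j \le m_i} G_{ij} \times F_{ij}\Bigr) \le 2\epsilon.
\end{aligned}$$
As $\epsilon$ is arbitrary and $C$ is closed, $w \in C$. As $w$ is arbitrary, $L^1(\lambda)^+ \subseteq C$ and $C = L^1(\lambda)^+$.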


(c) Part (a) tells us that (i)$\Rightarrow$(ii). For the reverse implication, we need a fragment from the theory of
Banach lattices: $W^+ = \{w : w \in W, w \ge 0\}$ is a closed set in $W$. $\mathbf{P}$ If $w, w_0 \in W$, then
$$w = (w - w_0) + w_0 \le |w - w_0| + w_0 \le |w - w_0| + |w_0|,$$
$$-w = (w_0 - w) - w_0 \le |w - w_0| - w_0 \le |w - w_0| + |w_0|,$$
$$|w| \le |w - w_0| + |w_0|, \quad |w| - |w_0| \le |w - w_0|,$$
because $|w| = w \vee (-w)$ and the order of $W$ is translation-invariant (241Ec). Similarly, $|w_0| - |w| \le |w - w_0|$
and $||w| - |w_0|| \le |w - w_0|$, so $\||w| - |w_0|\| \le \|w - w_0\|$, by the definition of Banach lattice (242G). Setting
$\phi(w) = |w| - w$, we see that $\|\phi(w) - \phi(w_0)\| \le 2\|w - w_0\|$ for all $w, w_0 \in W$, so that $\phi$ is continuous.
Now, because the order is invariant under multiplication by positive scalars,
$$w \ge 0 \iff 2w \ge 0 \iff w \ge -w \iff w = |w| \iff \phi(w) = 0,$$
so $W^+ = \{w : \phi(w) = 0\}$ is closed. $\mathbf{Q}$
Now suppose that (ii) is true, and set $C_1 = \{w : w \in L^1(\lambda), Tw \ge 0\}$. Then $C_1$ contains $u \otimes v$ whenever
$u, v \ge 0$; but also it is convex, because $T$ is linear, and closed, because $T$ is continuous and $C_1 = T^{-1}[W^+]$.
By (b), $C_1$ includes $\{w : w \in L^1(\lambda), w \ge 0\}$, as required by (i).
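
Part (b) of 253G says that every non-negative element of $L^1(\lambda)$ is a limit of convex combinations of products $u \otimes v$ with $u, v \ge 0$. A hedged illustration, assuming $\mu = \nu$ is Lebesgue measure on $[0,1]$ and taking the indicator of a triangle (which is not itself of the form $f \otimes g$):

```latex
% Assumed setting: \mu = \nu = Lebesgue measure on [0,1],
% \Delta = \{(x,y) : 0 \le y \le x \le 1\}, so \lambda\Delta = 1/2.
% Staircase approximation by rectangles lying inside \Delta, for n \ge 2:
\[
  w_n = \sum_{i=1}^{n-1}
        \chi\bigl[\tfrac{i}{n}, \tfrac{i+1}{n}\bigr[^{\,\bullet}
        \otimes
        \chi\bigl[0, \tfrac{i}{n}\bigr[^{\,\bullet}
      = \sum_{i=1}^{n-1} \frac{1}{n-1}\,
        \Bigl((n-1)\,\chi\bigl[\tfrac{i}{n}, \tfrac{i+1}{n}\bigr[^{\,\bullet}\Bigr)
        \otimes
        \chi\bigl[0, \tfrac{i}{n}\bigr[^{\,\bullet} .
\]
% The second form exhibits w_n as a convex combination of products u \otimes v with u, v \ge 0.
% Since the rectangles are disjoint and contained in \Delta,
\[
  \|\chi\Delta^\bullet - w_n\|_1
    = \frac12 - \sum_{i=1}^{n-1} \frac{1}{n}\cdot\frac{i}{n}
    = \frac12 - \frac{n-1}{2n}
    = \frac{1}{2n} \longrightarrow 0 .
\]
```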

253H Conditional expectations The ideas of this section and the preceding one provide us with some
of the most important examples of conditional expectations.
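Theorem Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be complete probability spaces, with c.l.d. product $(X \times Y, \Lambda, \lambda)$. Set
$\Lambda_1 = \{E \times Y : E \in \Sigma\}$. Then $\Lambda_1$ is a $\sigma$-subalgebra of $\Lambda$. Given a $\lambda$-integrable real-valued function $f$, set
$$g(x, y) = \int f(x, z) \, \nu(dz)$$
whenever this is defined. Then $g$ is a conditional expectation of $f$ on $\Lambda_1$.
proof We know that $\Lambda_1 \subseteq \Lambda$, by 251Ia, and $\Lambda_1$ is a $\sigma$-algebra of sets because $\Sigma$ is. Fubini's theorem (252B,
252C) tells us that $f_1(x) = \int f(x, z) \, \nu(dz)$ is defined for almost every $x$, and therefore that $g = f_1 \otimes \chi Y$ is
defined almost everywhere in $X \times Y$. $f_1$ is $\mu$-virtually measurable; because $\mu$ is complete, $f_1$ is $\Sigma$-measurable,
so $g$ is $\Lambda_1$-measurable (since $\{(x, y) : g(x, y) \le \alpha\} = \{x : f_1(x) \le \alpha\} \times Y$ for every $\alpha \in \mathbb{R}$). Finally, if
$W \in \Lambda_1$, then $W = E \times Y$ for some $E \in \Sigma$, so
$$\int_W g \, d\lambda = \int (f_1 \otimes \chi Y) \times \chi(E \times Y) \, d\lambda = \int f_1 \times \chi E \, d\mu \int \chi Y \, d\nu$$
(by 253D)
$$= \iint \chi E(x) f(x, y) \, \nu(dy) \mu(dx) = \int f \times \chi(E \times Y) \, d\lambda$$
(by Fubini's theorem)
$$= \int_W f \, d\lambda.$$
So $g$ is a conditional expectation of $f$.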
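
For a concrete reading of 253H, consider the following sketch. It assumes that $\mu$ and $\nu$ are both Lebesgue measure on $[0,1]$ (a complete probability space) and uses the sample function $f(x,y) = x + y$; the test sets $E = [0,a]$ are likewise chosen only for illustration.

```latex
% Assumed setting: \mu = \nu = Lebesgue measure on [0,1]; f(x,y) = x + y.
% Integrating out the second coordinate:
\[
  g(x,y) = \int_0^1 (x + z) \, dz = x + \tfrac12 .
\]
% Check of the defining property on W = E \times Y with E = [0,a], 0 \le a \le 1:
\begin{align*}
  \int_W g \, d\lambda &= \int_0^a \bigl(x + \tfrac12\bigr) \, dx
                        = \frac{a^2}{2} + \frac{a}{2}, \\
  \int_W f \, d\lambda &= \int_0^a\!\!\int_0^1 (x + y) \, dy \, dx
                        = \int_0^a \bigl(x + \tfrac12\bigr) \, dx
                        = \frac{a^2}{2} + \frac{a}{2} .
\end{align*}
```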

253I This is a convenient moment to set out a useful result on indefinite-integral measures.
Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $f \in \mathcal{L}^0(\mu)$, $g \in \mathcal{L}^0(\nu)$ non-negative
functions. Let $\mu'$, $\nu'$ be the corresponding indefinite-integral measures (see §234). Let $\lambda$ be the c.l.d. product of
$\mu$ and $\nu$, and $\lambda'$ the indefinite-integral measure defined from $\lambda$ and $f \otimes g \in \mathcal{L}^0(\lambda)$ (253Cb). Then $\lambda'$ is the
c.l.d. product of $\mu'$ and $\nu'$.
proof Write $\theta$ for the c.l.d. product of $\mu'$ and $\nu'$.
(a) If we replace $\mu$ by its completion, we do not change the integral $\int d\mu$ (212Fb), so (by the definition
in 234A) we do not change $\mu'$; at the same time, we do not change $\lambda$, by 251S. The same applies to $\nu$. So
it will be enough to prove the result on the assumption that $\mu$ and $\nu$ are complete; in which case $f$ and $g$
are measurable and have measurable domains.
Set $F = \{x : x \in \operatorname{dom} f, f(x) > 0\}$ and $G = \{y : y \in \operatorname{dom} g, g(y) > 0\}$, so that $F \times G = \{w : w \in
\operatorname{dom}(f \otimes g), (f \otimes g)(w) > 0\}$. Then $F$ is $\mu'$-conegligible and $G$ is $\nu'$-conegligible, so $F \times G$ is $\theta$-conegligible
as well as $\lambda'$-conegligible. Because both $\theta$ and $\lambda'$ are complete (251Ic, 234A), it will be enough to show that
the subspace measures $\theta_{F \times G}$, $\lambda'_{F \times G}$ on $F \times G$ are equal. But note that $\theta_{F \times G}$ can be identified with the
product of $\mu'_F$ and $\nu'_G$, where $\mu'_F$ and $\nu'_G$ are the subspace measures on $F$, $G$ respectively (251P(ii-$\alpha$)). At
the same time, $\mu'_F$ is the indefinite-integral measure defined from the subspace measure $\mu_F$ on $F$ and the
function $f \restriction F$, $\nu'_G$ is the indefinite-integral measure defined from the subspace measure $\nu_G$ on $G$ and $g \restriction G$,
and $\lambda'_{F \times G}$ is defined from the subspace measure $\lambda_{F \times G}$ and $(f \restriction F) \otimes (g \restriction G)$. Finally, by 251P again, $\lambda_{F \times G}$ is
the product of $\mu_F$ and $\nu_G$.
What all this means is that it will be enough to deal with the case in which $F = X$ and $G = Y$, that is,
$f$ and $g$ are everywhere defined and strictly positive; which is what I will suppose from now on.
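(b) In this case $\operatorname{dom} \mu' = \Sigma$ and $\operatorname{dom} \nu' = \mathrm{T}$ (234Da). Similarly, $\operatorname{dom} \lambda' = \Lambda$ is just the domain of $\lambda$. Set
$$F_n = \{x : x \in X, \ 2^{-n} \le f(x) \le 2^n\}, \quad G_n = \{y : y \in Y, \ 2^{-n} \le g(y) \le 2^n\}$$
for $n \in \mathbb{N}$.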
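(c) Set
$$\mathcal{A} = \{W : W \in \operatorname{dom} \theta \cap \operatorname{dom} \lambda', \ \theta(W) = \lambda'(W)\}.$$
If $\mu' E$ and $\nu' H$ are defined and finite, then $f \times \chi E$ and $g \times \chi H$ are integrable, so
$$\lambda'(E \times H) = \int (f \otimes g) \times \chi(E \times H) \, d\lambda = \int (f \times \chi E) \otimes (g \times \chi H) \, d\lambda = \int f \times \chi E \, d\mu \int g \times \chi H \, d\nu = \theta(E \times H)$$
by 253D and 251Ia, that is, $E \times H \in \mathcal{A}$. If we now look at $\mathcal{A}_{EH} = \{W : W \subseteq X \times Y, \ W \cap (E \times H) \in \mathcal{A}\}$,
then we see that
$\mathcal{A}_{EH}$ contains $E' \times H'$ for every $E' \in \Sigma$, $H' \in \mathrm{T}$,
if $\langle W_n \rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence in $\mathcal{A}_{EH}$ then $\bigcup_{n \in \mathbb{N}} W_n \in \mathcal{A}_{EH}$,
if $W, W' \in \mathcal{A}_{EH}$ and $W \subseteq W'$ then $W' \setminus W \in \mathcal{A}_{EH}$.
Thus $\mathcal{A}_{EH}$ is a Dynkin class of subsets of $X \times Y$, and by the Monotone Class Theorem (136B) includes the
$\sigma$-algebra generated by $\{E' \times H' : E' \in \Sigma, H' \in \mathrm{T}\}$, which is $\Sigma \widehat{\otimes} \mathrm{T}$.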
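(d) Now suppose that $W \in \Lambda$. In this case $W \in \operatorname{dom} \theta$ and $\theta W \le \lambda' W$. $\mathbf{P}$ Take $n \in \mathbb{N}$, and $E \in \operatorname{dom} \mu'$,
$H \in \operatorname{dom} \nu'$ such that $\mu' E$ and $\nu' H$ are both finite. Set $E' = E \cap F_n$, $H' = H \cap G_n$ and $W' = W \cap (E' \times H')$.
Then $W' \in \Lambda$, while $\mu E' \le 2^n \mu' E$ and $\nu H' \le 2^n \nu' H$ are finite. By 251Ib there is a $V \in \Sigma \widehat{\otimes} \mathrm{T}$ such
that $V \subseteq W'$ and $\lambda V = \lambda W'$. Similarly, there is a $V' \in \Sigma \widehat{\otimes} \mathrm{T}$ such that $V' \subseteq (E' \times H') \setminus W'$ and
$\lambda V' = \lambda((E' \times H') \setminus W')$. This means that $\lambda((E' \times H') \setminus (V \cup V')) = 0$, so $\lambda'((E' \times H') \setminus (V \cup V')) = 0$.
But $(E' \times H') \setminus (V \cup V') \in \mathcal{A}$, by (c), so $\theta((E' \times H') \setminus (V \cup V')) = 0$ and $W' \in \operatorname{dom} \theta$, while
$\theta W' = \theta V = \lambda' V \le \lambda' W$.
Since $E$ and $H$ are arbitrary, $W \cap (F_n \times G_n) \in \operatorname{dom} \theta$ (251H) and $\theta(W \cap (F_n \times G_n)) \le \lambda' W$. Since
$\langle F_n \rangle_{n \in \mathbb{N}}$, $\langle G_n \rangle_{n \in \mathbb{N}}$ are non-decreasing sequences with unions $X$, $Y$ respectively,
$$\theta W = \sup_{n \in \mathbb{N}} \theta(W \cap (F_n \times G_n)) \le \lambda' W. \ \mathbf{Q}$$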

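(e) In the same way, $\lambda' W$ is defined and less than or equal to $\theta W$ for every $W \in \operatorname{dom} \theta$. $\mathbf{P}$ The arguments
are very similar, but a refinement seems to be necessary at the last stage. Take $n \in \mathbb{N}$, and $E \in \Sigma$, $H \in \mathrm{T}$
such that $\mu E$ and $\nu H$ are both finite. Set $E' = E \cap F_n$, $H' = H \cap G_n$ and $W' = W \cap (E' \times H')$. Then
$W' \in \operatorname{dom} \theta$, while $\mu' E' \le 2^n \mu E$ and $\nu' H' \le 2^n \nu H$ are finite. This time, there are $V, V' \in \Sigma \widehat{\otimes} \mathrm{T}$ such that
$V \subseteq W'$, $V' \subseteq (E' \times H') \setminus W'$, $\theta V = \theta W'$ and $\theta V' = \theta((E' \times H') \setminus W')$. Accordingly
$$\lambda' V + \lambda' V' = \theta V + \theta V' = \theta(E' \times H') = \lambda'(E' \times H'),$$
so that $\lambda' W'$ is defined and equal to $\theta W'$.
What this means is that $W \cap (F_n \times G_n) \cap (E \times H) \in \mathcal{A}$ whenever $\mu E$ and $\nu H$ are finite. So $W \cap (F_n \times G_n) \in \Lambda$,
by 251H; as $n$ is arbitrary, $W \in \Lambda$ and $\lambda' W$ is defined.
$\mathbf{?}$ Suppose, if possible, that $\lambda' W > \theta W$. Then there is some $n \in \mathbb{N}$ such that $\lambda'(W \cap (F_n \times G_n)) > \theta W$.
Because $\lambda$ is semi-finite, 213B tells us that there is some $\lambda$-simple function $h$ such that $h \le (f \otimes g) \times \chi(W \cap
(F_n \times G_n))$ and $\int h \, d\lambda > \theta W$; setting $V = \{(x, y) : h(x, y) > 0\}$, we see that $V \subseteq W \cap (F_n \times G_n)$, $\lambda V$ is
defined and finite and $\lambda' V > \theta W$. Now there must be sets $E \in \Sigma$, $H \in \mathrm{T}$ such that $\mu E$ and $\nu H$ are both
finite and $\lambda(V \setminus (E \times H)) < 4^{-n}(\lambda' V - \theta W)$. But in this case $V \in \operatorname{dom} \theta$ (by (d)), so we can apply the
argument just above to $V$ and conclude that $V \cap (E \times H) = V \cap (F_n \times G_n) \cap (E \times H)$ belongs to $\mathcal{A}$. And
now
$$\begin{aligned}
\lambda' V &= \lambda'(V \cap (E \times H)) + \lambda'(V \setminus (E \times H)) \\
&\le \theta(V \cap (E \times H)) + 4^n \lambda(V \setminus (E \times H)) < \theta V + \lambda' V - \theta W \le \lambda' V,
\end{aligned}$$
which is absurd. $\mathbf{X}$
So $\lambda' W$ is defined and not greater than $\theta W$. $\mathbf{Q}$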
(f) Putting this together with (d), we see that $\lambda' = \theta$, as claimed.
Remark If $\mu'$ and $\nu'$ are totally finite, so that they are truly continuous with respect to $\mu$ and $\nu$ in the
sense of 232Ab, then $f$ and $g$ are integrable, so $f \otimes g$ is $\lambda$-integrable, and $\theta = \lambda'$ is truly continuous with
respect to $\lambda$.
The proof above can be simplified using a fragment of the general theory of complete locally determined
spaces, which will be given in §412 in Volume 4.
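
The content of 253I can be checked on rectangles in a simple case. The sketch below is illustrative only; it assumes $\mu = \nu$ is Lebesgue measure on $[0,1]$ and uses the sample densities $f(x) = 2x$, $g(y) = 3y^2$, writing $\mu'$, $\nu'$ for the indefinite-integral measures, $\lambda'$ for the indefinite-integral measure of $f \otimes g$ over $\lambda$, and $\theta$ for the product of $\mu'$ and $\nu'$.

```latex
% Assumed setting: \mu = \nu = Lebesgue measure on [0,1]; f(x) = 2x, g(y) = 3y^2.
% Indefinite-integral measures of intervals, for 0 \le a, b \le 1:
\[
  \mu'[0,a] = \int_0^a 2x \, dx = a^2,
  \qquad
  \nu'[0,b] = \int_0^b 3y^2 \, dy = b^3 .
\]
% The indefinite-integral measure \lambda' defined from \lambda and f \otimes g:
\[
  \lambda'\bigl([0,a] \times [0,b]\bigr)
    = \int_0^a\!\!\int_0^b 6\,x\,y^2 \, dy \, dx
    = a^2 b^3
    = \mu'[0,a] \cdot \nu'[0,b]
    = \theta\bigl([0,a] \times [0,b]\bigr).
\]
```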

*253J Upper integrals The idea of 253D can be repeated in terms of upper integrals, as follows.
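Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be $\sigma$-finite measure spaces, with c.l.d. product measure $\lambda$. Then for
any functions $f$ and $g$, defined on conegligible subsets of $X$ and $Y$ respectively, and taking values in $[0, \infty]$,
$$\overline{\int} f \otimes g \, d\lambda = \overline{\int} f \, d\mu \cdot \overline{\int} g \, d\nu.$$
Remark Here $(f \otimes g)(x, y) = f(x)g(y)$ for all $x \in \operatorname{dom} f$, $y \in \operatorname{dom} g$, taking $0 \cdot \infty = 0$, as in §135.
proof (a) I show first that $\overline{\int} f \otimes g \le \overline{\int} f \, \overline{\int} g$. $\mathbf{P}$ If $\overline{\int} f = 0$, then $f =_{\text{a.e.}} 0$, so $f \otimes g =_{\text{a.e.}} 0$ and the result is
immediate. The same argument applies if $\overline{\int} g = 0$. If both $\overline{\int} f$ and $\overline{\int} g$ are non-zero, and either is infinite,
the result is trivial. So let us suppose that both are finite. In this case there are integrable $f_0$, $g_0$ such that
$f \le_{\text{a.e.}} f_0$, $g \le_{\text{a.e.}} g_0$, $\overline{\int} f = \int f_0$ and $\overline{\int} g = \int g_0$ (133J). So $f \otimes g \le_{\text{a.e.}} f_0 \otimes g_0$, and
$$\overline{\int} f \otimes g \le \int f_0 \otimes g_0 = \int f_0 \int g_0 = \overline{\int} f \, \overline{\int} g,$$
by 253D. $\mathbf{Q}$
(b) For the reverse inequality, we need consider only the case in which $\overline{\int} f \otimes g$ is finite, so that there is a
$\lambda$-integrable function $h$ such that $f \otimes g \le_{\text{a.e.}} h$ and $\overline{\int} f \otimes g = \int h$. Set
$$f_0(x) = \int h(x, y) \, \nu(dy)$$
whenever this is defined in $\mathbb{R}$, which is almost everywhere, by Fubini's theorem (252B-252C). Then $f_0(x) \ge
f(x) \overline{\int} g \, d\nu$ for every $x \in \operatorname{dom} f_0 \cap \operatorname{dom} f$, which is a conegligible set in $X$; so
$$\overline{\int} f \otimes g = \int h \, d\lambda = \int f_0 \, d\mu \ge \overline{\int} f \, \overline{\int} g,$$
as required.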
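
The point of 253J is that no measurability of $f$ or $g$ is required. A deliberately tiny illustration follows; the two-point spaces with trivial $\sigma$-algebras are hypothetical choices made so that everything can be computed by inspection, and the description of the domain of $\lambda$ is a claim one can check directly from the construction of the product measure in this finite case.

```latex
% Assumed toy spaces: X = Y = \{0,1\}, \Sigma = \{\emptyset, X\}, \mathrm{T} = \{\emptyset, Y\},
% \mu X = \nu Y = 1. Only the empty set is negligible, and the only measurable
% real-valued functions are constants.
% Take f = \chi\{0\} on X and g = \chi\{0\} on Y; neither is measurable.
\[
  \overline{\int} f \, d\mu = \inf\{c : c \ge 1\} = 1,
  \qquad
  \overline{\int} g \, d\nu = 1 .
\]
% In this case the domain of \lambda is \{\emptyset, X \times Y\}, so the same reasoning
% applies to f \otimes g = \chi\{(0,0)\}:
\[
  \overline{\int} f \otimes g \, d\lambda = 1
    = \overline{\int} f \, d\mu \cdot \overline{\int} g \, d\nu .
\]
```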

*253K A similar argument applies to upper integrals of sums, as follows.


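Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be probability spaces, with c.l.d. product measure $\lambda$. Then for any
real-valued functions $f$, $g$ defined on conegligible subsets of $X$, $Y$ respectively,
$$\overline{\int} f(x) + g(y) \, \lambda(d(x, y)) = \overline{\int} f(x) \, \mu(dx) + \overline{\int} g(y) \, \nu(dy),$$
at least when the right-hand side is defined in $[-\infty, \infty]$.
proof Set $h(x, y) = f(x) + g(y)$ for $x \in \operatorname{dom} f$, $y \in \operatorname{dom} g$, so that $\operatorname{dom} h$ is $\lambda$-conegligible.
(a) As in 253J, I start by showing that $\overline{\int} h \le \overline{\int} f + \overline{\int} g$. $\mathbf{P}$ If either $\overline{\int} f$ or $\overline{\int} g$ is $\infty$, this is trivial. Otherwise,
take integrable functions $f_0$, $g_0$ such that $f \le_{\text{a.e.}} f_0$ and $g \le_{\text{a.e.}} g_0$. Set $h_0 = (f_0 \otimes \chi Y) + (\chi X \otimes g_0)$; then
$h \le h_0$ $\lambda$-a.e., so
$$\overline{\int} h \, d\lambda \le \int h_0 \, d\lambda = \int f_0 \, d\mu + \int g_0 \, d\nu.$$
As $f_0$, $g_0$ are arbitrary, $\overline{\int} h \le \overline{\int} f + \overline{\int} g$. $\mathbf{Q}$
(b) For the reverse inequality, suppose that $h \le h_0$ for $\lambda$-almost every $(x, y)$, where $h_0$ is $\lambda$-integrable.
Set $f_0(x) = \int h_0(x, y) \, \nu(dy)$ whenever this is defined in $\mathbb{R}$. Then $f_0(x) \ge f(x) + \overline{\int} g \, d\nu$ whenever $x \in
\operatorname{dom} f \cap \operatorname{dom} f_0$, so
$$\int h_0 \, d\lambda = \int f_0 \, d\mu \ge \overline{\int} f \, d\mu + \overline{\int} g \, d\nu.$$
As $h_0$ is arbitrary, $\overline{\int} h \ge \overline{\int} f + \overline{\int} g$, as required.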
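
As with 253J, a very small example shows 253K at work for non-measurable functions. The following sketch assumes two-point probability spaces with trivial $\sigma$-algebras, chosen only so that the upper integrals can be read off directly; the statement about the domain of $\lambda$ is an assertion that can be verified by hand in this finite setting.

```latex
% Assumed toy spaces: X = Y = \{0,1\}, \Sigma = \{\emptyset, X\}, \mathrm{T} = \{\emptyset, Y\},
% \mu X = \nu Y = 1; the domain of \lambda is then \{\emptyset, X \times Y\}.
% With f = \chi\{0\} on X and g = \chi\{0\} on Y, set h(x,y) = f(x) + g(y).
% Measurable majorants are constants, so each upper integral is the supremum of the function:
\[
  \overline{\int} f \, d\mu = 1, \qquad \overline{\int} g \, d\nu = 1, \qquad
  \overline{\int} h \, d\lambda = \max_{x,y} h(x,y) = h(0,0) = 2 ,
\]
% in agreement with
\[
  \overline{\int} h \, d\lambda = \overline{\int} f \, d\mu + \overline{\int} g \, d\nu .
\]
```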

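253L Complex spaces As usual, the ideas of 253F and 253H apply essentially unchanged to complex $L^1$
spaces. Writing $L^1_{\mathbb{C}}(\mu)$, etc., for the complex $L^1$ spaces involved, we have the following results. Throughout,
let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$.

(a) If $f \in \mathcal{L}^0_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^0_{\mathbb{C}}(\nu)$ then $f \otimes g$, defined by the formula $(f \otimes g)(x, y) = f(x)g(y)$ for all $x \in \operatorname{dom} f$,
$y \in \operatorname{dom} g$, belongs to $\mathcal{L}^0_{\mathbb{C}}(\lambda)$.

(b) If $f \in \mathcal{L}^1_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^1_{\mathbb{C}}(\nu)$ then $f \otimes g \in \mathcal{L}^1_{\mathbb{C}}(\lambda)$ and $\int f \otimes g \, d\lambda = \int f \, d\mu \int g \, d\nu$.

(c) We have a bilinear map $(u, v) \mapsto u \otimes v : L^1_{\mathbb{C}}(\mu) \times L^1_{\mathbb{C}}(\nu) \to L^1_{\mathbb{C}}(\lambda)$ defined by writing $f^\bullet \otimes g^\bullet = (f \otimes g)^\bullet$
for all $f \in \mathcal{L}^1_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^1_{\mathbb{C}}(\nu)$.

(d) If $W$ is any complex Banach space and $\phi : L^1_{\mathbb{C}}(\mu) \times L^1_{\mathbb{C}}(\nu) \to W$ is any bounded bilinear map, then
there is a unique bounded linear operator $T : L^1_{\mathbb{C}}(\lambda) \to W$ such that $T(u \otimes v) = \phi(u, v)$ for every $u \in L^1_{\mathbb{C}}(\mu)$,
$v \in L^1_{\mathbb{C}}(\nu)$, and $\|T\| = \|\phi\|$.

(e) If $\mu$ and $\nu$ are complete probability measures, and $\Lambda_1 = \{E \times Y : E \in \Sigma\}$, then for any $f \in \mathcal{L}^1_{\mathbb{C}}(\lambda)$
we have a conditional expectation $g$ of $f$ on $\Lambda_1$ given by setting $g(x, y) = \int f(x, z) \, \nu(dz)$ whenever this is
defined.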

253X Basic exercises > (a) Let $U$, $V$ and $W$ be linear spaces. Show that the set of bilinear maps
from $U \times V$ to $W$ has a natural linear structure agreeing with those of $L(U; L(V; W))$ and $L(V; L(U; W))$,
writing $L(U; W)$ for the linear space of linear operators from $U$ to $W$.

> (b) Let $U$, $V$ and $W$ be normed spaces. (i) Show that for a bilinear map $\phi : U \times V \to W$ the following
are equiveridical: ($\alpha$) $\phi$ is bounded in the sense of 253Ab; ($\beta$) $\phi$ is continuous; ($\gamma$) $\phi$ is continuous at some
point of $U \times V$. (ii) Show that the space of bounded bilinear maps from $U \times V$ to $W$ is a linear subspace
of the space of all bilinear maps from $U \times V$ to $W$, and that the functional $\| \ \|$ defined in 253Ab is a norm,
agreeing with the norms of $B(U; B(V; W))$ and $B(V; B(U; W))$, writing $B(U; W)$ for the normed space of
bounded linear operators from $U$ to $W$.

(c) Let $(X_1, \Sigma_1, \mu_1), \ldots, (X_n, \Sigma_n, \mu_n)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X_1 \times \ldots \times X_n$,
as described in 251W. Let $W$ be a Banach space, and suppose that $\phi : L^1(\mu_1) \times \ldots \times L^1(\mu_n) \to W$ is
multilinear (that is, linear in each variable separately) and bounded (that is, $\|\phi\| = \sup\{\|\phi(u_1, \ldots, u_n)\| :
\|u_i\|_1 \le 1 \ \forall \ i \le n\} < \infty$). Show that there is a unique bounded linear operator $T : L^1(\lambda) \to W$ such that
$T\otimes = \phi$, where $\otimes : L^1(\mu_1) \times \ldots \times L^1(\mu_n) \to L^1(\lambda)$ is a canonical multilinear map (to be defined).

(d) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. Show
that if $A \subseteq L^1(\mu)$ and $B \subseteq L^1(\nu)$ are both uniformly integrable, then $\{u \otimes v : u \in A, v \in B\}$ is uniformly
integrable in $L^1(\lambda)$.

> (e) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces and $\lambda$ the c.l.d. product measure on $X \times Y$. Show
that
(i) we have a bilinear map $(u, v) \mapsto u \otimes v : L^0(\mu) \times L^0(\nu) \to L^0(\lambda)$ given by setting $f^\bullet \otimes g^\bullet = (f \otimes g)^\bullet$
for all $f \in \mathcal{L}^0(\mu)$, $g \in \mathcal{L}^0(\nu)$;
(ii) if $1 \le p \le \infty$ then $u \otimes v \in L^p(\lambda)$ and $\|u \otimes v\|_p = \|u\|_p \|v\|_p$ for all $u \in L^p(\mu)$, $v \in L^p(\nu)$;
(iii) if $u, u' \in L^2(\mu)$ and $v, v' \in L^2(\nu)$ then the inner product $(u \otimes v | u' \otimes v')$, taken in $L^2(\lambda)$, is just
$(u|u')(v|v')$;
(iv) the map $(u, v) \mapsto u \otimes v : L^0(\mu) \times L^0(\nu) \to L^0(\lambda)$ is continuous if $L^0(\mu)$, $L^0(\nu)$ and $L^0(\lambda)$ are all
given their topologies of convergence in measure.

(f) In 253Xe, assume that $\mu$ and $\nu$ are semi-finite. Show that if $u_0, \ldots, u_n$ are linearly independent
members of $L^0(\mu)$ and $v_0, \ldots, v_n \in L^0(\nu)$ are not all 0, then $\sum_{i=0}^n u_i \otimes v_i \ne 0$ in $L^0(\lambda)$. (Hint: start by
finding sets $E \in \Sigma$, $F \in \mathrm{T}$ of finite measure such that $u_0 \times \chi E^\bullet, \ldots, u_n \times \chi E^\bullet$ are linearly independent and
$v_0 \times \chi F^\bullet, \ldots, v_n \times \chi F^\bullet$ are not all 0.)

(g) In 253Xe, assume that $\mu$ and $\nu$ are semi-finite. If $U$, $V$ are linear subspaces of $L^0(\mu)$ and $L^0(\nu)$
respectively, write $U \otimes V$ for the linear subspace of $L^0(\lambda)$ generated by $\{u \otimes v : u \in U, v \in V\}$. Show that if
$W$ is any linear space and $\phi : U \times V \to W$ is a bilinear map, there is a unique linear operator $T : U \otimes V \to W$
such that $T(u \otimes v) = \phi(u, v)$ for all $u \in U$, $v \in V$. (Hint: start by showing that if $u_0, \ldots, u_n \in U$ and
$v_0, \ldots, v_n \in V$ are such that $\sum_{i=0}^n u_i \otimes v_i = 0$, then $\sum_{i=0}^n \phi(u_i, v_i) = 0$; do this by expressing the $u_i$ as
linear combinations of some linearly independent family and applying 253Xf.)

> (h) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be complete probability spaces, with c.l.d. product measure $\lambda$. Suppose
that $p \in [1, \infty]$ and that $f \in \mathcal{L}^p(\lambda)$. Set $g(x) = \int f(x, y) \, \nu(dy)$ whenever this is defined. Show that $g \in \mathcal{L}^p(\mu)$
and that $\|g\|_p \le \|f\|_p$. (Hint: 253H, 244M.)

(i) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, with c.l.d. product measure $\lambda$, and $p \in [1, \infty[$. Show that
$\{w : w \in L^p(\lambda), w \ge 0\}$ is the closed convex hull in $L^p(\lambda)$ of $\{u \otimes v : u \in L^p(\mu), v \in L^p(\nu), u \ge 0, v \ge 0\}$
(see 253Xe(ii) above).

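253Y Further exercises (a) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda_0$ the primitive
product measure on $X \times Y$. Show that if $f \in \mathcal{L}^0(\mu)$ and $g \in \mathcal{L}^0(\nu)$, then $f \otimes g \in \mathcal{L}^0(\lambda_0)$.

(b) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda_0$ the primitive product measure on $X \times Y$. Show
that if $f \in \mathcal{L}^1(\mu)$ and $g \in \mathcal{L}^1(\nu)$, then $f \otimes g \in \mathcal{L}^1(\lambda_0)$ and $\int f \otimes g \, d\lambda_0 = \int f \, d\mu \int g \, d\nu$.

(c) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda_0$, $\lambda$ the primitive and c.l.d. product measures on
$X \times Y$. Show that the embedding $\mathcal{L}^1(\lambda_0) \hookrightarrow \mathcal{L}^1(\lambda)$ induces a Banach lattice isomorphism between $L^1(\lambda_0)$
and $L^1(\lambda)$.

(d) Let $(X, \Sigma, \mu)$, $(Y, \mathrm{T}, \nu)$ be strictly localizable measure spaces, with c.l.d. product measure $\lambda$. Show
that $L^\infty(\lambda)$ can be identified with $L^1(\lambda)^*$. Show that under this identification $\{w : w \in L^\infty(\lambda), w \ge 0\}$ is
the weak*-closed convex hull of $\{u \otimes v : u \in L^\infty(\mu), v \in L^\infty(\nu), u \ge 0, v \ge 0\}$.

(e) Find a version of 253J valid when one of $\mu$, $\nu$ is not $\sigma$-finite.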

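(f) Let $(X, \Sigma, \mu)$ be any measure space and $V$ any Banach space. Write $\mathcal{L}^1_V = \mathcal{L}^1_V(\mu)$ for the set of functions
$f$ such that ($\alpha$) $\operatorname{dom} f$ is a conegligible subset of $X$ ($\beta$) $f$ takes values in $V$ ($\gamma$) there is a conegligible set
$D \subseteq \operatorname{dom} f$ such that $f[D]$ is separable and $D \cap f^{-1}[G] \in \Sigma$ for every open set $G \subseteq V$ ($\delta$) the integral
$\int \|f(x)\| \, \mu(dx)$ is finite. (These are the Bochner integrable functions from $X$ to $V$.) For $f, g \in \mathcal{L}^1_V$ write
$f \sim g$ if $f = g$ $\mu$-a.e.; let $L^1_V$ be the set of equivalence classes in $\mathcal{L}^1_V$ under $\sim$. Show that
(i) $f + g$, $cf \in \mathcal{L}^1_V$ for all $f, g \in \mathcal{L}^1_V$, $c \in \mathbb{R}$;
(ii) $L^1_V$ has a natural linear space structure, defined by writing $f^\bullet + g^\bullet = (f + g)^\bullet$, $cf^\bullet = (cf)^\bullet$ for $f$,
$g \in \mathcal{L}^1_V$ and $c \in \mathbb{R}$;
(iii) $L^1_V$ has a norm $\| \ \|$, defined by writing $\|f^\bullet\| = \int \|f(x)\| \, \mu(dx)$ for $f \in \mathcal{L}^1_V$;
(iv) $L^1_V$ is a Banach space under this norm;
(v) there is a natural map $\otimes : \mathcal{L}^1 \times V \to \mathcal{L}^1_V$ defined by writing $(f \otimes v)(x) = f(x)v$ when $f \in \mathcal{L}^1 = \mathcal{L}^1_{\mathbb{R}}(\mu)$,
$v \in V$, $x \in \operatorname{dom} f$;
(vi) there is a canonical bilinear map $\otimes : L^1 \times V \to L^1_V$ defined by writing $f^\bullet \otimes v = (f \otimes v)^\bullet$ for $f \in \mathcal{L}^1$,
$v \in V$;
(vii) whenever $W$ is a Banach space and $\phi : L^1 \times V \to W$ is a bounded bilinear map, there is a unique
bounded linear operator $T : L^1_V \to W$ such that $T(u \otimes v) = \phi(u, v)$ for all $u \in L^1$, $v \in V$, and $\|T\| = \|\phi\|$.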

(g) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda_0$ the primitive product measure on $X \times Y$. If
$f$ is a $\lambda_0$-integrable function, write $f_x(y) = f(x, y)$ whenever this is defined. Show that we have a map
$x \mapsto f_x^\bullet$ from a conegligible subset $D_0$ of $X$ to $L^1(\nu)$. Show that this map is a Bochner integrable function,
as defined in 253Yf.

(h) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and suppose that $\phi$ is a function from $X$ to a separable
subset of $L^1(\nu)$ which is measurable in the sense that $\phi^{-1}[G] \in \Sigma$ for every open $G \subseteq L^1(\nu)$. Show that
there is a $\Lambda$-measurable function $f$ from $X \times Y$ to $\mathbb{R}$, where $\Lambda$ is the domain of the c.l.d. product measure
on $X \times Y$, such that $\phi(x) = f_x^\bullet$ for every $x \in X$, writing $f_x(y) = f(x, y)$ for $x \in X$, $y \in Y$.

(i) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. Show that
253Yg provides a canonical identification between $L^1(\lambda)$ and $L^1_{L^1(\nu)}(\mu)$.

(j) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be complete locally determined measure spaces, with c.l.d. product measure
$\lambda$. (i) Suppose that $K \in \mathcal{L}^2(\lambda)$, $f \in \mathcal{L}^2(\mu)$. Show that $h(y) = \int K(x, y) f(x) \, dx$ is defined for almost all
$y \in Y$ and that $h \in \mathcal{L}^2(\nu)$. (Hint: to see that $h$ is defined a.e., consider $\int_{E \times F} K(x, y) f(x) \, d(x, y)$ for $\mu E$,
$\nu F < \infty$; to see that $h \in \mathcal{L}^2$ consider $\int h \times g$ where $g \in \mathcal{L}^2(\nu)$.) (ii) Show that the map $f \mapsto h$ corresponds
to a bounded linear operator $T_K : L^2(\mu) \to L^2(\nu)$. (iii) Show that the map $K \mapsto T_K$ corresponds to a
bounded linear operator, of norm at most 1, from $L^2(\lambda)$ to $B(L^2(\mu); L^2(\nu))$.

(k) Suppose that $p, q \in [1, \infty]$ and that $\frac{1}{p} + \frac{1}{q} = 1$, interpreting $\frac{1}{\infty}$ as 0 as usual. Let $(X, \Sigma, \mu)$, $(Y, \mathrm{T}, \nu)$
be complete locally determined measure spaces with c.l.d. product measure $\lambda$. Show that the ideas of 253Yj
can be used to define a bounded linear operator, of norm 1, from $L^p(\lambda)$ to $B(L^q(\mu); L^p(\nu))$.

(l) In 253Xc, suppose that $W$ is a Banach lattice. Show that the following are equiveridical: (i) $Tu \ge 0$
whenever $u \in L^1(\lambda)$ and $u \ge 0$; (ii) $\phi(u_1, \ldots, u_n) \ge 0$ whenever $u_i \ge 0$ in $L^1(\mu_i)$ for each $i \le n$.

253 Notes and comments Throughout the main arguments of this section, I have written the results
in terms of the c.l.d. product measure; of course the isomorphism noted in 253Yc means that they could
just as well have been expressed in terms of the primitive product measure. The more restricted notion of
integrability with respect to the primitive product measure is indeed the one appropriate for the ideas of
253Yg.
Theorem 253F is a 'universal mapping theorem'; it asserts that every bounded bilinear operator on
$L^1(\mu) \times L^1(\nu)$ factors through $\otimes : L^1(\mu) \times L^1(\nu) \to L^1(\lambda)$, at least if the range space is a Banach space.
It is easy to see that this property defines the pair $(L^1(\lambda), \otimes)$ up to Banach space isomorphism, in the
following sense: if $V$ is a Banach space, and $\psi : L^1(\mu) \times L^1(\nu) \to V$ is a bounded bilinear map such that
for every bounded bilinear map $\phi$ from $L^1(\mu) \times L^1(\nu)$ to any Banach space $W$ there is a unique bounded
linear operator $T : V \to W$ such that $T\psi = \phi$ and $\|T\| = \|\phi\|$, then there is an isometric Banach space
isomorphism $S : L^1(\lambda) \to V$ such that $S\otimes = \psi$. There is of course a general theory of bilinear maps between
Banach spaces; in the language of this theory, $L^1(\lambda)$ is, or is isomorphic to, the 'projective tensor product'
of $L^1(\mu)$ and $L^1(\nu)$. For an introduction to this subject, see Defant & Floret 93, §I.3, or Semadeni 71,
§20. I should perhaps emphasise, for the sake of those who have not encountered tensor products before,
that this theorem is special to $L^1$ spaces. While some of the same ideas can be applied to other function
spaces (see 253Xe-253Xg), there is no other class to which 253F applies.

There is also a theory of tensor products of Banach lattices, for which I do not think we are quite ready
(it needs general ideas about ordered linear spaces for which I mean to wait until Chapter 35 in the next
volume). However 253G shows that the ordering, and therefore the Banach lattice structure, of $L^1(\lambda)$ is
determined by the ordering of $L^1(\mu)$ and $L^1(\nu)$ and the map $\otimes : L^1(\mu) \times L^1(\nu) \to L^1(\lambda)$.
The conditional expectation operators described in 253H are of very great importance, largely because in
this special context we have a realization of the conditional expectation operator as a function $P_0$ from $\mathcal{L}^1(\lambda)$
to $\mathcal{L}^1(\lambda \restriction \Lambda_1)$, not just as a function from $L^1(\lambda)$ to $L^1(\lambda \restriction \Lambda_1)$, as in 242J. As described here, $P_0(f + f')$ need
not be equal, in the strict sense, to $P_0 f + P_0 f'$; it can have a larger domain. In applications, however, one
might be willing to restrict attention to the linear space $U$ of bounded $\Sigma \widehat{\otimes} \mathrm{T}$-measurable functions defined
everywhere on $X \times Y$, so that $P_0$ becomes an operator from $U$ to itself (see 252P).

254 Infinite products


I come now to the second basic idea of this chapter: the description of a product measure on the product
of a (possibly large) family of probability spaces. The section begins with a construction on similar lines to
that of §251 (254A-254F) and its defining property in terms of inverse-measure-preserving functions (254G).
I discuss the usual measure on $\{0,1\}^I$ (254J-254K), subspace measures (254L) and various properties of
subproducts (254M-254T), including a study of the associated conditional expectation operators (254R-254T).
254A Definitions (a) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces. Set $X=\prod_{i\in I}X_i$, the
family of functions $x$ with domain $I$ such that $x(i)\in X_i$ for every $i\in I$. In this context, I will say that a
measurable cylinder is a subset of $X$ expressible in the form
$$C=\prod_{i\in I}C_i,$$
where $C_i\in\Sigma_i$ for every $i\in I$ and $\{i:C_i\ne X_i\}$ is finite. Note that for a non-empty $C\subseteq X$ this expression
is unique. P Suppose that $C=\prod_{i\in I}C_i=\prod_{i\in I}C_i'$. For each $i\in I$ set
$$D_i=\{x(i):x\in C\}.$$
Of course $D_i\subseteq C_i$. Because $C\ne\emptyset$, we can fix on some $z\in C$. If $i\in I$ and $\xi\in C_i$, consider $x\in X$ defined
by setting
$$x(i)=\xi,\quad x(j)=z(j)\text{ for }j\ne i;$$
then $x\in C$ so $\xi=x(i)\in D_i$. Thus $D_i=C_i$ for $i\in I$. Similarly, $D_i=C_i'$. Q

(b) We can therefore define a functional $\theta_0:\mathcal{C}\to[0,1]$, where $\mathcal{C}$ is the set of measurable cylinders, by
setting
$$\theta_0C=\prod_{i\in I}\mu_iC_i$$
whenever $C_i\in\Sigma_i$ for every $i\in I$ and $\{i:C_i\ne X_i\}$ is finite, noting that only finitely many terms in the
product can differ from 1, so that it can safely be treated as a finite product. If $C=\emptyset$, one of the $C_i$ must
be empty, so $\theta_0C$ is surely 0, even though the expression of $C$ as $\prod_{i\in I}C_i$ is no longer unique.

(c) Now define $\theta:\mathcal{P}X\to[0,1]$ by setting
$$\theta A=\inf\Bigl\{\sum_{n=0}^{\infty}\theta_0C_n:C_n\in\mathcal{C}\text{ for every }n\in\mathbb{N},\ A\subseteq\bigcup_{n\in\mathbb{N}}C_n\Bigr\}.$$
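
As a small illustration of the functional $\theta_0$ of 254Ab, here is a minimal Python sketch for a finite family of finite probability spaces. The index names, the point weights and the helper names `mu` and `theta0` are all hypothetical choices made for this example; they are not part of the text.

```python
from fractions import Fraction
from math import prod

# Hypothetical finite example: three factor spaces, each a finite set with point masses.
# weights[i][t] is mu_i{t}; each family of weights sums to 1.
weights = {
    "a": {0: Fraction(1, 2), 1: Fraction(1, 2)},
    "b": {0: Fraction(1, 3), 1: Fraction(2, 3)},
    "c": {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)},
}

def mu(i, subset):
    """mu_i of a subset of X_i."""
    return sum((weights[i][t] for t in subset), Fraction(0))

def theta0(cylinder):
    """theta_0 of the cylinder prod_i C_i.

    `cylinder` maps some indices i to C_i; every index left out stands for
    C_i = X_i and contributes the factor mu_i X_i = 1.
    """
    return prod((mu(i, c) for i, c in cylinder.items()), start=Fraction(1))

whole_space = theta0({})                       # C_i = X_i for every i
restricted = theta0({"a": {1}, "c": {0, 2}})   # constrains coordinates a and c only
empty = theta0({"b": set()})                   # one empty factor gives the empty cylinder
```

Leaving an index out of the dictionary mirrors the remark that only finitely many factors of the product can differ from 1.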

254B Lemma The functional $\theta$ defined in 254Ac is always an outer measure on $X$.

proof Use exactly the same arguments as those in 251B above.

254C Definition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be any indexed family of probability spaces, and $X$ the Cartesian
product $\prod_{i\in I}X_i$. The product measure on $X$ is the measure defined by Carathéodory's method (113C)
from the outer measure $\theta$ defined in 254A.

254D Remarks (a) In 254Ab, I asserted that if $C\in\mathcal{C}$ and no $C_i$ is empty, then nor is $C=\prod_{i\in I}C_i$.
This is the 'Axiom of Choice': the product of any family $\langle C_i\rangle_{i\in I}$ of non-empty sets is non-empty, that is,
there is a 'choice function' $x$ with domain $I$ picking out a distinguished member $x(i)$ of each $C_i$. In this
volume I have not attempted to be scrupulous in indicating uses of the axiom of choice. In fact the use
here is not an absolutely vital one; I mean, the theory of infinite products, even uncountable products, of
probability spaces does not change character completely in the absence of the full axiom of choice (provided,
that is, that we allow ourselves to use the countable axiom of choice). The point is that all we really need,
in the present context, is that $X=\prod_{i\in I}X_i$ should be non-empty; and in many contexts we can prove this,
for the particular cases of interest, without using the axiom of choice, by actually exhibiting a member of
$X$. The simplest case in which this is difficult is when the $X_i$ are uncontrolled Borel subsets of $[0,1]$; and
even then, if they are presented with coherent descriptions, we may, with appropriate labour, be able to
construct a member of $X$. But clearly such a process is liable to slow us down a good deal, and for the
moment I think there is no great virtue in taking so much trouble.

(b) I have given this section the title 'infinite products', but it is useful to be able to apply the ideas to
finite $I$; I should mention in particular the cases $\#(I)\le 2$.
(i) If $I=\emptyset$, $X$ consists of the unique function with domain $I$, the empty function. If we identify a
function with its graph, then $X$ is actually $\{\emptyset\}$; in any case, $X$ is to be a singleton set, with $\lambda X=1$.
(ii) If $I$ is a singleton $\{i\}$, then we can identify $X$ with $X_i$; $\mathcal{C}$ becomes identified with $\Sigma_i$ and $\theta_0$ with
$\mu_i$, so that $\theta$ can be identified with $\mu_i^*$ and the 'product measure' becomes the measure on $X_i$ defined from
$\mu_i^*$, that is, the completion of $\mu_i$ (213Xa(iv)).
(iii) If $I$ is a doubleton $\{i,j\}$, then we can identify $X$ with $X_i\times X_j$; in this case the definitions of
254A, 254C above match exactly with those of 251A and 251C, so that $\lambda$ here can be identified with the
'primitive' product measure as defined in 251C. Because $\mu_i$ and $\mu_j$ are both totally finite, this agrees with
the c.l.d. product measure of 251F.
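254E Definition Let $\langle X_i\rangle_{i\in I}$ be any family of sets, and $X=\prod_{i\in I}X_i$. If $\Sigma_i$ is a $\sigma$-subalgebra of subsets
of $X_i$ for each $i\in I$, I write $\widehat{\bigotimes}_{i\in I}\Sigma_i$ for the $\sigma$-algebra of subsets of $X$ generated by
$$\{\{x:x\in X,\ x(i)\in E\}:i\in I,\ E\in\Sigma_i\}.$$
(Compare 251D.)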

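254F Theorem Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and let $\lambda$ be the product measure
on $X=\prod_{i\in I}X_i$ defined as in 254C; let $\Lambda$ be its domain.
(a) $\lambda X=1$.
(b) If $E_i\in\Sigma_i$ for every $i\in I$, and $\{i:E_i\ne X_i\}$ is countable, then $\prod_{i\in I}E_i\in\Lambda$, and $\lambda(\prod_{i\in I}E_i)=\prod_{i\in I}\mu_iE_i$.
In particular, $\lambda C=\theta_0C$ for every measurable cylinder $C$, as defined in 254A, and if $j\in I$ then
$x\mapsto x(j):X\to X_j$ is inverse-measure-preserving.
(c) $\widehat{\bigotimes}_{i\in I}\Sigma_i\subseteq\Lambda$.
(d) $\lambda$ is complete.
(e) For every $W\in\Lambda$ and $\epsilon>0$ there is a finite family $C_0,\ldots,C_n$ of measurable cylinders such that
$\lambda(W\triangle\bigcup_{k\le n}C_k)\le\epsilon$.
(f) For every $W\in\Lambda$ there are $W_1$, $W_2\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W_1\subseteq W\subseteq W_2$ and $\lambda(W_2\setminus W_1)=0$.
Remark Perhaps I should pause to interpret the product $\prod_{i\in I}\mu_iE_i$. Because all the $\mu_iE_i$ belong to $[0,1]$,
this is simply $\inf_{J\subseteq I,\,J\text{ is finite}}\prod_{i\in J}\mu_iE_i$, taking the empty product to be 1.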
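
The interpretation in the Remark can be illustrated numerically. The following Python sketch takes a hypothetical sequence of values $\mu_iE_i=1-1/(i+2)^2$ (an assumption made only for this example) and lists the products over the initial segments $\{0,\ldots,n-1\}$. Because every factor lies in $[0,1]$, these products are non-increasing, so the infimum over all finite sets is approached along any enumeration; for this particular choice the product over $i<n$ is $(n+2)/(2(n+1))$, which decreases to $\tfrac12$.

```python
from fractions import Fraction

def partial_products(values):
    """Running products of a finite list of numbers in [0, 1], starting from the empty product 1."""
    out = [Fraction(1)]
    for v in values:
        out.append(out[-1] * v)
    return out

# Hypothetical choice mu_i E_i = 1 - 1/(i+2)^2 for i = 0, ..., 49.
values = [1 - Fraction(1, (i + 2) ** 2) for i in range(50)]
products = partial_products(values)
```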
proof Throughout this proof, define $\mathcal{C}$, $\theta_0$ and $\theta$ as in 254A. I will write out an argument which applies to
finite $I$ as well as infinite $I$, but you may reasonably prefer to assume that $I$ is infinite on first reading.

(a) Of course $\lambda X=\theta X$, so I have to show that $\theta X=1$. Because $X$, $\emptyset\in\mathcal{C}$ and $\theta_0X=\prod_{i\in I}\mu_iX_i=1$
and $\theta_0\emptyset=0$,
$$\theta X\le\theta_0X+\theta_0\emptyset+\ldots=1.$$
I therefore have to show that $\theta X\ge 1$. ? Suppose, if possible, otherwise.

(i) There is a sequence $\langle C_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{C}$, covering $X$, such that $\sum_{n=0}^{\infty}\theta_0C_n<1$. For each $n\in\mathbb{N}$, express
$C_n$ as $\{x:x(i)\in E_{ni}\ \forall\,i\in I\}$, where every $E_{ni}\in\Sigma_i$ and $J_n=\{i:E_{ni}\ne X_i\}$ is finite. No $J_n$ can be empty,
because $\theta_0C_n<1=\theta_0X$; set $J=\bigcup_{n\in\mathbb{N}}J_n$. Then $J$ is a countable non-empty subset of $I$. Set $K=\mathbb{N}$ if $J$
is infinite, $\{k:0\le k<\#(J)\}$ if $J$ is finite; let $k\mapsto i_k:K\to J$ be a bijection.
For each $k\in K$, set $L_k=\{i_j:j<k\}\subseteq J$, and set $\alpha_{nk}=\prod_{i\in I\setminus L_k}\mu_iE_{ni}$ for $n\in\mathbb{N}$, $k\in K$. If $J$ is finite,
then we can identify $L_{\#(J)}$ with $J$, and set $\alpha_{n,\#(J)}=1$ for every $n$. We have $\alpha_{n0}=\theta_0C_n$ for each $n$, so
$\sum_{n=0}^{\infty}\alpha_{n0}<1$. For $n\in\mathbb{N}$, $k\in K$, $t\in X_{i_k}$ set
$$f_{nk}(t)=\alpha_{n,k+1}\text{ if }t\in E_{n,i_k},\qquad f_{nk}(t)=0\text{ otherwise}.$$
Then
$$\int f_{nk}\,d\mu_{i_k}=\alpha_{n,k+1}\,\mu_{i_k}E_{n,i_k}=\alpha_{nk}.$$

(ii) Choose $t_k\in X_{i_k}$ inductively, for $k\in K$, as follows. The inductive hypothesis will be that
$\sum_{n\in M_k}\alpha_{nk}<1$, where $M_k=\{n:n\in\mathbb{N},\ t_j\in E_{n,i_j}\ \forall\,j<k\}$; of course $M_0=\mathbb{N}$, so the induction
starts. Given that
$$1>\sum_{n\in M_k}\alpha_{nk}=\sum_{n\in M_k}\int f_{nk}\,d\mu_{i_k}=\int\Bigl(\sum_{n\in M_k}f_{nk}\Bigr)d\mu_{i_k}$$
(by B.Levi's theorem), there must be a $t_k\in X_{i_k}$ such that $\sum_{n\in M_k}f_{nk}(t_k)<1$. Now for such a choice of $t_k$,
$\alpha_{n,k+1}=f_{nk}(t_k)$ for every $n\in M_{k+1}$, so that $\sum_{n\in M_{k+1}}\alpha_{n,k+1}<1$, and the induction continues, unless $J$
is finite and $k+1=\#(J)$. In this last case we must just have $M_{\#(J)}=\emptyset$, because $\alpha_{n,\#(J)}=1$ for every $n$.
(iii) If $J$ is infinite, we obtain a full sequence $\langle t_k\rangle_{k\in\mathbb{N}}$; if $J$ is finite, we obtain just a finite sequence
$\langle t_k\rangle_{k<\#(J)}$. In either case, there is an $x\in X$ such that $x(i_k)=t_k$ for each $k\in K$. Now there must be some
$m\in\mathbb{N}$ such that $x\in C_m$. Because $J_m=\{i:E_{mi}\ne X_i\}$ is finite, there is a $k\in\mathbb{N}$ such that $J_m\subseteq L_k$
(allowing $k=\#(J)$ if $J$ is finite). Now $m\in M_k$, so in fact we cannot have $k=\#(J)$, and $\alpha_{mk}=1$, so
$\sum_{n\in M_k}\alpha_{nk}\ge 1$, contrary to the inductive hypothesis. X
This contradiction shows that $\theta X=1$.
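(b)(i) I take the particular case first. Let $j\in I$ and $E\in\Sigma_j$, and let $C\in\mathcal{C}$; set $W=\{x:x\in X,\ x(j)\in E\}$;
then $C\cap W$ and $C\setminus W$ both belong to $\mathcal{C}$, and $\theta_0C=\theta_0(C\cap W)+\theta_0(C\setminus W)$. P If $C=\prod_{i\in I}C_i$,
where $C_i\in\Sigma_i$ for each $i$, then $C\cap W=\prod_{i\in I}C_i'$, where $C_i'=C_i$ if $i\ne j$, and $C_j'=C_j\cap E$; similarly,
$C\setminus W=\prod_{i\in I}C_i''$, where $C_i''=C_i$ if $i\ne j$, and $C_j''=C_j\setminus E$. So both belong to $\mathcal{C}$, and
$$\theta_0(C\cap W)+\theta_0(C\setminus W)=\bigl(\mu_j(C_j\cap E)+\mu_j(C_j\setminus E)\bigr)\prod_{i\ne j}\mu_iC_i=\prod_{i\in I}\mu_iC_i=\theta_0C.\ \text{Q}$$

(ii) Now suppose that $A\subseteq X$ is any set, and $\epsilon>0$. Then there is a sequence $\langle C_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{C}$ such that
$A\subseteq\bigcup_{n\in\mathbb{N}}C_n$ and $\sum_{n=0}^{\infty}\theta_0C_n\le\theta A+\epsilon$. In this case
$$A\cap W\subseteq\bigcup_{n\in\mathbb{N}}C_n\cap W,\quad A\setminus W\subseteq\bigcup_{n\in\mathbb{N}}C_n\setminus W,$$
so
$$\theta(A\cap W)\le\sum_{n=0}^{\infty}\theta_0(C_n\cap W),\quad\theta(A\setminus W)\le\sum_{n=0}^{\infty}\theta_0(C_n\setminus W),$$
and
$$\theta(A\cap W)+\theta(A\setminus W)\le\sum_{n=0}^{\infty}\bigl(\theta_0(C_n\cap W)+\theta_0(C_n\setminus W)\bigr)=\sum_{n=0}^{\infty}\theta_0C_n\le\theta A+\epsilon.$$
As $\epsilon$ is arbitrary, $\theta(A\cap W)+\theta(A\setminus W)\le\theta A$; as $A$ is arbitrary, $W\in\Lambda$.
(iii) I show next that if $J\subseteq I$ is finite and $C_i\in\Sigma_i$ for each $i\in J$, and $C=\{x:x\in X,\ x(i)\in C_i\ \forall\,i\in J\}$,
then $C\in\Lambda$ and $\lambda C=\prod_{i\in J}\mu_iC_i$. P Induce on $\#(J)$. If $\#(J)=0$, that is, $J=\emptyset$, then $C=X$ and
this is part (a). For the inductive step to $\#(J)=n+1$, take any $j\in J$ and set $J'=J\setminus\{j\}$,
$$C'=\{x:x\in X,\ x(i)\in C_i\ \forall\,i\in J'\},$$
$$C''=C'\setminus C=\{x:x\in C',\ x(j)\in X_j\setminus C_j\}.$$
Then $C$, $C'$, $C''$ all belong to $\mathcal{C}$, and $\theta_0C'=\prod_{i\in J'}\mu_iC_i=\alpha$ say, $\theta_0C=\alpha\mu_jC_j$, $\theta_0C''=\alpha(1-\mu_jC_j)$.
Moreover, by the inductive hypothesis, $C'\in\Lambda$ and $\alpha=\lambda C'=\theta C'$. So $C=C'\cap\{x:x(j)\in C_j\}\in\Lambda$ by
(ii), and $C''=C'\setminus C\in\Lambda$.
We surely have $\lambda C=\theta C\le\theta_0C$, $\lambda C''\le\theta_0C''$; but also
$$\alpha=\lambda C'=\lambda C+\lambda C''\le\theta_0C+\theta_0C''=\alpha,$$
so in fact
$$\lambda C=\theta_0C=\alpha\mu_jC_j=\prod_{i\in J}\mu_iC_i,$$
and the induction proceeds. Q

(iv) Now let us return to the general case of a set $W$ of the form $\prod_{i\in I}E_i$ where $E_i\in\Sigma_i$ for each $i$,
and $K=\{i:E_i\ne X_i\}$ is countable. If $K$ is finite then $W=\{x:x(i)\in E_i\ \forall\,i\in K\}$ so $W\in\Lambda$ and
$$\lambda W=\prod_{i\in K}\mu_iE_i=\prod_{i\in I}\mu_iE_i.$$
Otherwise, let $\langle i_n\rangle_{n\in\mathbb{N}}$ be an enumeration of $K$. For each $n\in\mathbb{N}$ set $W_n=\{x:x\in X,\ x(i_k)\in E_{i_k}\ \forall\,k\le n\}$;
then we know that $W_n\in\Lambda$ and that $\lambda W_n=\prod_{k\le n}\mu_{i_k}E_{i_k}$. But $\langle W_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence with
intersection $W$, so $W\in\Lambda$ and
$$\lambda W=\lim_{n\to\infty}\lambda W_n=\prod_{i\in K}\mu_iE_i=\prod_{i\in I}\mu_iE_i.$$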
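(c) is an immediate consequence of (b) and the definition of $\widehat{\bigotimes}_{i\in I}\Sigma_i$.
(d) Because $\lambda$ is constructed by Carathéodory's method it must be complete.
(e) Let $\langle C_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\mathcal{C}$ such that $W\subseteq\bigcup_{n\in\mathbb{N}}C_n$ and $\sum_{n=0}^{\infty}\theta_0C_n\le\lambda W+\frac12\epsilon$. Set
$V=\bigcup_{n\in\mathbb{N}}C_n$; by (b), $V\in\Lambda$. Let $n\in\mathbb{N}$ be such that $\sum_{i=n+1}^{\infty}\theta_0C_i\le\frac12\epsilon$, and consider $W'=\bigcup_{k\le n}C_k$.
Since $V\setminus W'\subseteq\bigcup_{i>n}C_i$,
$$\lambda(W\triangle W')\le\lambda(V\setminus W')+\lambda(V\setminus W)=\lambda(V\setminus W')+\lambda V-\lambda W$$
$$\le\sum_{i=n+1}^{\infty}\theta_0C_i+\sum_{i=0}^{\infty}\theta_0C_i-\lambda W\le\tfrac12\epsilon+\tfrac12\epsilon=\epsilon.$$

(f)(i) If $W\in\Lambda$ and $\epsilon>0$ there is a $V\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W\subseteq V$ and $\lambda V\le\lambda W+\epsilon$. P Let $\langle C_n\rangle_{n\in\mathbb{N}}$
be a sequence in $\mathcal{C}$ such that $W\subseteq\bigcup_{n\in\mathbb{N}}C_n$ and $\sum_{n=0}^{\infty}\theta_0C_n\le\theta W+\epsilon$. Then $C_n\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ for each $n$, so
$V=\bigcup_{n\in\mathbb{N}}C_n\in\widehat{\bigotimes}_{i\in I}\Sigma_i$. Now $W\subseteq V$, and
$$\lambda V=\theta V\le\sum_{n=0}^{\infty}\theta_0C_n\le\theta W+\epsilon=\lambda W+\epsilon.\ \text{Q}$$
(ii) Now, given $W\in\Lambda$, let $\langle V_n\rangle_{n\in\mathbb{N}}$ be a sequence of sets in $\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W\subseteq V_n$ and
$\lambda V_n\le\lambda W+2^{-n}$ for each $n$; then $W_2=\bigcap_{n\in\mathbb{N}}V_n$ belongs to $\widehat{\bigotimes}_{i\in I}\Sigma_i$ and $W\subseteq W_2$ and $\lambda W_2=\lambda W$. Similarly,
there is a $W_2'\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $X\setminus W\subseteq W_2'$ and $\lambda W_2'=\lambda(X\setminus W)$, so we may take $W_1=X\setminus W_2'$ to
complete the proof.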

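254G The following is a fundamental, indeed defining, property of product measures.

Lemma Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X,\Lambda,\lambda)$. Let $(Y,\mathrm{T},\nu)$ be a
complete probability space and $\phi:Y\to X$ a function. Suppose that $\nu^*\phi^{-1}[C]\le\lambda C$ for every measurable
cylinder $C\subseteq X$. Then $\phi$ is inverse-measure-preserving. In particular, $\phi$ is inverse-measure-preserving iff
$\phi^{-1}[C]\in\mathrm{T}$ and $\nu\phi^{-1}[C]=\lambda C$ for every measurable cylinder $C\subseteq X$.
Remark By $\nu^*$ I mean the usual outer measure defined from $\nu$ as in §132.
proof (a) First note that, writing $\theta$ for the outer measure of 254A, $\nu^*\phi^{-1}[A]\le\theta A$ for every $A\subseteq X$. P Given
$\epsilon>0$, there is a sequence $\langle C_n\rangle_{n\in\mathbb{N}}$ of measurable cylinders such that $A\subseteq\bigcup_{n\in\mathbb{N}}C_n$ and $\sum_{n=0}^{\infty}\theta_0C_n\le\theta A+\epsilon$,
where $\theta_0$ is the functional of 254A. But we know that $\theta_0C=\lambda C$ for every measurable cylinder $C$ (254Fb),
so
$$\nu^*\phi^{-1}[A]\le\nu^*\Bigl(\bigcup_{n\in\mathbb{N}}\phi^{-1}[C_n]\Bigr)\le\sum_{n=0}^{\infty}\nu^*\phi^{-1}[C_n]\le\sum_{n=0}^{\infty}\lambda C_n\le\theta A+\epsilon.$$
As $\epsilon$ is arbitrary, $\nu^*\phi^{-1}[A]\le\theta A$. Q
(b) Now take any $W\in\Lambda$. Then there are $F$, $F'\in\mathrm{T}$ such that
$$\phi^{-1}[W]\subseteq F,\quad\phi^{-1}[X\setminus W]\subseteq F',$$
$$\nu F=\nu^*\phi^{-1}[W]\le\theta W=\lambda W,\quad\nu F'=\nu^*\phi^{-1}[X\setminus W]\le\lambda(X\setminus W).$$
We have
$$F\cup F'\supseteq\phi^{-1}[W]\cup\phi^{-1}[X\setminus W]=Y,$$
so
$$\nu(F\cap F')=\nu F+\nu F'-\nu(F\cup F')\le\lambda W+\lambda(X\setminus W)-1=0.$$
Now
$$F\setminus\phi^{-1}[W]\subseteq F\cap\phi^{-1}[X\setminus W]\subseteq F\cap F'$$
is $\nu$-negligible. Because $\nu$ is complete, $F\setminus\phi^{-1}[W]\in\mathrm{T}$ and $\phi^{-1}[W]=F\setminus(F\setminus\phi^{-1}[W])$ belongs to $\mathrm{T}$.
Moreover,
$$1=\nu F+\nu F'\le\lambda W+\lambda(X\setminus W)=1,$$
so we must have $\nu F=\lambda W$; but this means that $\nu\phi^{-1}[W]=\lambda W$. As $W$ is arbitrary, $\phi$ is
inverse-measure-preserving.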

254H Corollary Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ and $\langle(Y_i,\mathrm{T}_i,\nu_i)\rangle_{i\in I}$ be two families of probability spaces, with
products $(X,\Lambda,\lambda)$ and $(Y,\Lambda',\lambda')$. Suppose that for each $i\in I$ we are given an inverse-measure-preserving
function $\phi_i:X_i\to Y_i$. Set $\phi(x)=\langle\phi_i(x(i))\rangle_{i\in I}$ for $x\in X$. Then $\phi:X\to Y$ is inverse-measure-preserving.
proof If $C=\prod_{i\in I}C_i$ is a measurable cylinder in $Y$, then $\phi^{-1}[C]=\prod_{i\in I}\phi_i^{-1}[C_i]$ is a measurable cylinder
in $X$, and
$$\lambda\phi^{-1}[C]=\prod_{i\in I}\mu_i\phi_i^{-1}[C_i]=\prod_{i\in I}\nu_iC_i=\lambda'C.$$
Since $\lambda$ is a complete probability measure, 254G tells us that $\phi$ is inverse-measure-preserving.

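254I Corresponding to 251S we have the following.

Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, $\lambda$ the product measure on $X=\prod_{i\in I}X_i$,
and $\Lambda$ its domain. Then $\lambda$ is also the product of the completions $\hat\mu_i$ of the $\mu_i$ (212C).
proof Write $\hat\lambda$ for the product of the $\hat\mu_i$, and $\hat\Lambda$ for its domain. (i) The identity map from $X_i$ to itself
is inverse-measure-preserving if regarded as a map from $(X_i,\hat\mu_i)$ to $(X_i,\mu_i)$, so the identity map on $X$ is
inverse-measure-preserving if regarded as a map from $(X,\hat\lambda)$ to $(X,\lambda)$, by 254H; that is, $\Lambda\subseteq\hat\Lambda$ and $\lambda=\hat\lambda\restriction\Lambda$.
(ii) If $C$ is a measurable cylinder for $\langle\hat\mu_i\rangle_{i\in I}$, that is, $C=\prod_{i\in I}C_i$ where $C_i\in\hat\Sigma_i$ for every $i$ and $\{i:C_i\ne X_i\}$
is finite, then for each $i\in I$ we can find a $C_i'\in\Sigma_i$ such that $C_i\subseteq C_i'$ and $\mu_iC_i'=\hat\mu_iC_i$; setting $C'=\prod_{i\in I}C_i'$,
we get
$$\lambda^*C\le\lambda C'=\prod_{i\in I}\mu_iC_i'=\prod_{i\in I}\hat\mu_iC_i=\hat\lambda C.$$
By 254G, $\lambda W$ must be defined and equal to $\hat\lambda W$ whenever $W\in\hat\Lambda$. Putting this together with (i), we see
that $\lambda=\hat\lambda$.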

254J The product measure on $\{0,1\}^I$ (a) Perhaps the most important of all examples of infinite
product measures is the case in which each factor $X_i$ is just $\{0,1\}$ and each $\mu_i$ is the 'fair-coin' probability
measure, setting
$$\mu_i\{0\}=\mu_i\{1\}=\tfrac12.$$
In this case, the product $X=\{0,1\}^I$ has a family $\langle E_i\rangle_{i\in I}$ of measurable sets such that, writing $\lambda$ for the
product measure on $X$,
$$\lambda\Bigl(\bigcap_{i\in J}E_i\Bigr)=2^{-\#(J)}\text{ if }J\subseteq I\text{ is finite}.$$
(Just take $E_i=\{x:x(i)=1\}$ for each $i$.) I will call this $\lambda$ the usual measure on $\{0,1\}^I$. Observe that
if $I$ is finite then $\lambda\{x\}=2^{-\#(I)}$ for each $x\in X$ (using 254Fb). On the other hand, if $I$ is infinite, then
$\lambda\{x\}=0$ for every $x\in X$ (because, again using 254Fb, $\lambda^*\{x\}\le 2^{-n}$ for every $n$).

(b) There is a natural bijection between $\{0,1\}^I$ and $\mathcal{P}I$, matching $x\in\{0,1\}^I$ with $\{i:i\in I,\ x(i)=1\}$.
So we get a standard measure $\tilde\lambda$ on $\mathcal{P}I$, which I will call the usual measure on $\mathcal{P}I$. Note that for any
finite $b\subseteq I$ and any $c\subseteq b$ we have
$$\tilde\lambda\{a:a\cap b=c\}=\lambda\{x:x(i)=1\text{ for }i\in c,\ x(i)=0\text{ for }i\in b\setminus c\}=2^{-\#(b)}.$$

(c) Of course we can apply 254G to these measures; if $(Y,\mathrm{T},\nu)$ is a complete probability space, a function
$\phi:Y\to\{0,1\}^I$ is inverse-measure-preserving iff
$$\nu\{y:y\in Y,\ \phi(y)\restriction J=z\}=2^{-\#(J)}$$
whenever $J\subseteq I$ is finite and $z\in\{0,1\}^J$; this is because the measurable cylinders in $\{0,1\}^I$ are precisely
the sets of the form $\{x:x\restriction J=z\}$ where $J\subseteq I$ is finite.
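
For finite $I$ the usual measure on $\{0,1\}^I$ is just normalized counting measure, and the formulae above can be checked by enumeration. The Python sketch below does this for a hypothetical four-element index set; the names `I`, `X`, `measure` and `cylinder` are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction
from itertools import product

I = ["p", "q", "r", "s"]                     # a hypothetical finite index set
X = [dict(zip(I, bits)) for bits in product((0, 1), repeat=len(I))]
point_mass = Fraction(1, 2 ** len(I))        # lambda{x} = 2^{-#(I)}

def measure(event):
    """lambda of {x : event(x)} by counting points."""
    return sum((point_mass for x in X if event(x)), Fraction(0))

def cylinder(z):
    """The event {x : x restricted to dom(z) equals z}."""
    return lambda x: all(x[i] == v for i, v in z.items())

m_two = measure(cylinder({"p": 1, "r": 0}))                    # J = {p, r}
m_E = measure(lambda x: all(x[i] == 1 for i in ("q", "s")))    # E_q intersected with E_s
```

In both cases two coordinates are constrained, so the measure is $2^{-2}$, whichever values are prescribed.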

254K In the case of countably infinite $I$, we have a very important relationship between the usual
product measure of $\{0,1\}^I$ and Lebesgue measure on $[0,1]$.
Proposition Let $\lambda$ be the usual measure on $X=\{0,1\}^{\mathbb{N}}$, and let $\mu$ be Lebesgue measure on $[0,1]$; write
$\Lambda$ for the domain of $\lambda$ and $\Sigma$ for the domain of $\mu$.
(i) For $x\in X$ set $\phi(x)=\sum_{i=0}^{\infty}2^{-i-1}x(i)$. Then
$\phi^{-1}[E]\in\Lambda$ and $\lambda\phi^{-1}[E]=\mu E$ for every $E\in\Sigma$;
$\phi[F]\in\Sigma$ and $\mu\phi[F]=\lambda F$ for every $F\in\Lambda$.
(ii) There is a bijection $\tilde\phi:X\to[0,1]$ which is equal to $\phi$ at all but countably many points, and any such
bijection is an isomorphism between $(X,\Lambda,\lambda)$ and $([0,1],\Sigma,\mu)$.
proof (a) The first point to observe is that $\phi$ is nearly a bijection. Setting
$$H=\{x:x\in X,\ \exists\,m\in\mathbb{N},\ x(i)=x(m)\ \forall\,i\ge m\},$$
$$H'=\{2^{-n}k:n\in\mathbb{N},\ k\le 2^n\},$$
then $H$ and $H'$ are countable and $\phi\restriction X\setminus H$ is a bijection between $X\setminus H$ and $[0,1]\setminus H'$. (For $t\in[0,1]\setminus H'$,
$\phi^{-1}(t)$ is the binary expansion of $t$.) Because $H$ and $H'$ are countably infinite, there is a bijection between
them; combining this with $\phi\restriction X\setminus H$, we have a bijection between $X$ and $[0,1]$ equal to $\phi$ except at countably
many points. For the rest of this proof, let $\tilde\phi$ be any such bijection. Let $M$ be the countable set
$\{x:x\in X,\ \phi(x)\ne\tilde\phi(x)\}$, and $N$ the countable set $\phi[M]\cup\tilde\phi[M]$; then $\phi[A]\triangle\tilde\phi[A]\subseteq N$ for every $A\subseteq X$.
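(b) To see that $\lambda\tilde\phi^{-1}[E]$ exists and is equal to $\mu E$ for every $E\in\Sigma$, I consider successively more complex
sets $E$.
($\alpha$) If $E=\{t\}$ then $\lambda\tilde\phi^{-1}[E]=\lambda\{\tilde\phi^{-1}(t)\}$ exists and is zero.
($\beta$) If $E$ is of the form $[2^{-n}k,2^{-n}(k+1)[$, where $n\in\mathbb{N}$ and $0\le k<2^n$, then $\phi^{-1}[E]$ differs by at most
two points from a set of the form $\{x:x(i)=z(i)\ \forall\,i<n\}$, so $\tilde\phi^{-1}[E]$ differs from this by a countable set,
and
$$\lambda\tilde\phi^{-1}[E]=2^{-n}=\mu E.$$
($\gamma$) If $E$ is of the form $[2^{-n}k,2^{-n}l[$, where $n\in\mathbb{N}$ and $0\le k<l\le 2^n$, then
$$E=\bigcup_{k\le i<l}[2^{-n}i,2^{-n}(i+1)[,$$
so
$$\lambda\tilde\phi^{-1}[E]=2^{-n}(l-k)=\mu E.$$
($\delta$) If $E$ is of the form $[t,u[$, where $0\le t<u\le 1$, then for each $n\in\mathbb{N}$ let $k_n$ be the integral part of $2^nt$
and $l_n$ the integral part of $2^nu$; set $E_n=[2^{-n}(k_n+1),2^{-n}l_n[$; then $\langle E_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence
and $\bigcup_{n\in\mathbb{N}}E_n$ is $]t,u[$. So (using ($\alpha$))
$$\lambda\tilde\phi^{-1}[E]=\lambda\tilde\phi^{-1}\Bigl[\bigcup_{n\in\mathbb{N}}E_n\Bigr]=\lim_{n\to\infty}\lambda\tilde\phi^{-1}[E_n]=\lim_{n\to\infty}\mu E_n=\mu E.$$
($\epsilon$) If $E\in\Sigma$, then for any $\epsilon>0$ there is a sequence $\langle I_n\rangle_{n\in\mathbb{N}}$ of half-open subintervals of $[0,1[$ such that
$E\setminus\{1\}\subseteq\bigcup_{n\in\mathbb{N}}I_n$ and $\sum_{n=0}^{\infty}\mu I_n\le\mu E+\epsilon$; now $\tilde\phi^{-1}[E]\subseteq\{\tilde\phi^{-1}(1)\}\cup\bigcup_{n\in\mathbb{N}}\tilde\phi^{-1}[I_n]$, so
$$\lambda^*\tilde\phi^{-1}[E]\le\lambda\Bigl(\bigcup_{n\in\mathbb{N}}\tilde\phi^{-1}[I_n]\Bigr)\le\sum_{n=0}^{\infty}\lambda\tilde\phi^{-1}[I_n]=\sum_{n=0}^{\infty}\mu I_n\le\mu E+\epsilon.$$
As $\epsilon$ is arbitrary, $\lambda^*\tilde\phi^{-1}[E]\le\mu E$, and there is a $V\in\Lambda$ such that $\tilde\phi^{-1}[E]\subseteq V$ and $\lambda V\le\mu E$.
($\zeta$) Similarly, there is a $V'\in\Lambda$ such that $V'\supseteq\tilde\phi^{-1}[[0,1]\setminus E]$ and $\lambda V'\le\mu([0,1]\setminus E)$. Now $V\cup V'=X$,
while
$$\lambda V+\lambda V'\le\mu E+(1-\mu E)=1=\lambda(V\cup V'),$$
so $\lambda(V\cap V')=0$ and
$$\tilde\phi^{-1}[E]=(X\setminus V')\cup(V\cap V'\cap\tilde\phi^{-1}[E])$$
belongs to $\Lambda$, with
$$\lambda\tilde\phi^{-1}[E]\le\lambda V\le\mu E;$$
at the same time,
$$1-\lambda\tilde\phi^{-1}[E]\le\lambda V'\le 1-\mu E,$$
so $\lambda\tilde\phi^{-1}[E]=\mu E$.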
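(c) Now suppose that $C\subseteq X$ is a measurable cylinder of the special form $\{x:x(0)=\epsilon_0,\ldots,x(n)=\epsilon_n\}$
for some $\epsilon_0,\ldots,\epsilon_n\in\{0,1\}$. Then $\phi[C]=[t,t+2^{-n-1}]$ where $t=\sum_{i=0}^{n}2^{-i-1}\epsilon_i$, so that $\mu\phi[C]=\lambda C$. Since
$\phi[C]\triangle\tilde\phi[C]\subseteq N$ is countable, $\mu\tilde\phi[C]=\lambda C$.
If $C\subseteq X$ is any measurable cylinder, then it is of the form $\{x:x\restriction J=z\}$ for some finite $J\subseteq\mathbb{N}$; taking $n$ so
large that $J\subseteq\{0,\ldots,n\}$, $C$ is expressible as a disjoint union of $2^{n+1-\#(J)}$ sets of the form just considered,
being just those in which $\epsilon_i=z(i)$ for $i\in J$. Summing their measures, we again get $\mu\tilde\phi[C]=\lambda C$. Now
254G tells us that $\tilde\phi^{-1}:[0,1]\to X$ is inverse-measure-preserving, that is, $\tilde\phi[W]$ is Lebesgue measurable,
with measure $\lambda W$, for every $W\in\Lambda$.
Putting this together with (b), $\tilde\phi$ must be an isomorphism between $(X,\Lambda,\lambda)$ and $([0,1],\Sigma,\mu)$, as claimed
in (ii) of the proposition.
(d) As for (i), if $E\in\Sigma$ then $\phi^{-1}[E]\triangle\tilde\phi^{-1}[E]\subseteq M$ is countable, so $\lambda\phi^{-1}[E]=\lambda\tilde\phi^{-1}[E]=\mu E$. While if
$W\in\Lambda$, $\phi[W]\triangle\tilde\phi[W]\subseteq N$ is countable, so $\mu\phi[W]=\mu\tilde\phi[W]=\lambda W$.
(e) Finally, if $\psi:X\to[0,1]$ is any other bijection which agrees with $\phi$ at all but countably many points,
set $M'=\{x:\phi(x)\ne\psi(x)\}$, $N'=\phi[M']\cup\psi[M']$. Then
$$\psi^{-1}[E]\triangle\phi^{-1}[E]\subseteq M',\quad\lambda\psi^{-1}[E]=\lambda\phi^{-1}[E]=\mu E$$
for every $E\in\Sigma$, and
$$\psi[F]\triangle\phi[F]\subseteq N',\quad\mu\psi[F]=\mu\phi[F]=\lambda F$$
for every $F\in\Lambda$.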
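
Two features of the map $\phi$ used in this proof can be seen in exact arithmetic on finite truncations: a cylinder fixing the first $n+1$ coordinates is carried onto an interval of length $2^{-n-1}$, and a dyadic point is approached by two different sequences, which is why $\phi$ fails to be injective on a countable set. The Python sketch below is only an illustration; `phi_prefix` and `cylinder_image` are hypothetical helper names, and finite lists stand in for infinite sequences.

```python
from fractions import Fraction

def phi_prefix(bits):
    """Sum of 2^{-i-1} x(i) over the listed coordinates (a finite truncation of phi)."""
    return sum((Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits)), Fraction(0))

def cylinder_image(eps):
    """Endpoints of phi[C] for C = {x : x(0) = eps[0], ..., x(n) = eps[n]}."""
    n = len(eps) - 1
    t = phi_prefix(eps)
    return t, t + Fraction(1, 2 ** (n + 1))

lo, hi = cylinder_image([1, 0, 1])       # n = 2, so the interval has length 2^{-3}
# Two different sequences whose images approach the same dyadic point 1/2:
a = phi_prefix([1] + [0] * 40)           # 1000...
b = phi_prefix([0] + [1] * 40)           # 0111... (truncated after 41 coordinates)
```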

254L Subspaces Just as in 251P, we can consider the product of subspace measures. There is a
simplification in the form of the result because in the present context we are restricted to probability
measures.
Theorem Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $(X,\Lambda,\lambda)$ their product.
(a) For each $i\in I$, let $A_i\subseteq X_i$ be a set of full outer measure, and write $\tilde\mu_i$ for the subspace measure on
$A_i$ (214B). Let $\tilde\lambda$ be the product measure on $A=\prod_{i\in I}A_i$. Then $\tilde\lambda$ is the subspace measure on $A$ induced
by $\lambda$.
(b) $\lambda^*(\prod_{i\in I}A_i)=\prod_{i\in I}\mu_i^*A_i$ whenever $A_i\subseteq X_i$ for every $i$.

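proof (a) Write $\lambda_A$ for the subspace measure on $A$ defined from $\lambda$, and $\Lambda_A$ for its domain; write $\tilde\Lambda$ for the
domain of $\tilde\lambda$.
(i) Let $\phi:A\to X$ be the identity map. If $C\subseteq X$ is a measurable cylinder, say $C=\prod_{i\in I}C_i$ where
$C_i\in\Sigma_i$ for each $i$, then $\phi^{-1}[C]=\prod_{i\in I}(C_i\cap A_i)$ is a measurable cylinder in $A$, and
$$\tilde\lambda\phi^{-1}[C]=\prod_{i\in I}\tilde\mu_i(C_i\cap A_i)\le\prod_{i\in I}\mu_iC_i=\lambda C.$$
By 254G, $\phi$ is inverse-measure-preserving, that is, $\tilde\lambda(A\cap W)=\lambda W$ for every $W\in\Lambda$. But this means that
$\tilde\lambda V$ is defined and equal to $\lambda_AV=\lambda^*V$ for every $V\in\Lambda_A$, since for any such $V$ there is a $W\in\Lambda$ such that
$V=A\cap W$ and $\lambda W=\lambda_AV$. In particular, $\lambda_AA=1$.
(ii) Now regard $\phi$ as a function from the measure space $(A,\Lambda_A,\lambda_A)$ to $(A,\tilde\Lambda,\tilde\lambda)$. If $D$ is a measurable
cylinder in $A$, we can express it as $\prod_{i\in I}D_i$ where every $D_i$ belongs to the domain of $\tilde\mu_i$ and $D_i=A_i$ for all
but finitely many $i$. Now for each $i$ we can find $C_i\in\Sigma_i$ such that $D_i=C_i\cap A_i$ and $\mu_iC_i=\tilde\mu_iD_i$, and we
can suppose that $C_i=X_i$ whenever $D_i=A_i$. In this case $C=\prod_{i\in I}C_i\in\Lambda$ and
$$\lambda C=\prod_{i\in I}\mu_iC_i=\prod_{i\in I}\tilde\mu_iD_i=\tilde\lambda D.$$
Accordingly
$$\lambda_A\phi^{-1}[D]=\lambda_A(A\cap C)\le\lambda C=\tilde\lambda D.$$
By 254G again, $\phi$ is inverse-measure-preserving in this manifestation, that is, $\lambda_AV$ is defined and equal to
$\tilde\lambda V$ for every $V\in\tilde\Lambda$. Putting this together with (i), we have $\lambda_A=\tilde\lambda$, as claimed.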
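(b) For each $i\in I$, choose a set $E_i\in\Sigma_i$ such that $A_i\subseteq E_i$ and $\mu_iE_i=\mu_i^*A_i$; do this in such a way that
$E_i=X_i$ whenever $\mu_i^*A_i=1$. Set $B_i=A_i\cup(X_i\setminus E_i)$, so that $\mu_i^*B_i=1$ for each $i$ (if $F\in\Sigma_i$ and $F\supseteq B_i$
then $F\cap E_i\supseteq A_i$, so
$$\mu_iF=\mu_i(F\cap E_i)+\mu_i(F\setminus E_i)=\mu_iE_i+\mu_i(X_i\setminus E_i)=1.)$$
By (a), we can identify the subspace measure $\lambda_B$ on $B=\prod_{i\in I}B_i$ with the product of the subspace measures
$\tilde\mu_i$ on $B_i$. In particular, $\lambda^*B=\lambda_BB=1$. Now $A_i=B_i\cap E_i$ so (writing $A=\prod_{i\in I}A_i$), $A=B\cap\prod_{i\in I}E_i$.
If $\prod_{i\in I}\mu_i^*A_i=0$, then for every $\epsilon>0$ there is a finite $J\subseteq I$ such that $\prod_{i\in J}\mu_i^*A_i\le\epsilon$; consequently
(using 254Fb)
$$\lambda^*A\le\lambda\{x:x(i)\in E_i\text{ for every }i\in J\}=\prod_{i\in J}\mu_iE_i\le\epsilon.$$
As $\epsilon$ is arbitrary, $\lambda^*A=0$. If $\prod_{i\in I}\mu_i^*A_i>0$, then for every $n\in\mathbb{N}$ the set $\{i:\mu_i^*A_i\le 1-2^{-n}\}$ must be
finite, so
$$J=\{i:\mu_i^*A_i<1\}=\{i:E_i\ne X_i\}$$
is countable. By 254Fb again, applied to $\langle E_i\cap B_i\rangle_{i\in I}$ in the product $\prod_{i\in I}B_i$,
$$\lambda^*\Bigl(\prod_{i\in I}A_i\Bigr)=\lambda_B\Bigl(\prod_{i\in I}A_i\Bigr)=\lambda_B\{x:x\in B,\ x(i)\in E_i\cap B_i\text{ for every }i\in J\}$$
$$=\prod_{i\in J}\tilde\mu_i(E_i\cap B_i)=\prod_{i\in I}\mu_i^*A_i,$$
as required.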

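254M I now turn to the basic results which make it possible to use these product measures effectively.
First, I offer a vocabulary for dealing with subproducts. Let $\langle X_i\rangle_{i\in I}$ be a family of sets, with product $X$.

(a) For $J\subseteq I$, write $X_J$ for $\prod_{i\in J}X_i$. We have a canonical bijection $x\mapsto(x\restriction J,x\restriction I\setminus J):X\to X_J\times X_{I\setminus J}$.
Associated with this we have the map $x\mapsto\pi_J(x)=x\restriction J:X\to X_J$. Now I will say that a set $W\subseteq X$ is
determined by coordinates in $J$ if there is a $V\subseteq X_J$ such that $W=\pi_J^{-1}[V]$; that is, $W$ corresponds to
$V\times X_{I\setminus J}\subseteq X_J\times X_{I\setminus J}$.
It is easy to see that

$W$ is determined by coordinates in $J$
$\iff$ $x'\in W$ whenever $x\in W$, $x'\in X$ and $x'\restriction J=x\restriction J$
$\iff$ $W=\pi_J^{-1}[\pi_J[W]]$.

It follows that if $W$ is determined by coordinates in $J$, and $J\subseteq K\subseteq I$, $W$ is also determined by coordinates
in $K$. The family $\mathcal{W}_J$ of subsets of $X$ determined by coordinates in $J$ is closed under complementation and
arbitrary unions and intersections. P If $W\in\mathcal{W}_J$, then
$$X\setminus W=X\setminus\pi_J^{-1}[\pi_J[W]]=\pi_J^{-1}[X_J\setminus\pi_J[W]]\in\mathcal{W}_J.$$
If $\mathcal{V}\subseteq\mathcal{W}_J$, then
$$\bigcup\mathcal{V}=\bigcup_{V\in\mathcal{V}}\pi_J^{-1}[\pi_J[V]]=\pi_J^{-1}\Bigl[\bigcup_{V\in\mathcal{V}}\pi_J[V]\Bigr]\in\mathcal{W}_J.\ \text{Q}$$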

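(b) It follows that
$$\mathcal{W}=\bigcup\{\mathcal{W}_J:J\subseteq I\text{ is countable}\},$$
the family of subsets of $X$ determined by coordinates in some countable set, is a $\sigma$-algebra of subsets of $X$.
P (i) $X$ and $\emptyset$ are determined by coordinates in $\emptyset$ (recall that $X_\emptyset$ is a singleton, and that $X=\pi_\emptyset^{-1}[X_\emptyset]$,
$\emptyset=\pi_\emptyset^{-1}[\emptyset]$). (ii) If $W\in\mathcal{W}$, there is a countable $J\subseteq I$ such that $W\in\mathcal{W}_J$; now
$$X\setminus W=\pi_J^{-1}[X_J\setminus\pi_J[W]]\in\mathcal{W}_J\subseteq\mathcal{W}.$$
(iii) If $\langle W_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{W}$, then for each $n\in\mathbb{N}$ there is a countable $J_n\subseteq I$ such that $W_n\in\mathcal{W}_{J_n}$.
Now $J=\bigcup_{n\in\mathbb{N}}J_n$ is a countable subset of $I$, and every $W_n$ belongs to $\mathcal{W}_J$, so
$$\bigcup_{n\in\mathbb{N}}W_n\in\mathcal{W}_J\subseteq\mathcal{W}.\ \text{Q}$$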

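(c) If $i\in I$ and $E\subseteq X_i$ then $\{x:x\in X,\ x(i)\in E\}$ is determined by the single coordinate $i$, so surely
belongs to $\mathcal{W}$; accordingly $\mathcal{W}$ must include $\widehat{\bigotimes}_{i\in I}\mathcal{P}X_i$. A fortiori, if $\Sigma_i$ is a $\sigma$-algebra of subsets of $X_i$ for
each $i$, $\mathcal{W}\supseteq\widehat{\bigotimes}_{i\in I}\Sigma_i$; that is, every member of $\widehat{\bigotimes}_{i\in I}\Sigma_i$ is determined by coordinates in some countable set.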
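
The criterion $W=\pi_J^{-1}[\pi_J[W]]$ of 254Ma is easy to test mechanically when $X$ is finite. The following Python sketch does so for $X=\{0,1\}^3$; the representation of points as tuples and the helper names `pi` and `determined_by` are assumptions made for this illustration only.

```python
from itertools import product

I = (0, 1, 2)
X = set(product((0, 1), repeat=len(I)))      # points of X as tuples indexed by I

def pi(J, x):
    """pi_J(x): the restriction of x to J, as a tuple of (index, value) pairs."""
    return tuple((i, x[i]) for i in sorted(J))

def determined_by(W, J):
    """True iff W = pi_J^{-1}[pi_J[W]]."""
    image = {pi(J, x) for x in W}
    return {x for x in X if pi(J, x) in image} == set(W)

W = {x for x in X if x[0] == 1}              # depends on coordinate 0 only
V = {x for x in X if x[0] == x[2]}           # depends on coordinates 0 and 2
```

Here `determined_by(W, {0})` and `determined_by(W, {0, 1})` both hold, in line with the remark that enlarging $J$ preserves the property, while `V` is determined by coordinates in `{0, 2}` but not in `{0}`.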

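254N Theorem Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces and $\langle K_j\rangle_{j\in J}$ a partition of $I$.
For each $j\in J$ let $\lambda_j$ be the product measure on $Z_j=\prod_{i\in K_j}X_i$, and write $\lambda$ for the product measure on
$X=\prod_{i\in I}X_i$. Then the natural bijection
$$x\mapsto\phi(x)=\langle x\restriction K_j\rangle_{j\in J}:X\to\prod_{j\in J}Z_j$$
identifies $\lambda$ with the product of the family $\langle\lambda_j\rangle_{j\in J}$.
In particular, if $K\subseteq I$ is any set, then $\lambda$ can be identified with the c.l.d. product of the product measures
on $\prod_{i\in K}X_i$ and $\prod_{i\in I\setminus K}X_i$.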
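proof (Compare 251M.) Write $Z=\prod_{j\in J}Z_j$ and $\tilde\lambda$ for the product measure on $Z$; let $\Lambda$, $\tilde\Lambda$ be the domains
of $\lambda$ and $\tilde\lambda$.
(a) Let $C\subseteq Z$ be a measurable cylinder. Then $\lambda^*\phi^{-1}[C]\le\tilde\lambda C$. P Express $C$ as $\prod_{j\in J}C_j$ where $C_j\subseteq Z_j$
belongs to the domain $\Lambda_j$ of $\lambda_j$ for each $j$. Set $L=\{j:C_j\ne Z_j\}$, so that $L$ is finite. Let $\epsilon>0$. For each
$j\in L$ let $\langle C_{jn}\rangle_{n\in\mathbb{N}}$ be a sequence of measurable cylinders in $Z_j=\prod_{i\in K_j}X_i$ such that $C_j\subseteq\bigcup_{n\in\mathbb{N}}C_{jn}$ and
$\sum_{n=0}^{\infty}\lambda_jC_{jn}\le\lambda_jC_j+\epsilon$. Express each $C_{jn}$ as $\prod_{i\in K_j}C_{jni}$ where $C_{jni}\in\Sigma_i$ for $i\in K_j$ (and $\{i:C_{jni}\ne X_i\}$
is finite).
For $f\in\mathbb{N}^L$, set
$$D_f=\{x:x\in X,\ x(i)\in C_{j,f(j),i}\text{ whenever }j\in L,\ i\in K_j\}.$$
Because $\bigcup_{j\in L}\{i:C_{j,f(j),i}\ne X_i\}$ is finite, $D_f$ is a measurable cylinder in $X$, and
$$\lambda D_f=\prod_{j\in L}\prod_{i\in K_j}\mu_iC_{j,f(j),i}=\prod_{j\in L}\lambda_jC_{j,f(j)}.$$
Also
$$\bigcup\{D_f:f\in\mathbb{N}^L\}\supseteq\phi^{-1}[C]$$
because if $\phi(x)\in C$ then $\phi(x)(j)\in C_j$ for each $j\in L$, so there must be an $f\in\mathbb{N}^L$ such that $\phi(x)(j)\in C_{j,f(j)}$
for every $j\in L$. But (because $\mathbb{N}^L$ is countable) this means that
$$\lambda^*\phi^{-1}[C]\le\sum_{f\in\mathbb{N}^L}\lambda D_f=\sum_{f\in\mathbb{N}^L}\prod_{j\in L}\lambda_jC_{j,f(j)}=\prod_{j\in L}\sum_{n=0}^{\infty}\lambda_jC_{jn}\le\prod_{j\in L}(\lambda_jC_j+\epsilon).$$
As $\epsilon$ is arbitrary,
$$\lambda^*\phi^{-1}[C]\le\prod_{j\in L}\lambda_jC_j=\tilde\lambda C.\ \text{Q}$$
By 254G, it follows that $\lambda\phi^{-1}[W]$ is defined, and equal to $\tilde\lambda W$, whenever $W\in\tilde\Lambda$.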
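(b) Next, $\tilde\lambda\phi[D]=\lambda D$ for every measurable cylinder $D\subseteq X$. P This is easy. Express $D$ as $\prod_{i\in I}D_i$
where $D_i\in\Sigma_i$ for every $i\in I$ and $\{i:D_i\ne X_i\}$ is finite. Then $\phi[D]=\prod_{j\in J}\tilde D_j$, where $\tilde D_j=\prod_{i\in K_j}D_i$ is a
measurable cylinder for each $j\in J$. Because $\{j:\tilde D_j\ne Z_j\}$ must also be finite (in fact, it cannot have more
members than the finite set $\{i:D_i\ne X_i\}$), $\prod_{j\in J}\tilde D_j$ is itself a measurable cylinder in $Z$, and
$$\tilde\lambda\phi[D]=\prod_{j\in J}\lambda_j\tilde D_j=\prod_{j\in J}\prod_{i\in K_j}\mu_iD_i=\lambda D.\ \text{Q}$$
Applying 254G to $\phi^{-1}:Z\to X$, it follows that $\tilde\lambda\phi[W]$ is defined, and equal to $\lambda W$, for every $W\in\Lambda$.
But together with (a) this means that for any $W\subseteq X$,
if $W\in\Lambda$ then $\phi[W]\in\tilde\Lambda$ and $\tilde\lambda\phi[W]=\lambda W$,
if $\phi[W]\in\tilde\Lambda$ then $W\in\Lambda$ and $\lambda W=\tilde\lambda\phi[W]$.
And of course this is just what is meant by saying that $\phi$ is an isomorphism between $(X,\Lambda,\lambda)$ and $(Z,\tilde\Lambda,\tilde\lambda)$.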

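254O Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces. For each $J\subseteq I$ let $\lambda_J$ be the
product probability measure on $X_J=\prod_{i\in J}X_i$, and $\Lambda_J$ its domain; write $X=X_I$, $\lambda=\lambda_I$ and $\Lambda=\Lambda_I$. For
$x\in X$ and $J\subseteq I$ set $\pi_J(x)=x\restriction J\in X_J$.
(a) For every $J\subseteq I$, $\lambda_J$ is the image measure $\lambda\pi_J^{-1}$ (112E); in particular, $\pi_J:X\to X_J$ is
inverse-measure-preserving for $\lambda$ and $\lambda_J$.
(b) If $J\subseteq I$ and $W\in\Lambda$ is determined by coordinates in $J$ (254M), then $\lambda_J\pi_J[W]$ is defined and equal
to $\lambda W$. Consequently there are $W_1$, $W_2$ belonging to the $\sigma$-algebra of subsets of $X$ generated by
$$\{\{x:x(i)\in E\}:i\in J,\ E\in\Sigma_i\}$$
such that $W_1\subseteq W\subseteq W_2$ and $\lambda(W_2\setminus W_1)=0$.
(c) For every $W\in\Lambda$, we can find a countable set $J$ and $W_1$, $W_2\in\Lambda$, both determined by coordinates in
$J$, such that $W_1\subseteq W\subseteq W_2$ and $\lambda(W_2\setminus W_1)=0$.
(d) For every $W\in\Lambda$, there is a countable set $J\subseteq I$ such that $\pi_J[W]\in\Lambda_J$ and $\lambda_J\pi_J[W]=\lambda W$; so that
$W'=\pi_J^{-1}[\pi_J[W]]$ belongs to $\Lambda$, and $\lambda(W'\setminus W)=0$.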
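proof (a)(i) By 254N, we can identify $\lambda$ with the product of $\lambda_J$ and $\lambda_{I\setminus J}$ on $X_J\times X_{I\setminus J}$. Now $\pi_J^{-1}[E]\subseteq X$
corresponds to $E\times X_{I\setminus J}\subseteq X_J\times X_{I\setminus J}$, so
$$\lambda(\pi_J^{-1}[E])=\lambda_JE\cdot\lambda_{I\setminus J}X_{I\setminus J}=\lambda_JE,$$
by 251E or 251Ia, whenever $E\in\Lambda_J$. This shows that $\pi_J$ is inverse-measure-preserving.
(ii) To see that $\lambda_J$ is actually the image measure, suppose that $E\subseteq X_J$ is such that $\pi_J^{-1}[E]\in\Lambda$.
Identifying $\pi_J^{-1}[E]$ with $E\times X_{I\setminus J}$, as before, we are supposing that $E\times X_{I\setminus J}$ is measurable for the product
measure on $X_J\times X_{I\setminus J}$. But this means that for $\lambda_{I\setminus J}$-almost every $z\in X_{I\setminus J}$, $E_z=\{y:(y,z)\in E\times X_{I\setminus J}\}$
belongs to $\Lambda_J$ (252D(ii), because $\lambda_J$ is complete). Since $E_z=E$ for every $z$, $E$ itself belongs to $\Lambda_J$, as
claimed.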
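(b) If $W\in\Lambda$ is determined by coordinates in $J$, set $H=\pi_J[W]$; then $\pi_J^{-1}[H]=W$, so $H\in\Lambda_J$ by (a)
just above. By 254Ff, there are $H_1$, $H_2\in\widehat{\bigotimes}_{i\in J}\Sigma_i$ such that $H_1\subseteq H\subseteq H_2$ and $\lambda_J(H_2\setminus H_1)=0$.
Let $\mathrm{T}_J$ be the $\sigma$-algebra of subsets of $X$ generated by sets of the form $\{x:x(i)\in E\}$ where $i\in J$ and
$E\in\Sigma_i$. Consider $\mathrm{T}'_J=\{G:G\subseteq X_J,\ \pi_J^{-1}[G]\in\mathrm{T}_J\}$. This is a $\sigma$-algebra of subsets of $X_J$, and it contains
$\{y:y\in X_J,\ y(i)\in E\}$ whenever $i\in J$, $E\in\Sigma_i$ (because
$$\pi_J^{-1}[\{y:y\in X_J,\ y(i)\in E\}]=\{x:x\in X,\ x(i)\in E\}$$
whenever $i\in J$, $E\subseteq X_i$). So $\mathrm{T}'_J$ must include $\widehat{\bigotimes}_{i\in J}\Sigma_i$. In particular, $H_1$ and $H_2$ both belong to $\mathrm{T}'_J$, that
is, $W_k=\pi_J^{-1}[H_k]$ belongs to $\mathrm{T}_J$ for both $k$. Of course $W_1\subseteq W\subseteq W_2$, because $H_1\subseteq H\subseteq H_2$, and
$$\lambda(W_2\setminus W_1)=\lambda_J(H_2\setminus H_1)=0,$$
as required.
(c) Now take any $W\in\Lambda$. By 254Ff, there are $W_1$ and $W_2\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W_1\subseteq W\subseteq W_2$ and
$\lambda(W_2\setminus W_1)=0$. By 254Mc, there are countable sets $J_1$, $J_2\subseteq I$ such that, for each $k$, $W_k$ is determined by
coordinates in $J_k$. Setting $J=J_1\cup J_2$, $J$ is a countable subset of $I$ and both $W_1$ and $W_2$ are determined
by coordinates in $J$.
(d) Continuing the argument from (c), $\pi_J[W_1]$, $\pi_J[W_2]\in\Lambda_J$, by (b), and $\lambda_J(\pi_J[W_2]\setminus\pi_J[W_1])=0$.
Since $\pi_J[W_1]\subseteq\pi_J[W]\subseteq\pi_J[W_2]$, it follows that $\pi_J[W]\in\Lambda_J$, with $\lambda_J\pi_J[W]=\lambda_J\pi_J[W_2]$; so that, setting
$W'=\pi_J^{-1}[\pi_J[W]]$, $W'\in\Lambda$, and
$$\lambda W'=\lambda_J\pi_J[W]=\lambda_J\pi_J[W_2]=\lambda\pi_J^{-1}[\pi_J[W_2]]=\lambda W_2=\lambda W.$$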
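
Part (a) of 254O can be checked by direct enumeration when $I$ is finite and every $X_i=\{0,1\}$. In the Python sketch below the factor probabilities in `p`, the set `J` and the event `E` are hypothetical choices for illustration; the code compares $\lambda(\pi_J^{-1}[E])$, computed in the full product, with $\lambda_JE$, computed in the subproduct.

```python
from fractions import Fraction
from itertools import product

# Hypothetical factor measures on {0, 1}: mu_i{1} = p[i].
p = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(3, 4)}

def point_mass(x):
    """Product measure of the single point x, given as a dict index -> bit."""
    m = Fraction(1)
    for i, bit in x.items():
        m *= p[i] if bit == 1 else 1 - p[i]
    return m

def points(J):
    """All points of X_J, each as a dict index -> bit."""
    J = sorted(J)
    return [dict(zip(J, bits)) for bits in product((0, 1), repeat=len(J))]

def lam(J, event):
    """lambda_J of {y in X_J : event(y)}."""
    return sum((point_mass(y) for y in points(J) if event(y)), Fraction(0))

I, J = {0, 1, 2}, {0, 2}
E = lambda y: y[0] != y[2]                              # a subset of X_J
lhs = lam(I, lambda x: E({i: x[i] for i in J}))         # lambda(pi_J^{-1}[E])
rhs = lam(J, E)                                         # lambda_J E
```

Summing over the ignored coordinate contributes a factor $\mu_1X_1=1$, which is the finite shadow of the identity $\lambda(\pi_J^{-1}[E])=\lambda_JE\cdot\lambda_{I\setminus J}X_{I\setminus J}$.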

254P Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and for each $J\subseteq I$ let $\lambda_J$ be
the product probability measure on $X_J=\prod_{i\in J}X_i$, and $\Lambda_J$ its domain; write $X=X_I$, $\Lambda=\Lambda_I$ and $\lambda=\lambda_I$.
For $x\in X$ and $J\subseteq I$ set $\pi_J(x)=x\restriction J\in X_J$.
(a) If $J\subseteq I$ and $g$ is a real-valued function defined on a subset of $X_J$, then $g$ is $\Lambda_J$-measurable iff $g\pi_J$ is
$\Lambda$-measurable.
(b) Whenever $f$ is a $\Lambda$-measurable real-valued function defined on a $\lambda$-conegligible subset of $X$, we can
find a countable set $J\subseteq I$ and a $\Lambda_J$-measurable function $g$ defined on a $\lambda_J$-conegligible subset of $X_J$ such
that $f$ extends $g\pi_J$.
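proof (a)(i) If $g$ is $\Lambda_J$-measurable and $a\in\mathbb{R}$, there is an $H\in\Lambda_J$ such that $\{y:y\in\operatorname{dom}g,\ g(y)\ge a\}=H\cap\operatorname{dom}g$.
Now $\pi_J^{-1}[H]\in\Lambda$, by 254Oa, and $\{x:x\in\operatorname{dom}g\pi_J,\ g\pi_J(x)\ge a\}=\pi_J^{-1}[H]\cap\operatorname{dom}g\pi_J$. So $g\pi_J$
is $\Lambda$-measurable.
(ii) If $g\pi_J$ is $\Lambda$-measurable and $a\in\mathbb{R}$, then there is a $W\in\Lambda$ such that $\{x:g\pi_J(x)\ge a\}=W\cap\operatorname{dom}g\pi_J$.
As in the proof of 254Oa, we may identify $\lambda$ with the product of $\lambda_J$ and $\lambda_{I\setminus J}$, and 252D(ii) tells us that,
if we identify $W$ with the corresponding subset of $X_J\times X_{I\setminus J}$, there is at least one $z\in X_{I\setminus J}$ such that
$W_z=\{y:y\in X_J,\ (y,z)\in W\}$ belongs to $\Lambda_J$. But since (on this convention) $g\pi_J(y,z)=g(y)$ for every
$y\in X_J$, we see that $\{y:y\in\operatorname{dom}g,\ g(y)\ge a\}=W_z\cap\operatorname{dom}g$. As $a$ is arbitrary, $g$ is $\Lambda_J$-measurable.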
(b) For rational numbers $q$, set $W_q=\{x:x\in\operatorname{dom}f,\ f(x)\ge q\}$. By 254Oc we can find for each $q$ a
countable set $J_q\subseteq I$ and sets $W_q'$, $W_q''$, both determined by coordinates in $J_q$, such that $W_q'\subseteq W_q\subseteq W_q''$
and $\lambda(W_q''\setminus W_q')=0$. Set $J=\bigcup_{q\in\mathbb{Q}}J_q$, $V=X\setminus\bigcup_{q\in\mathbb{Q}}(W_q''\setminus W_q')$; then $J$ is a countable subset of $I$ and $V$
is a conegligible subset of $X$; moreover, $V$ is determined by coordinates in $J$ because all the $W_q'$, $W_q''$ are.
For every $q\in\mathbb{Q}$, $W_q\cap V=W_q'\cap V$, because $V\cap(W_q\setminus W_q')\subseteq V\cap(W_q''\setminus W_q')=\emptyset$; so $W_q\cap V$ is determined
by coordinates in $J$. Consequently $V\cap\operatorname{dom}f=\bigcup_{q\in\mathbb{Q}}V\cap W_q$ is also determined by coordinates in $J$. Also
$$\{x:x\in V\cap\operatorname{dom}f,\ f(x)\ge a\}=\bigcap_{q\le a}V\cap W_q$$
is determined by coordinates in $J$. What this means is that if $x$, $x'\in V$ and $\pi_Jx=\pi_Jx'$, then $x\in\operatorname{dom}f$
iff $x'\in\operatorname{dom}f$ and in this case $f(x)=f(x')$. Setting $H=\pi_J[V\cap\operatorname{dom}f]$, we have $\pi_J^{-1}[H]=V\cap\operatorname{dom}f$ a
conegligible subset of $X$, so (because $\lambda_J=\lambda\pi_J^{-1}$) $H$ is conegligible in $X_J$. Also, for $y\in H$, $f(x)=f(x')$
whenever $\pi_Jx=\pi_Jx'=y$, so there is a function $g:H\to\mathbb{R}$ defined by saying that $g\pi_J(x)=f(x)$ whenever
$x\in V\cap\operatorname{dom}f$. Thus $g$ is defined almost everywhere in $X_J$ and $f$ extends $g\pi_J$. Finally, for any $a\in\mathbb{R}$,
$$\pi_J^{-1}[\{y:g(y)\ge a\}]=\{x:x\in V\cap\operatorname{dom}f,\ f(x)\ge a\}\in\Lambda;$$
by 254Oa, $\{y:g(y)\ge a\}\in\Lambda_J$; as $a$ is arbitrary, $g$ is measurable.

254Q Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and for each $J\subseteq I$ let $\lambda_J$
be the product probability measure on $X_J=\prod_{i\in J}X_i$; write $X=X_I$, $\lambda=\lambda_I$. For $x\in X$, $J\subseteq I$ set
$\pi_J(x)=x\restriction J\in X_J$.
(a) Let $S$ be the linear subspace of $\mathbb{R}^X$ spanned by $\{\chi C:C\subseteq X\text{ is a measurable cylinder}\}$. Then for
every $\lambda$-integrable real-valued function $f$ and every $\epsilon>0$ there is a $g\in S$ such that $\int|f-g|\,d\lambda\le\epsilon$.
(b) Whenever $J\subseteq I$ and $g$ is a real-valued function defined on a subset of $X_J$, then $\int g\,d\lambda_J=\int g\pi_J\,d\lambda$
if either integral is defined in $[-\infty,\infty]$.
(c) Whenever $f$ is a $\lambda$-integrable real-valued function, we can find a countable set $J\subseteq I$ and a
$\lambda_J$-integrable function $g$ such that $f$ extends $g\pi_J$.
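proof (a)(i) Write $\overline{S}$ for the set of functions $f$ satisfying the assertion, that is, such that for every $\epsilon > 0$ there is a $g \in S$ such that $\int |f - g| \le \epsilon$. Then $f_1 + f_2$ and $cf_1 \in \overline{S}$ whenever $f_1, f_2 \in \overline{S}$. $\mathbf{P}$ Given $\epsilon > 0$ there are $g_1, g_2 \in S$ such that $\int |f_1 - g_1| \le \frac{\epsilon}{2+|c|}$, $\int |f_2 - g_2| \le \frac{\epsilon}{2}$; now $g_1 + g_2$, $cg_1 \in S$ and $\int |(f_1 + f_2) - (g_1 + g_2)| \le \epsilon$, $\int |cf_1 - cg_1| \le \epsilon$. $\mathbf{Q}$ Also, of course, $f \in \overline{S}$ whenever $f_0 \in \overline{S}$ and $f =_{\text{a.e.}} f_0$.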

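(ii) Write $\mathcal{W}$ for $\{W : W \subseteq X,\ \chi W \in \overline{S}\}$, and $\mathcal{C}$ for the family of measurable cylinders in $X$. Then it is plain from the definition in 254A that $C \cap C' \in \mathcal{C}$ for all $C, C' \in \mathcal{C}$, and of course $C \in \mathcal{W}$ for every $C \in \mathcal{C}$, because $\chi C \in S \subseteq \overline{S}$. Next, $W \setminus V \in \mathcal{W}$ whenever $W, V \in \mathcal{W}$ and $V \subseteq W$, because then $\chi(W \setminus V) = \chi W - \chi V$. Thirdly, $\bigcup_{n\in\mathbb{N}} W_n \in \mathcal{W}$ for every non-decreasing sequence $\langle W_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{W}$. $\mathbf{P}$ Set $W = \bigcup_{n\in\mathbb{N}} W_n$. Given $\epsilon > 0$, there is an $n \in \mathbb{N}$ such that $\lambda(W \setminus W_n) \le \frac{\epsilon}{2}$. Now there is a $g \in S$ such that $\int |\chi W_n - g| \le \frac{\epsilon}{2}$, so that $\int |\chi W - g| \le \epsilon$. $\mathbf{Q}$ Thus $\mathcal{W}$ is a Dynkin class of subsets of $X$.
By the Monotone Class Theorem (136B), $\mathcal{W}$ must include the $\sigma$-algebra of subsets of $X$ generated by $\mathcal{C}$, which is $\widehat{\bigotimes}_{i\in I}\Sigma_i$. But this means that $\mathcal{W}$ contains every measurable subset of $X$, since by 254Ff any measurable set differs by a negligible set from some member of $\widehat{\bigotimes}_{i\in I}\Sigma_i$.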

(iii) Thus $\overline{S}$ contains the characteristic function of any measurable subset of $X$. Because it is closed under addition and scalar multiplication, it contains all simple functions. But this means that it must contain all integrable functions. $\mathbf{P}$ If $f$ is a real-valued function which is integrable over $X$, and $\epsilon > 0$, there is a simple function $h : X \to \mathbb{R}$ such that $\int |f - h| \le \frac{\epsilon}{2}$ (242M), and now there is a $g \in S$ such that $\int |h - g| \le \frac{\epsilon}{2}$, so that $\int |f - g| \le \epsilon$. $\mathbf{Q}$
This proves part (a) of the proposition.
(b) Put 254Oa and 235L together.
(c) By 254Pb, there are a countable $J \subseteq I$ and a real-valued function $g$ defined on a conegligible subset of $X_J$ such that $f$ extends $g\pi_J$. Now $\operatorname{dom}(g\pi_J) = \pi_J^{-1}[\operatorname{dom} g]$ is conegligible, so $f =_{\text{a.e.}} g\pi_J$ and $g\pi_J$ is $\lambda$-integrable. By (b), $g$ is $\lambda_J$-integrable.

254R Conditional expectations again Putting the ideas of 253H together with the work above, we
obtain some results which are important not only for their direct applications but for the light they throw
on the structures here.

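Theorem Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X, \Lambda, \lambda)$. For $J \subseteq I$ let $\Lambda_J \subseteq \Lambda$ be the $\sigma$-subalgebra of sets determined by coordinates in $J$ (254Mb). Then we may regard $L^0(\lambda\upharpoonright\Lambda_J)$ as a subspace of $L^0(\lambda)$ (242Jh). Let $P_J : L^1(\lambda) \to L^1(\lambda\upharpoonright\Lambda_J) \subseteq L^1(\lambda)$ be the corresponding conditional expectation operator (242J). Then
(a) for any $J, K \subseteq I$, $P_{K\cap J} = P_K P_J$;
(b) for any $u \in L^1(\lambda)$, there is a countable set $J^* \subseteq I$ such that $P_J u = u$ iff $J \supseteq J^*$;
(c) for any $u \in L^0(\lambda)$, there is a unique smallest set $J^* \subseteq I$ such that $u \in L^0(\lambda\upharpoonright\Lambda_{J^*})$, and this $J^*$ is countable;
(d) for any $W \in \Lambda$ there is a unique smallest set $J^* \subseteq I$ such that $W \triangle W'$ is negligible for some $W' \in \Lambda_{J^*}$, and this $J^*$ is countable;
(e) for any $\Lambda$-measurable real-valued function $f : X \to \mathbb{R}$ there is a unique smallest set $J^* \subseteq I$ such that $f$ is equal almost everywhere to a $\Lambda_{J^*}$-measurable function, and this $J^*$ is countable.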
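proof For $J \subseteq I$, write $X_J = \prod_{i\in J} X_i$, let $\lambda_J$ be the product measure on $X_J$, and set $\pi_J(x) = x \upharpoonright J$ for $x \in X$. Write $L^0_J$ for $L^0(\lambda\upharpoonright\Lambda_J)$, regarded as a subset of $L^0 = L^0_I$, and $L^1_J$ for $L^1(\lambda\upharpoonright\Lambda_J) = L^1(\lambda) \cap L^0_J$, as in 242Jb; thus $L^1_J$ is the set of values of the projection $P_J$.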
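(a)(i) Let $C \subseteq X$ be a measurable cylinder, expressed as $\prod_{i\in I} C_i$ where $C_i \in \Sigma_i$ for every $i$ and $L = \{i : C_i \ne X_i\}$ is finite. Set
$$C'_i = C_i \text{ for } i \in J,\quad C'_i = X_i \text{ for } i \in I \setminus J,\quad C' = \prod_{i\in I} C'_i,\quad \alpha = \prod_{i\in I\setminus J} \mu_i C_i.$$
Then $\alpha\chi C'$ is a conditional expectation of $\chi C$ on $\Lambda_J$. $\mathbf{P}$ By 254N, we can identify $\lambda$ with the product of $\lambda_J$ and $\lambda_{I\setminus J}$. This identifies $\Lambda_J$ with $\{E \times X_{I\setminus J} : E \in \operatorname{dom}\lambda_J\}$. By 253H we have a conditional expectation $g$ of $\chi C$ defined by setting
$$g(y, z) = \int \chi C(y, t)\,\lambda_{I\setminus J}(dt)$$
for $y \in X_J$, $z \in X_{I\setminus J}$. But $C$ is identified with $C_J \times C_{I\setminus J}$, where $C_J = \prod_{i\in J} C_i$ and $C_{I\setminus J} = \prod_{i\in I\setminus J} C_i$, so that $g(y, z) = 0$ if $y \notin C_J$ and otherwise is $\lambda_{I\setminus J} C_{I\setminus J} = \alpha$. Thus $g = \alpha\chi(C_J \times X_{I\setminus J})$. But the identification between $X_J \times X_{I\setminus J}$ and $X$ matches $C_J \times X_{I\setminus J}$ with $C'$, as described above. So $g$ becomes identified with $\alpha\chi C'$ and $\alpha\chi C'$ is a conditional expectation of $\chi C$. $\mathbf{Q}$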
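(ii) Next, setting
$$C''_i = C'_i \text{ for } i \in K,\quad C''_i = X_i \text{ for } i \in I \setminus K,\quad C'' = \prod_{i\in I} C''_i,$$
$$\beta = \prod_{i\in I\setminus K} \mu_i C'_i = \prod_{i\in J\setminus K} \mu_i C_i$$
(the second equality holds because $C'_i = X_i$, of measure $1$, for $i \notin J$), the same arguments show that $\beta\chi C''$ is a conditional expectation of $\chi C'$ on $\Lambda_K$. So we have
$$P_K P_J (\chi C)^{\bullet} = \alpha\beta(\chi C'')^{\bullet}.$$
But if we look at $\alpha\beta$, this is just $\prod_{i\in I\setminus(K\cap J)} \mu_i C_i$, while $C''_i = C_i$ if $i \in K \cap J$, $X_i$ for other $i$. So $\alpha\beta\chi C''$ is a conditional expectation of $\chi C$ on $\Lambda_{K\cap J}$, and
$$P_K P_J (\chi C)^{\bullet} = P_{K\cap J} (\chi C)^{\bullet}.$$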

(iii) Thus we see that the operators $P_K P_J$, $P_{K\cap J}$ agree on elements of the form $(\chi C)^{\bullet}$ where $C$ is a measurable cylinder. Because they are both linear, they agree on linear combinations of these, that is, $P_K P_J v = P_{K\cap J} v$ whenever $v = g^{\bullet}$ for some $g$ in the space $S$ of 254Q. But if $u \in L^1(\lambda)$ and $\epsilon > 0$, there is a $\lambda$-integrable function $f$ such that $f^{\bullet} = u$, and there is a $g \in S$ such that $\int |f - g| \le \epsilon$ (254Qa), so that $\|u - v\|_1 \le \epsilon$, where $v = g^{\bullet}$. Since $P_J$, $P_K$ and $P_{K\cap J}$ are all linear operators of norm 1,
$$\|P_K P_J u - P_{K\cap J} u\|_1 \le 2\|u - v\|_1 + \|P_K P_J v - P_{K\cap J} v\|_1 \le 2\epsilon.$$
As $\epsilon$ is arbitrary, $P_K P_J u = P_{K\cap J} u$; as $u$ is arbitrary, $P_K P_J = P_{K\cap J}$.
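
The identity in part (a) can be checked by hand on a finite product, where the conditional expectation on the sets determined by coordinates in $J$ simply integrates out the coordinates outside $J$. The following is a minimal sketch under stated assumptions, not part of the proof: the three two-point factor spaces, the weights `p`, the test function `f` and the helper names `weight`, `cond_exp` and `commutes` are all hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: three two-point probability spaces X_i = {0, 1},
# with mu_i{1} = p[i]. None of these names come from the text.
p = [Fraction(1, 3), Fraction(1, 2), Fraction(3, 4)]
I = range(len(p))
X = list(product((0, 1), repeat=len(p)))


def weight(i, value):
    """mu_i of the singleton {value} in the i-th factor."""
    return p[i] if value == 1 else 1 - p[i]


def cond_exp(f, J):
    """Conditional expectation of f on the sets determined by coordinates in J.

    In a finite product this integrates out every coordinate outside J.
    """
    outside = [i for i in I if i not in J]
    result = {}
    for x in X:
        total = Fraction(0)
        for t in product((0, 1), repeat=len(outside)):
            y = list(x)
            mass = Fraction(1)
            for i, v in zip(outside, t):
                y[i] = v
                mass *= weight(i, v)
            total += mass * f[tuple(y)]
        result[x] = total
    return result


def commutes(f, J, K):
    """Check P_K P_J f == P_{K & J} f pointwise, in exact arithmetic."""
    return cond_exp(cond_exp(f, J), K) == cond_exp(f, J & K)


# An arbitrary test function on X.
f = {x: Fraction(1 + x[0] + 2 * x[1] * x[2] - 3 * x[0] * x[2]) for x in X}
subsets = [frozenset(s) for s in ([], [0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2])]
all_ok = all(commutes(f, J, K) for J in subsets for K in subsets)
print(all_ok)
```

Exact rational arithmetic is used so that the comparison is an equality of functions rather than a floating-point approximation; since every point of the toy space has positive measure, equality almost everywhere is the same as pointwise equality there.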
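(b) Take $u \in L^1(\lambda)$. Let $\mathcal{J}$ be the family of all subsets $J$ of $I$ such that $P_J u = u$. By (a), $J \cap K \in \mathcal{J}$ for all $J, K \in \mathcal{J}$. Next, $\mathcal{J}$ contains a countable set $J_0$. $\mathbf{P}$ Let $f$ be a $\lambda$-integrable function such that $f^{\bullet} = u$. By 254Qc, we can find a countable set $J_0 \subseteq I$ and a $\lambda_{J_0}$-integrable function $g$ such that $f =_{\text{a.e.}} g\pi_{J_0}$. Now $g\pi_{J_0}$ is $\Lambda_{J_0}$-measurable and $u = (g\pi_{J_0})^{\bullet}$ belongs to $L^1_{J_0}$, so $J_0 \in \mathcal{J}$. $\mathbf{Q}$
Write $J^* = \bigcap\mathcal{J}$, so that $J^* \subseteq J_0$ is countable. Then $J^* \in \mathcal{J}$. $\mathbf{P}$ Let $\epsilon > 0$. As in the proof of (a) above, there is a $g \in S$ such that $\|u - v\|_1 \le \epsilon$, where $v = g^{\bullet}$. But because $g$ is a finite linear combination of characteristic functions of measurable cylinders, each determined by coordinates in some finite set, there is a finite $K \subseteq I$ such that $g$ is $\Lambda_K$-measurable, so that $P_K v = v$. Because $K$ is finite, there must be $J_1, \ldots, J_n \in \mathcal{J}$ such that $J^* \cap K = \bigcap_{1\le i\le n} J_i \cap K$; but as $\mathcal{J}$ is closed under finite intersections, $J = J_1 \cap \ldots \cap J_n \in \mathcal{J}$, and $J^* \cap K = J \cap K$.
Now we have
$$P_{J^*} v = P_{J^*} P_K v = P_{J^*\cap K} v = P_{J\cap K} v = P_J P_K v = P_J v,$$
using (a) twice. Because both $P_J$ and $P_{J^*}$ have norm 1,
$$\|P_{J^*} u - u\|_1 \le \|P_{J^*} u - P_{J^*} v\|_1 + \|P_{J^*} v - P_J v\|_1 + \|P_J v - P_J u\|_1 + \|P_J u - u\|_1 \le \|u - v\|_1 + 0 + \|u - v\|_1 + 0 \le 2\epsilon.$$
As $\epsilon$ is arbitrary, $P_{J^*} u = u$ and $J^* \in \mathcal{J}$. $\mathbf{Q}$
Now, for any $J \subseteq I$,
$$P_J u = u \implies J \in \mathcal{J} \implies J \supseteq J^* \implies P_J u = P_J P_{J^*} u = P_{J\cap J^*} u = P_{J^*} u = u.$$
Thus $J^*$ has the required properties.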
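(c) Set $e = (\chi X)^{\bullet}$, $u_n = (-ne) \vee (u \wedge ne)$ for each $n \in \mathbb{N}$. Then, for any $J \subseteq I$, $u \in L^0_J$ iff $u_n \in L^0_J$ for every $n$. $\mathbf{P}$ ($\alpha$) If $u \in L^0_J$, then $u$ is expressible as $f^{\bullet}$ for some $\Lambda_J$-measurable $f$; now $f_n = (-n\chi X) \vee (f \wedge n\chi X)$ is $\Lambda_J$-measurable, so $u_n = f_n^{\bullet} \in L^0_J$ for every $n$. ($\beta$) If $u_n \in L^0_J$ for each $n$, then for each $n$ we can find a $\Lambda_J$-measurable function $f_n$ such that $f_n^{\bullet} = u_n$. But there is also a $\Lambda$-measurable function $f$ such that $u = f^{\bullet}$, and we must have $f_n =_{\text{a.e.}} (-n\chi X) \vee (f \wedge n\chi X)$ for each $n$, so that $f =_{\text{a.e.}} \lim_{n\to\infty} f_n$ and $u = (\lim_{n\to\infty} f_n)^{\bullet}$. Since $\lim_{n\to\infty} f_n$ is $\Lambda_J$-measurable, $u \in L^0_J$. $\mathbf{Q}$
As every $u_n$ belongs to $L^1$, we know that
$$u_n \in L^0_J \iff u_n \in L^1_J \iff P_J u_n = u_n.$$
By (b), there is for each $n$ a countable $J^*_n$ such that $P_J u_n = u_n$ iff $J \supseteq J^*_n$. So we see that $u \in L^0_J$ iff $J \supseteq J^*_n$ for every $n$, that is, $J \supseteq \bigcup_{n\in\mathbb{N}} J^*_n$. Thus $J^* = \bigcup_{n\in\mathbb{N}} J^*_n$ has the property claimed.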

(d) Applying (c) to $u = (\chi W)^{\bullet}$, we have a (countable) unique smallest $J^*$ such that $u \in L^0_{J^*}$. But if $J \subseteq I$, then there is a $W' \in \Lambda_J$ such that $W' \triangle W$ is negligible iff $u \in L^0_J$. So this is the $J^*$ we are looking for.
(e) Again apply (c), this time to $f^{\bullet}$.

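254S Proposition Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, with product $(X, \Lambda, \lambda)$.
(a) If $A \subseteq X$ is determined by coordinates in $I \setminus \{j\}$ for every $j \in I$, then its outer measure $\lambda^*A$ must be either 0 or 1.
(b) If $W \in \Lambda$ and $\lambda W > 0$, then for every $\epsilon > 0$ there are a $W' \in \Lambda$ and a finite set $J \subseteq I$ such that $\lambda W' \ge 1 - \epsilon$ and for every $x \in W'$ there is a $y \in W$ such that $x \upharpoonright I \setminus J = y \upharpoonright I \setminus J$.
proof For $J \subseteq I$ write $X_J$ for $\prod_{i\in J} X_i$ and $\lambda_J$ for the product measure on $X_J$.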
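(a) Let $W$ be a measurable envelope of $A$. By 254Rd, there is a smallest $J \subseteq I$ for which there is a $W' \in \Lambda$, determined by coordinates in $J$, with $\lambda(W \triangle W') = 0$. Now $J = \emptyset$. $\mathbf{P}$ Take any $j \in I$. Then $A$ is determined by coordinates in $I \setminus \{j\}$, that is, can be regarded as $X_j \times A'$ for some $A' \subseteq X_{I\setminus\{j\}}$. We can also think of $\lambda$ as the product of $\lambda_{\{j\}}$ and $\lambda_{I\setminus\{j\}}$ (254N). Let $\Lambda_{I\setminus\{j\}}$ be the domain of $\lambda_{I\setminus\{j\}}$. By 251R,
$$\lambda^*A = \lambda^*_{\{j\}} X_j \cdot \lambda^*_{I\setminus\{j\}} A' = \lambda^*_{I\setminus\{j\}} A'.$$
Let $V \in \Lambda_{I\setminus\{j\}}$ be a measurable envelope of $A'$. Then $W' = X_j \times V$ belongs to $\Lambda$, includes $A$ and has measure $\lambda^*A$, so $\lambda(W \cap W') = \lambda W = \lambda W'$ and $W \triangle W'$ is negligible. At the same time, $W'$ is determined by coordinates in $I \setminus \{j\}$. This means that $J$ must be included in $I \setminus \{j\}$. As $j$ is arbitrary, $J = \emptyset$. $\mathbf{Q}$
But the only subsets of $X$ which are determined by coordinates in $\emptyset$ are $X$ and $\emptyset$. Since $W$ differs from one of these by a negligible set, $\lambda^*A = \lambda W \in \{0, 1\}$, as claimed.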
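(b) Set $\eta = \frac{1}{2}\min(\epsilon, 1)\lambda W$. By 254Fe, there is a measurable set $V$, determined by coordinates in a finite subset $J$ of $I$, such that $\lambda(W \triangle V) \le \eta$. Note that
$$\lambda V \ge \lambda W - \eta \ge \tfrac{1}{2}\lambda W > 0,$$
so
$$\lambda(W \triangle V) \le \tfrac{1}{2}\epsilon\lambda W \le \epsilon\lambda V.$$
We may identify $\lambda$ with the c.l.d. product of $\lambda_J$ and $\lambda_{I\setminus J}$ (254N). Let $\tilde{W}, \tilde{V} \subseteq X_J \times X_{I\setminus J}$ be the sets corresponding to $W, V \subseteq X$. Then $\tilde{V}$ can be expressed as $U \times X_{I\setminus J}$ where $\lambda_J U = \lambda V > 0$. Set $U' = \{z : z \in X_{I\setminus J},\ \lambda_J \tilde{W}^{-1}[\{z\}] = 0\}$. Then $U'$ is measured by $\lambda_{I\setminus J}$ (252D(ii) again, because both $\lambda_J$ and $\lambda_{I\setminus J}$ are complete), and
$$\lambda_J U \cdot \lambda_{I\setminus J} U' \le \int \lambda_J(\tilde{W}^{-1}[\{z\}] \triangle U)\,\lambda_{I\setminus J}(dz)$$
(because if $z \in U'$ then $\lambda_J(\tilde{W}^{-1}[\{z\}] \triangle U) = \lambda_J U$)
$$= \int \lambda_J (\tilde{W} \triangle \tilde{V})^{-1}[\{z\}]\,\lambda_{I\setminus J}(dz) = (\lambda_J \times \lambda_{I\setminus J})(\tilde{W} \triangle \tilde{V})$$
(252D once more)
$$= \lambda(W \triangle V) \le \epsilon\lambda V = \epsilon\lambda_J U.$$
This means that $\lambda_{I\setminus J} U' \le \epsilon$. Set $W' = \{x : x \in X,\ x \upharpoonright I \setminus J \notin U'\}$; then $\lambda W' \ge 1 - \epsilon$. If $x \in W'$, then $z = x \upharpoonright I \setminus J \notin U'$, so $\tilde{W}^{-1}[\{z\}]$ is not empty, that is, there is a $y \in W$ such that $y \upharpoonright I \setminus J = z$. So this $W'$ has the required properties.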

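254T Remarks It is important to understand that the results above apply to $L^0$ and $L^1$ and measurable-sets-up-to-a-negligible-set, not to sets and functions themselves. One idea does apply to sets and functions, whether measurable or not.

(a) Let $\langle X_i\rangle_{i\in I}$ be a family of sets with Cartesian product $X$. For each $J \subseteq I$ let $\mathcal{W}_J$ be the set of subsets of $X$ determined by coordinates in $J$. Then $\mathcal{W}_J \cap \mathcal{W}_K = \mathcal{W}_{J\cap K}$ for all $J, K \subseteq I$. $\mathbf{P}$ Of course $\mathcal{W}_J \cap \mathcal{W}_K \supseteq \mathcal{W}_{J\cap K}$, because $\mathcal{W}_J \supseteq \mathcal{W}_{J'}$ whenever $J' \subseteq J$. On the other hand, suppose $W \in \mathcal{W}_J \cap \mathcal{W}_K$, $x \in W$, $y \in X$ and $x \upharpoonright J \cap K = y \upharpoonright J \cap K$. Set $z(i) = x(i)$ for $i \in J$, $y(i)$ for $i \in I \setminus J$. Then $z \upharpoonright J = x \upharpoonright J$ so $z \in W$. Also $y \upharpoonright K = z \upharpoonright K$ so $y \in W$. As $x$, $y$ are arbitrary, $W \in \mathcal{W}_{J\cap K}$; as $W$ is arbitrary, $\mathcal{W}_J \cap \mathcal{W}_K \subseteq \mathcal{W}_{J\cap K}$. $\mathbf{Q}$
Accordingly, for any $W \subseteq X$, $\mathcal{F} = \{J : W \in \mathcal{W}_J\}$ is a filter on $I$ (unless $W = X$ or $W = \emptyset$, in which case $\mathcal{F} = \mathcal{P}I$). But $\mathcal{F}$ does not necessarily have a least element, as the following example shows.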
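
For a finite index set the identity $\mathcal{W}_J \cap \mathcal{W}_K = \mathcal{W}_{J\cap K}$ can be confirmed by exhaustive search. The sketch below is only an illustration under stated assumptions: it takes $X = \{0,1\}^3$, and the names `determined_by`, `index_sets`, `all_subsets_of_X` and `check_intersection_law` are hypothetical helpers, not notation from the text.

```python
from itertools import product, combinations

# Hypothetical toy model: X = {0,1}^3, so I = {0, 1, 2}.
N = 3
X = list(product((0, 1), repeat=N))


def determined_by(W, J):
    """True if membership of x in W depends only on the coordinates in J."""
    for x in X:
        for y in X:
            if all(x[i] == y[i] for i in J) and (x in W) != (y in W):
                return False
    return True


def index_sets():
    """All subsets J of I = {0, ..., N-1}."""
    return [frozenset(c) for r in range(N + 1) for c in combinations(range(N), r)]


def all_subsets_of_X():
    """All 2^8 subsets W of X, as frozensets."""
    for bits in product((0, 1), repeat=len(X)):
        yield frozenset(x for x, b in zip(X, bits) if b)


def check_intersection_law():
    """Verify that W is in W_J and W_K exactly when it is in W_{J & K}."""
    Js = index_sets()
    for W in all_subsets_of_X():
        det = {J: determined_by(W, J) for J in Js}
        for J in Js:
            for K in Js:
                if (det[J] and det[K]) != det[J & K]:
                    return False
    return True


print(check_intersection_law())
```

With a finite index set the filter $\mathcal{F}$ always has a least element, so a search of this kind cannot exhibit the phenomenon of the example that follows; that needs infinitely many coordinates.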

(b) Set $X = \{0, 1\}^{\mathbb{N}}$,
$$W = \{x : x \in X,\ \lim_{i\to\infty} x(i) = 0\}.$$
Then for every $n \in \mathbb{N}$, $W$ is determined by coordinates in $J_n = \{i : i \ge n\}$. But $W$ is not determined by coordinates in $\bigcap_{n\in\mathbb{N}} J_n = \emptyset$. Note that
$$W = \bigcup_{n\in\mathbb{N}} \bigcap_{i\ge n} \{x : x(i) = 0\}$$
is measurable for the usual measure on $X$. But it is also negligible (since it is countable); in 254Rd we have $J^* = \emptyset$, $W' = \emptyset$.

*254U I am now in a position to describe a counter-example answering a natural question arising out of §251.

Example There are a localizable measure space $(X, \Sigma, \mu)$ and a probability space $(Y, \mathrm{T}, \nu)$ such that the c.l.d. product measure on $X \times Y$ is not localizable.
proof (a) Take $(X, \Sigma, \mu)$ to be the space of 216E, so that $X = \{0, 1\}^I$, where $I = \mathcal{P}C$ for some set $C$ of cardinal greater than $\mathfrak{c}$. For each $\gamma \in C$ write $E_\gamma$ for $\{x : x \in X,\ x(\{\gamma\}) = 1\}$ (that is, $G_{\{\gamma\}}$ in the notation of 216Ec); then $E_\gamma \in \Sigma$ and $\mu E_\gamma = 1$; also every measurable set of non-zero measure meets some $E_\gamma$ in a set of non-zero measure, while $E_\gamma \cap E_\delta$ is negligible for all distinct $\gamma$, $\delta$ (see 216Ee).
Let $(Y, \mathrm{T}, \nu)$ be $\{0, 1\}^C$ with the usual measure (254J). For $\gamma \in C$, let $F_\gamma$ be $\{y : y \in Y,\ y(\gamma) = 1\}$, so that $\nu F_\gamma = \frac{1}{2}$. Let $\lambda$ be the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain.
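(b) Consider the family $\mathcal{W} = \{E_\gamma \times F_\gamma : \gamma \in C\} \subseteq \Lambda$. $\mathbf{?}$ Suppose, if possible, that $V$ were an essential supremum of $\mathcal{W}$ in $\Lambda$ in the sense of 211G. For $\gamma \in C$ write $H_\gamma = \{x : V[\{x\}] \triangle F_\gamma$ is negligible$\}$. Because $F_\gamma \triangle F_\delta$ is non-negligible, $H_\gamma \cap H_\delta = \emptyset$ for all $\gamma \ne \delta$.
Now $E_\gamma \setminus H_\gamma$ is $\mu$-negligible for every $\gamma \in C$. $\mathbf{P}$ $\lambda((E_\gamma \times F_\gamma) \setminus V) = 0$, so $F_\gamma \setminus V[\{x\}]$ is negligible for almost every $x \in E_\gamma$, by 252D. On the other hand, if we set $F'_\gamma = Y \setminus F_\gamma$, $W_\gamma = (X \times Y) \setminus (E_\gamma \times F'_\gamma)$, then we see that
$$(E_\gamma \times F'_\gamma) \cap (E_\gamma \times F_\gamma) = \emptyset,\quad E_\gamma \times F_\gamma \subseteq W_\gamma,$$
$$\lambda((E_\delta \times F_\delta) \setminus W_\gamma) = \lambda((E_\gamma \times F'_\gamma) \cap (E_\delta \times F_\delta)) \le \mu(E_\gamma \cap E_\delta) = 0$$
for every $\delta \ne \gamma$, so $W_\gamma$ is an essential upper bound for $\mathcal{W}$ and $V \cap (E_\gamma \times F'_\gamma) = V \setminus W_\gamma$ must be $\lambda$-negligible. Accordingly $V[\{x\}] \setminus F_\gamma = V[\{x\}] \cap F'_\gamma$ is $\nu$-negligible for $\mu$-almost every $x \in E_\gamma$. But this means that $V[\{x\}] \triangle F_\gamma$ is $\nu$-negligible for $\mu$-almost every $x \in E_\gamma$, that is, $\mu(E_\gamma \setminus H_\gamma) = 0$. $\mathbf{Q}$
Now consider the family $\langle E_\gamma \cap H_\gamma\rangle_{\gamma\in C}$. This is a disjoint family of sets of finite measure in $X$. If $E \in \Sigma$ has non-zero measure, there is a $\gamma \in C$ such that $\mu(E_\gamma \cap H_\gamma \cap E) = \mu(E_\gamma \cap E) > 0$. But this means that $\mathcal{E} = \{E_\gamma \cap H_\gamma : \gamma \in C\}$ satisfies the conditions of 213O, and $\mu$ must be strictly localizable; which it isn't. $\mathbf{X}$
(c) Thus we have found a family $\mathcal{W} \subseteq \Lambda$ with no essential supremum in $\Lambda$, and $\lambda$ is not localizable.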
Remark If $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ are any localizable measure spaces with a non-localizable c.l.d. product
measure, then their c.l.d. versions are still localizable (213Hb) and still have a non-localizable product (251S),
which cannot be strictly localizable; so that one of the factors is also not strictly localizable (251N). Thus
any example of the type here must involve a complete locally determined localizable space which is not
strictly localizable, as in 216E.

254V Corresponding to 251T and 251Wo, we have the following result on countable powers of atomless
probability spaces.
Proposition Let $(X, \Sigma, \mu)$ be an atomless probability space and $I$ a countable set. Let $\lambda$ be the product probability measure on $X^I$. Then $\{x : x \in X^I,\ x$ is injective$\}$ is $\lambda$-conegligible.
proof For any pair $\{i, j\}$ of distinct elements of $I$, the set $\{z : z \in X^{\{i,j\}},\ z(i) = z(j)\}$ is negligible for the product measure on $X^{\{i,j\}}$, by 251T. By 254Oa, $\{x : x \in X^I,\ x(i) = x(j)\}$ is $\lambda$-negligible. Because $I$ is countable, there are only countably many such pairs $\{i, j\}$, so $\{x : x \in X^I,\ x(i) = x(j)$ for some distinct $i, j \in I\}$ is negligible, and its complement is conegligible; but this complement is just the set of injective functions from $I$ to $X$.

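254X Basic exercises (a) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be any family of probability spaces, with product $(X, \Lambda, \lambda)$. Write $\mathcal{E}$ for the family of subsets of $X$ expressible as the union of a finite disjoint family of measurable cylinders. (i) Show that if $C \subseteq X$ is a measurable cylinder then $X \setminus C \in \mathcal{E}$. (ii) Show that $W \cap V \in \mathcal{E}$ for all $W, V \in \mathcal{E}$. (iii) Show that $X \setminus W \in \mathcal{E}$ for every $W \in \mathcal{E}$. (iv) Show that $\mathcal{E}$ is an algebra of subsets of $X$. (v) Show that for any $W \in \Lambda$, $\epsilon > 0$ there is a $V \in \mathcal{E}$ such that $\lambda(W \triangle V) \le \epsilon^2$. (vi) Show that for any $W \in \Lambda$, $\epsilon > 0$ there are disjoint measurable cylinders $C_0, \ldots, C_n$ such that $\lambda(W \cap C_j) \ge (1 - \epsilon)\lambda C_j$ for every $j$ and $\lambda(W \setminus \bigcup_{j\le n} C_j) \le \epsilon$. (Hint: select the $C_j$ from the measurable cylinders composing a set $V$ as in (v).) (vii) Show that if $f$, $g$ are $\lambda$-integrable functions and $\int_C f \le \int_C g$ for every measurable cylinder $C \subseteq X$, then $f \le_{\text{a.e.}} g$. (Hint: show that $\int_W f \le \int_W g$ for every $W \in \Lambda$.)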

> (b) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, with product $(X, \Lambda, \lambda)$. Show that the outer measure $\lambda^*$ defined by $\lambda$ is exactly the outer measure $\theta$ described in 254A, that is, that $\theta$ is a regular outer measure.
(c) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, with product $(X, \Lambda, \lambda)$. Write $\lambda_0$ for the restriction of $\lambda$ to $\widehat{\bigotimes}_{i\in I}\Sigma_i$, and $\mathcal{C}$ for the family of measurable cylinders in $X$. Suppose that $(Y, \mathrm{T}, \nu)$ is a probability space and $\phi : Y \to X$ a function. (i) Show that $\phi$ is inverse-measure-preserving when regarded as a function from $(Y, \mathrm{T}, \nu)$ to $(X, \widehat{\bigotimes}_{i\in I}\Sigma_i, \lambda_0)$ iff $\phi^{-1}[C]$ belongs to $\mathrm{T}$ and $\nu\phi^{-1}[C] = \lambda_0 C$ for every $C \in \mathcal{C}$. (ii) Show that $\lambda_0$ is the only measure on $X$ with this property. (Hint: 136C.)
> (d) Let $I$ be a set and $(Y, \mathrm{T}, \nu)$ a complete probability space. Show that a function $\phi : Y \to \{0, 1\}^I$ is inverse-measure-preserving for $\nu$ and the usual measure on $\{0, 1\}^I$ iff $\nu\{y : \phi(y)(i) = 1$ for every $i \in J\} = 2^{-\#(J)}$ for every finite $J \subseteq I$.
> (e) Let $I$ be any set and $\lambda$ the usual measure on $X = \{0, 1\}^I$. Define addition on $X$ by setting $(x + y)(i) = x(i) +_2 y(i)$ for every $i \in I$, $x, y \in X$, where $0 +_2 0 = 1 +_2 1 = 0$, $0 +_2 1 = 1 +_2 0 = 1$. (i) Show that for any $y \in X$, the map $x \mapsto x + y : X \to X$ is inverse-measure-preserving. (Hint: Use 254G.) (ii) Show that the map $(x, y) \mapsto x + y : X \times X \to X$ is inverse-measure-preserving, if $X \times X$ is given its product measure.
> (f) Let $I$ be any set and $\lambda$ the usual measure on $\mathcal{P}I$. (i) Show that the map $a \mapsto a \triangle b : \mathcal{P}I \to \mathcal{P}I$ is inverse-measure-preserving for any $b \subseteq I$; in particular, $a \mapsto I \setminus a$ is inverse-measure-preserving. (ii) Show that the map $(a, b) \mapsto a \triangle b : \mathcal{P}I \times \mathcal{P}I \to \mathcal{P}I$ is inverse-measure-preserving.
> (g) Show that for any $q \in [0, 1]$ and any set $I$ there is a measure $\lambda$ on $\mathcal{P}I$ such that $\lambda\{a : J \subseteq a\} = q^{\#(J)}$ for every finite $J \subseteq I$.
> (h) Let $(Y, \mathrm{T}, \nu)$ be a complete probability space, and write $\mu$ for Lebesgue measure on $[0, 1]$. Suppose that $\phi : Y \to [0, 1]$ is a function such that $\nu\phi^{-1}[I]$ exists and is equal to $\mu I$ for every interval $I$ of the form $[2^{-n}k, 2^{-n}(k + 1)]$, where $n \in \mathbb{N}$ and $0 \le k < 2^n$. Show that $\phi$ is inverse-measure-preserving for $\nu$ and $\mu$.
(i) Let $\langle X_i\rangle_{i\in I}$ be a family of sets, and for each $i \in I$ let $\Sigma_i$ be a $\sigma$-algebra of subsets of $X_i$. Show that for every $E \in \widehat{\bigotimes}_{i\in I}\Sigma_i$ there is a countable set $J \subseteq I$ such that $E$ is expressible as $\pi_J^{-1}[F]$ for some $F \in \widehat{\bigotimes}_{i\in J}\Sigma_i$, writing $\pi_J(x) = x \upharpoonright J \in \prod_{i\in J} X_i$ for $x \in \prod_{i\in I} X_i$.
(j) (i) Let $\nu$ be the usual measure on $X = \{0, 1\}^{\mathbb{N}}$. Show that for any $k \ge 1$, $(X, \nu)$ is isomorphic to $(X^k, \nu_k)$, where $\nu_k$ is the measure on $X^k$ which is the product measure obtained by giving each factor $X$ the measure $\nu$. (ii) Writing $\mu_{[0,1]}$ for Lebesgue measure on $[0, 1]$, etc., show that for any $k \ge 1$, $([0, 1]^k, \mu_{[0,1]^k})$ is isomorphic to $([0, 1], \mu_{[0,1]})$.
(k) (i) Writing $\mu_{[0,1[}$ for Lebesgue measure on $[0, 1[$, etc., show that $([0, 1], \mu_{[0,1]})$ is isomorphic to $([0, 1[, \mu_{[0,1[})$. (ii) Show that for any $k \ge 1$, $([0, 1[^k, \mu_{[0,1[^k})$ is isomorphic to $([0, 1[, \mu_{[0,1[})$. (iii) Show that for any $k \ge 1$, $(\mathbb{R}, \mu_{\mathbb{R}})$ is isomorphic to $(\mathbb{R}^k, \mu_{\mathbb{R}^k})$.
(l) Let $\mu$ be Lebesgue measure on $[0, 1]$ and $\lambda$ the product measure on $[0, 1]^{\mathbb{N}}$. Show that $([0, 1], \mu)$ and $([0, 1]^{\mathbb{N}}, \lambda)$ are isomorphic.
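(m) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of complete probability spaces and $\lambda$ the product measure on $\prod_{i\in I} X_i$, with domain $\Lambda$. Suppose that $A_i \subseteq X_i$ for each $i \in I$. Show that $\prod_{i\in I} A_i \in \Lambda$ iff either (i) $\prod_{i\in I} \mu_i^* A_i = 0$ or (ii) $A_i \in \Sigma_i$ for every $i$ and $\{i : A_i \ne X_i\}$ is countable. (Hint: assemble ideas from 252Xb, 254F, 254L and 254N.)
(n) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X, \Lambda, \lambda)$. (i) Show that, for any $A \subseteq X$,
$$\lambda^*A = \min\{\lambda_J^*\pi_J[A] : J \subseteq I \text{ is countable}\},$$
where for $J \subseteq I$ I write $\lambda_J$ for the product probability measure on $X_J = \prod_{i\in J} X_i$ and $\pi_J : X \to X_J$ for the canonical map. (ii) Show that if $J, K \subseteq I$ are disjoint and $A, B \subseteq X$ are determined by coordinates in $J$, $K$ respectively, then $\lambda^*(A \cap B) = \lambda^*A \cdot \lambda^*B$.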

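(o) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X, \Lambda, \lambda)$. Let $S$ be the linear span of the set of characteristic functions of measurable cylinders in $X$, as in 254Q. Show that $\{f^{\bullet} : f \in S\}$ is dense in $L^p(\lambda)$ for every $p \in [1, \infty[$.

(p) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $(X, \Lambda, \lambda)$ their product; for $J \subseteq I$ let $\Lambda_J \subseteq \Lambda$ be the $\sigma$-algebra of members of $\Lambda$ determined by coordinates in $J$ and $P_J : L^1 = L^1(\lambda) \to L^1_J = L^1(\lambda\upharpoonright\Lambda_J)$ the corresponding conditional expectation. (i) Show that if $u \in L^1_J$ and $v \in L^1_{I\setminus J}$ then $u \times v \in L^1$ and $\int u \times v = \int u \cdot \int v$. (Hint: 253D.) (ii) Show that if $u \in L^1$ then $u \in L^1_J$ iff $\int_C u = \lambda C \cdot \int u$ for every measurable cylinder $C \subseteq X$ which is determined by coordinates in $I \setminus J$. (Hint: 254Xa(vii).) (iii) Show that if $\mathcal{J} \subseteq \mathcal{P}I$ is non-empty, with $J^* = \bigcap\mathcal{J}$, then $L^1_{J^*} = \bigcap_{J\in\mathcal{J}} L^1_J$.

(q) (i) Let $I$ be any set and $\lambda$ the usual measure on $\mathcal{P}I$. Let $A \subseteq \mathcal{P}I$ be such that $a \triangle b \in A$ whenever $a \in A$ and $b \subseteq I$ is finite. Show that $\lambda^*A$ must be either 0 or 1. (ii) Let $\lambda$ be the usual measure on $\{0, 1\}^{\mathbb{N}}$, and $\Lambda$ its domain. Let $f : \{0, 1\}^{\mathbb{N}} \to \mathbb{R}$ be a function such that, for $x, y \in \{0, 1\}^{\mathbb{N}}$, $f(x) = f(y) \iff \{n : n \in \mathbb{N},\ x(n) \ne y(n)\}$ is finite. Show that $f$ is not $\Lambda$-measurable. (Hint: for any $q \in \mathbb{Q}$, $\lambda^*\{x : f(x) \le q\}$ is either 0 or 1.)

(r) Let $\langle X_i\rangle_{i\in I}$ be any family of sets and $A \subseteq B \subseteq \prod_{i\in I} X_i$. Suppose that $A$ is determined by coordinates in $J \subseteq I$ and that $B$ is determined by coordinates in $K$. Show that there is a set $C$ such that $A \subseteq C \subseteq B$ and $C$ is determined by coordinates in $J \cap K$.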

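254Y Further exercises (a) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, and for $J \subseteq I$ let $\lambda_J$ be the product measure on $X_J = \prod_{i\in J} X_i$; write $X = X_I$, $\lambda = \lambda_I$ and $\pi_J(x) = x \upharpoonright J$ for $x \in X$ and $J \subseteq I$.
(i) Show that for $K \subseteq J \subseteq I$ we have a natural linear, order-preserving and norm-preserving map $T_{JK} : L^1(\lambda_K) \to L^1(\lambda_J)$ defined by writing $T_{JK}(f^{\bullet}) = (f\pi_{KJ})^{\bullet}$ for every $\lambda_K$-integrable function $f$, where $\pi_{KJ}(y) = y \upharpoonright K$ for $y \in X_J$.
(ii) Write $\mathcal{K}$ for the set of finite subsets of $I$. Show that if $W$ is any Banach space and $\langle T_K\rangle_{K\in\mathcal{K}}$ is a family such that ($\alpha$) $T_K$ is a bounded linear operator from $L^1(\lambda_K)$ to $W$ for every $K \in \mathcal{K}$ ($\beta$) $T_K = T_J T_{JK}$ whenever $K \subseteq J \in \mathcal{K}$ ($\gamma$) $\sup_{K\in\mathcal{K}} \|T_K\| < \infty$, then there is a unique bounded linear operator $T : L^1(\lambda) \to W$ such that $T_K = T T_{IK}$ for every $K \in \mathcal{K}$.
(iii) Write $\mathcal{J}$ for the set of countable subsets of $I$. Show that $L^1(\lambda) = \bigcup_{J\in\mathcal{J}} T_{IJ}[L^1(\lambda_J)]$.

(b) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $\lambda$ a complete measure on $X = \prod_{i\in I} X_i$. Suppose that for every complete probability space $(Y, \mathrm{T}, \nu)$ and function $\phi : Y \to X$, $\phi$ is inverse-measure-preserving for $\nu$ and $\lambda$ iff $\nu\phi^{-1}[C]$ is defined and equal to $\theta_0 C$ for every measurable cylinder $C \subseteq X$, writing $\theta_0$ for the functional of 254A. Show that $\lambda$ is the product measure on $X$.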

(c) Let $I$ be a set, and $\lambda$ the usual measure on $\{0, 1\}^I$. Show that $L^1(\lambda)$ is separable, in its norm topology, iff $I$ is countable.

(d) Let $I$ be a set, and $\lambda$ the usual measure on $\mathcal{P}I$. Show that if $\mathcal{F}$ is a non-principal ultrafilter on $I$ then $\lambda^*\mathcal{F} = 1$. (Hint: 254Xq, 254Xf.)

(e) Let $(X, \Sigma, \mu)$, $(Y, \mathrm{T}, \nu)$ and $\lambda$ be as in 254U. Set $A = \{f_\gamma : \gamma \in C\}$ as defined in 216E. Let $\mu_A$ be the subspace measure on $A$, and $\tilde{\lambda}$ the c.l.d. product measure of $\mu_A$ and $\nu$ on $A \times Y$. Show that $\tilde{\lambda}$ is a proper extension of the subspace measure $\lambda_{A\times Y}$. (Hint: consider $\tilde{W} = \{(f_\gamma, y) : \gamma \in C,\ y \in F_\gamma\}$, in the notation of 254U.)

(f) Let $(X, \Sigma, \mu)$ be an atomless probability space, $I$ a set with cardinal at most $\#(X)$, and $A$ the set of injective functions from $I$ to $X$. Show that $A$ has full outer measure for the product measure on $X^I$.

254 Notes and comments While there are many reasons for studying infinite products of probability spaces, one stands pre-eminent, from the point of view of abstract measure theory: they provide constructions of essentially new kinds of measure space. I cannot describe the nature of this 'newness' effectively without venturing into the territory of Volume 3. But the function spaces of Chapter 24 do give at least a form of words we can use: these are the first probability spaces $(X, \Lambda, \lambda)$ we have seen for which $L^1(\lambda)$ need not be separable for its norm topology (254Yc).
The formulae of 254A, like those of 251A, lead very naturally to measures; the point at which they become
more than a curiosity is when we find that the product measure is a probability measure (254Fa), which
must be regarded as the crucial argument of this section, just as 251E is the essential basis of §251. It
is I think remarkable that it makes no difference to the result here whether I is finite, countably infinite
or uncountable. If you write out the proof for the case I = N, it will seem natural to expand the sets
$J_n$ until they are initial segments of $I$ itself, thereby avoiding altogether the auxiliary set $K$; but this is a
misleading simplification, because it hides an essential feature of the argument, which is that any sequence
in $\mathcal{C}$ involves only countably many coordinates, so that as long as we are dealing with only one such sequence
the uncountability of the whole set I is irrelevant. This general principle naturally permeates the whole of
the section; in 254O I have tried to spell out the way in which many of the questions we are interested in
can be expressed in terms of countable subproducts of the factor spaces $X_i$. See also the exercises 254Xi,
254Xm and 254Ya(iii).
There is a slightly paradoxical side to this principle: even the best-behaved subsets $E_i$ of $X_i$ may fail to have measurable products $\prod_{i\in I} E_i$ if $E_i \ne X_i$ for uncountably many $i$. For instance, $]0, 1[^I$ is not a measurable subset of $[0, 1]^I$ if $I$ is uncountable (254Xm). It has full outer measure and its own product measure is just the subspace measure (254L), but any measurable subset must have measure zero. The point is that the empty set is the only member of $\widehat{\bigotimes}_{i\in I}\Sigma_i$, where $\Sigma_i$ is the algebra of Lebesgue measurable subsets of $[0, 1]$ for each $i$, which is included in $]0, 1[^I$ (see 254Xi).
As in §251, I use a construction which automatically produces a complete measure on the product space. I am sure that this is the best choice for 'the' product measure. But there are occasions when its restriction to the $\sigma$-algebra generated by the measurable cylinders is worth looking at; see 254Xc.
Lemma 254G is a result of a type which will be commoner in Volume 3 than in the present volume.
It describes the product measure in terms not of what it is but of what it does; specifically, in terms
of a property of the associated family of inverse-measure-preserving functions. It is therefore a universal
mapping theorem. (Compare 253F.) Because this description is sufficient to determine the product measure
completely (254Yb), it is not surprising that I use it repeatedly.
The 'usual measure' on $\{0, 1\}^I$ (254J) is sometimes called 'coin-tossing measure' because it can be used to model the concept of tossing a coin arbitrarily many times indexed by the set $I$, taking an $x \in \{0, 1\}^I$ to represent the outcome in which the coin is 'heads' for just those $i \in I$ for which $x(i) = 1$. The sets, or 'events', in the class $\mathcal{C}$ are just those which can be specified by declaring the outcomes of finitely many tosses, and the probability of any particular sequence of $n$ results is $1/2^n$, regardless of which tosses we look at or in which order. In Chapter 27 I will return to the use of product measures to represent probabilities involving independent events.
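
The claim that a specification of $n$ tosses has probability $1/2^n$, whichever tosses are named and in whatever order, can be illustrated on a finite truncation. This is a minimal sketch assuming a finite index set $\{0, \ldots, n-1\}$; the function names `cylinder_measure` and `brute_force_measure` and the sample list `spec` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def cylinder_measure(constraints):
    """Measure of {x : x(i) = v for each (i, v) in constraints}.

    For the usual measure on {0,1}^I this is 2^(-k), where k is the number
    of distinct coordinates constrained; it is 0 if two constraints clash.
    """
    seen = {}
    for i, v in constraints:
        if seen.setdefault(i, v) != v:
            return Fraction(0)
    return Fraction(1, 2 ** len(seen))


def brute_force_measure(constraints, n):
    """Same quantity, by counting points of the finite product {0,1}^n."""
    hits = sum(
        1
        for x in product((0, 1), repeat=n)
        if all(x[i] == v for i, v in constraints)
    )
    return Fraction(hits, 2 ** n)


spec = [(4, 1), (0, 0), (2, 1)]  # tosses 4, 0, 2 named in an arbitrary order
print(cylinder_measure(spec), brute_force_measure(spec, 6))
```

The brute-force count only makes sense for finitely many coordinates; for an infinite index set the closed-form value is what the product measure assigns to the cylinder.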
In 254K I come to the first case in this treatise of a non-trivial isomorphism between two measure spaces. If you have been brought up on a conventional diet of modern abstract pure mathematics based on algebra and topology, you may already have been struck by the absence of emphasis on any concept of 'homomorphism' or 'isomorphism'. Here indeed I start to speak of 'isomorphisms' between measure spaces without even troubling to define them; I hope it really is obvious that an isomorphism between measure spaces $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ is a bijection $\phi : X \to Y$ such that $\mathrm{T} = \{F : F \subseteq Y,\ \phi^{-1}[F] \in \Sigma\}$ and $\nu F = \mu\phi^{-1}[F]$ for every $F \in \mathrm{T}$, so that $\Sigma$ is necessarily $\{E : E \subseteq X,\ \phi[E] \in \mathrm{T}\}$ and $\mu E = \nu\phi[E]$ for every $E \in \Sigma$. Put like this, you may, if you worked through the exercises of Volume 1, be reminded of some constructions of $\sigma$-algebras in 111Xc-111Xd and of the 'image measures' in 112E. The result in 254K (see also 134Yo) naturally leads to two distinct notions of 'homomorphism' between two measure spaces $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$:
(i) a function $\phi : X \to Y$ such that $\phi^{-1}[F] \in \Sigma$ and $\mu\phi^{-1}[F] = \nu F$ for every $F \in \mathrm{T}$,
(ii) a function $\phi : X \to Y$ such that $\phi[E] \in \mathrm{T}$ and $\nu\phi[E] = \mu E$ for every $E \in \Sigma$.
On either definition, we find that a bijection $\phi : X \to Y$ is an isomorphism iff $\phi$ and $\phi^{-1}$ are both homomorphisms. (Also, of course, the composition of homomorphisms will be a homomorphism.) My own view is that (i) is the more important, and in this treatise I study such functions at length, calling them 'inverse-measure-preserving'. But both have their uses. The function $\phi$ of 254K not only satisfies both definitions, but is also 'nearly' an isomorphism in several different ways, of which possibly the most important is that there are conegligible sets $X' \subseteq \{0, 1\}^{\mathbb{N}}$, $Y' \subseteq [0, 1]$ such that $\phi \upharpoonright X'$ is an isomorphism between $X'$ and $Y'$ when both are given their subspace measures.
Having once established the isomorphism between $[0, 1]$ and $\{0, 1\}^{\mathbb{N}}$, we are led immediately to many more; see 254Xj-254Xl. In fact Lebesgue measure on $[0, 1]$ is isomorphic to a large proportion of the probability spaces arising in applications. In Volumes 3 and 4 I will discuss these isomorphisms at length.
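
The measure-preserving behaviour behind this isomorphism can be glimpsed on finite truncations. The sketch below assumes that the map in question is the binary-expansion map $x \mapsto \sum_{i} 2^{-i-1}x(i)$ (254K itself lies outside this passage, so this is an assumption), and it works with $n$-bit strings only; `phi` and `pullback_measure` are hypothetical names. It checks that the strings sent into a dyadic interval of length $2^{-m}$ have coin-tossing measure $2^{-m}$, in the spirit of the criterion in 254Xh.

```python
from fractions import Fraction
from itertools import product


def phi(bits):
    """Binary-expansion map: phi(x) = sum over i of 2^(-i-1) * x(i)."""
    return sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))


def pullback_measure(k, m, n):
    """Coin-tossing measure of {x in {0,1}^n : k/2^m <= phi(x) < (k+1)/2^m}."""
    lo, hi = Fraction(k, 2 ** m), Fraction(k + 1, 2 ** m)
    hits = sum(1 for x in product((0, 1), repeat=n) if lo <= phi(x) < hi)
    return Fraction(hits, 2 ** n)


n, m = 8, 3
ok = all(pullback_measure(k, m, n) == Fraction(1, 2 ** m) for k in range(2 ** m))
print(ok)
```

Half-open intervals are used so that each finite string falls in exactly one of them; on the full space the endpoints form a negligible set, which is one reason the map is only 'nearly' an isomorphism.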
The general notion of 'subproduct' is associated with some of the deepest and most characteristic results in the theory of product measures. Because we are looking at products of arbitrary families of probability spaces, the definition must ignore any possible structure in the index set $I$ of 254A-254C. But many applications, naturally enough, deal with index sets with favoured subsets or partitions, and the first essential step is the 'associative law' (254N; compare 251Xd-251Xe). This is, for instance, the tool by which we can apply Fubini's theorem within infinite products. The natural projection maps from $\prod_{i\in I} X_i$ to $\prod_{i\in J} X_i$, where $J \subseteq I$, are related in a way which has already been used as the basis of theorems in §235; the product measure on $\prod_{i\in J} X_i$ is precisely the image of the product measure on $\prod_{i\in I} X_i$ (254Oa). In 254O-254Q I explore the consequences of this fact and the fact already noted that all measurable sets in the product are 'essentially' determined by coordinates in some countable set.
In 254R I go more deeply into this notion of a set $W \subseteq \prod_{i\in I} X_i$ 'determined by coordinates in' a set $J \subseteq I$. In its primitive form this is a purely set-theoretic notion (254M, 254Ta). I think that even a three-element set $I$ can give us surprises; I invite you to try to visualize subsets of $[0, 1]^3$ which are determined by pairs of coordinates. But the interactions of this with measure-theoretic ideas, and in particular with a willingness to add or discard negligible sets, lead to much more, and in particular to the unique minimal sets of coordinates associated with measurable sets and functions (254R). Of course these results can be elegantly and effectively described in terms of $L^1$ and $L^0$ spaces, in which negligible sets are swept out of sight as the spaces are constructed. The basis of all this is the fact that the conditional expectation operators associated with subproducts multiply together in the simplest possible way (254Ra); but some further idea is needed to show that if $\mathcal{J}$ is a non-empty family of subsets of $I$, then $L^0_{\bigcap\mathcal{J}} = \bigcap_{J\in\mathcal{J}} L^0_J$ (see part (b) of the proof of 254R, and 254Xp(iii)).
254Sa is a version of the 'zero-one law' (272O below). 254Sb is a strong version of the principle that measurable sets in a product must be approximable by sets determined by a finite set of coordinates (254Fe, 254Qa, 254Xa). Evidently it is not a coincidence that the set $W$ of 254Tb is negligible. In §272 I will revisit many of the ideas of 254R-254S and 254Xp, in particular, in the more general context of 'independent $\sigma$-algebras'.
Finally, 254U and 254Ye hardly belong to this section at all; they are unfinished business from §251. They are here because the construction of 254A-254C is the simplest way to produce an adequately complex probability space $(Y, \mathrm{T}, \nu)$.

255 Convolutions of functions


I devote a section to a construction which is of great importance, and will in particular be very useful in Chapters 27 and 28, and may also be regarded as a series of exercises on the work so far.
I find it difficult to know how much repetition to indulge in in this section, because the natural unified expression of the ideas is in the theory of topological groups, and I do not think we are yet ready for the general theory (I will come to it in Chapter 44 in Volume 4). The groups we need for this volume are
$\mathbb{R}$;
$\mathbb{R}^r$, for $r \ge 2$;
$S^1 = \{z : z \in \mathbb{C},\ |z| = 1\}$, the 'circle group';
$\mathbb{Z}$, the group of integers.
All the ideas already appear in the theory of convolutions on $\mathbb{R}$, and I will therefore present this material in relatively detailed form, before sketching the forms appropriate to the groups $\mathbb{R}^r$ and $S^1$ (or $]-\pi, \pi]$); $\mathbb{Z}$ can I think be safely left to the exercises.

255A This being a book on measure theory, it is perhaps appropriate for me to emphasize, as the basis
of the theory of convolutions, certain measure space isomorphisms.

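Theorem Let $\mu$ be Lebesgue measure on $\mathbb{R}$ and $\mu_2$ Lebesgue measure on $\mathbb{R}^2$; write $\Sigma$, $\Sigma_2$ for their domains.
(a) For any $a \in \mathbb{R}$, the map $x \mapsto a + x : \mathbb{R} \to \mathbb{R}$ is a measure space automorphism of $(\mathbb{R}, \Sigma, \mu)$.
(b) The map $x \mapsto -x : \mathbb{R} \to \mathbb{R}$ is a measure space automorphism of $(\mathbb{R}, \Sigma, \mu)$.
(c) For any $a \in \mathbb{R}$, the map $x \mapsto a - x : \mathbb{R} \to \mathbb{R}$ is a measure space automorphism of $(\mathbb{R}, \Sigma, \mu)$.
(d) The map $(x, y) \mapsto (x + y, y) : \mathbb{R}^2 \to \mathbb{R}^2$ is a measure space automorphism of $(\mathbb{R}^2, \Sigma_2, \mu_2)$.
(e) The map $(x, y) \mapsto (x - y, y) : \mathbb{R}^2 \to \mathbb{R}^2$ is a measure space automorphism of $(\mathbb{R}^2, \Sigma_2, \mu_2)$.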
Remark I ought to remark that (b), (d) and (e) may be regarded as simple special cases of Theorem 263A
in the next chapter. I nevertheless feel that it is worth writing out separate proofs here, partly because the
general case of linear operators dealt with in 263A requires some extra machinery not needed here, but more
because the result here has nothing to do with the linear structure of R and R 2 ; it is exclusively dependent
on the group structure of R, together with the links between its topology and measure, and the arguments
I give now are adaptable to the proper generalizations to abelian topological groups.
proof (a) This is just the translation-invariance of Lebesgue measure, dealt with in §134. There I showed
that if $E\in\Sigma$ then $E+a\in\Sigma$ and $\mu(E+a)=\mu E$ (134Ab); that is, writing $\phi(x)=x+a$, $\mu(\phi[E])$ exists
and is equal to $\mu E$ for every $E\in\Sigma$. But of course we also have
$$\mu(\phi^{-1}[E])=\mu(E+(-a))=\mu E$$
for every $E\in\Sigma$, so $\phi$ is an automorphism.
(b) The point is that $\mu^*(A)=\mu^*(-A)$ for every $A\subseteq\mathbb R$. $\mathbf P$ (I follow the definitions of Volume 1.)
If $\epsilon>0$, there is a sequence $\langle I_n\rangle_{n\in\mathbb N}$ of half-open intervals covering $A$ with $\sum_{n=0}^{\infty}\mu I_n\le\mu^*A+\epsilon$. Now
$-A\subseteq\bigcup_{n\in\mathbb N}(-I_n)$. But if $I_n=[a_n,b_n[$ then $-I_n=\,]-b_n,-a_n]$, so
$$\mu^*(-A)\le\sum_{n=0}^{\infty}\mu(-I_n)=\sum_{n=0}^{\infty}\max(0,-a_n-(-b_n))=\sum_{n=0}^{\infty}\mu I_n\le\mu^*A+\epsilon.$$
As $\epsilon$ is arbitrary, $\mu^*(-A)\le\mu^*A$. Applying this to $-A$ in place of $A$, we also have $\mu^*A=\mu^*(-(-A))\le\mu^*(-A)$, so $\mu^*(-A)=\mu^*A$. $\mathbf Q$
This means that, setting $\phi(x)=-x$ this time, $\phi$ is an automorphism of the structure $(\mathbb R,\mu^*)$. But since
$\mu$ is defined from $\mu^*$ by the abstract procedure of Carathéodory's method, $\phi$ must also be an automorphism
of the structure $(\mathbb R,\Sigma,\mu)$.
(c) Put (a) and (b) together; $x\mapsto a-x$ is the composition of the automorphisms $x\mapsto -x$ and $x\mapsto a+x$,
and the composition of automorphisms is surely an automorphism.
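(d)(i) Write $\mathrm T$ for the set $\{E:E\in\Sigma_2,\ \phi[E]\in\Sigma_2\}$, where this time $\phi(x,y)=(x+y,y)$ for $x,y\in\mathbb R$,
so that $\phi:\mathbb R^2\to\mathbb R^2$ is a bijection. Then $\mathrm T$ is a $\sigma$-algebra, being the intersection of the $\sigma$-algebras $\Sigma_2$ and
$\{E:\phi[E]\in\Sigma_2\}=\{\phi^{-1}[F]:F\in\Sigma_2\}$. Moreover, $\mu_2E=\mu_2(\phi[E])$ for every $E\in\mathrm T$. $\mathbf P$ By 252D, we have
$$\mu_2E=\int\mu\{x:(x,y)\in E\}\,\mu(dy).$$
But applying the same result to $\phi[E]$ we have
$$\begin{aligned}
\mu_2\phi[E]&=\int\mu\{x:(x,y)\in\phi[E]\}\,\mu(dy)=\int\mu\{x:(x-y,y)\in E\}\,\mu(dy)\\
&=\int\mu(E^{-1}[\{y\}]+y)\,\mu(dy)=\int\mu E^{-1}[\{y\}]\,\mu(dy)
\end{aligned}$$
(because Lebesgue measure is translation-invariant)
$$=\mu_2E.\ \mathbf Q$$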

(ii) Now $\phi$ and $\phi^{-1}$ are clearly continuous, so that $\phi[G]$ is open, and therefore measurable, for every
open $G$; consequently all open sets must belong to $\mathrm T$. Because $\mathrm T$ is a $\sigma$-algebra, it contains all Borel
sets. Now let $E$ be any measurable set. Then there are Borel sets $H_1$, $H_2$ such that $H_1\subseteq E\subseteq H_2$ and
$\mu_2(H_2\setminus H_1)=0$ (134Fb). We have $\phi[H_1]\subseteq\phi[E]\subseteq\phi[H_2]$ and
$$\mu_2(\phi[H_2]\setminus\phi[H_1])=\mu_2\phi[H_2\setminus H_1]=\mu_2(H_2\setminus H_1)=0.$$
Thus $\phi[E]\setminus\phi[H_1]$ must be negligible, therefore measurable, and $\phi[E]=\phi[H_1]\cup(\phi[E]\setminus\phi[H_1])$ is measurable.

This shows that $\phi[E]$ is measurable whenever $E$ is. But now observe that $\mathrm T$ can also be expressed as
$\{E:E\in\Sigma_2,\ \phi^{-1}[E]\in\Sigma_2\}$, so that we can apply the same argument with $\phi^{-1}$ in the place of $\phi$ to see that
$\phi^{-1}[E]$ is measurable whenever $E$ is. So $\phi$ is an automorphism of the structure $(\mathbb R^2,\Sigma_2)$, and therefore (by
(i) again) of $(\mathbb R^2,\Sigma_2,\mu_2)$.

(e) Of course this is an immediate corollary either of the proof of (d) or of (d) itself as stated, since
$(x,y)\mapsto(x-y,y)$ is just the inverse of $(x,y)\mapsto(x+y,y)$.
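
For readers who like to see a special case computed, here is a minimal sketch (not part of the proof) checking 255A(d)-(e) on a single polygon, where area can be found exactly by the shoelace formula. The function names `shoelace_area`, `shear` and `unshear` and the sample polygon are illustrative assumptions; the check says nothing about general measurable sets, for which the argument above is needed.

```python
from fractions import Fraction


def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in order."""
    twice_area = 0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(Fraction(twice_area, 2))


def shear(point):
    """The map (x, y) -> (x + y, y) of 255A(d)."""
    x, y = point
    return (x + y, y)


def unshear(point):
    """The inverse map (x, y) -> (x - y, y) of 255A(e)."""
    x, y = point
    return (x - y, y)


# A convex pentagon chosen for illustration only.
polygon = [(0, 0), (4, 0), (5, 2), (2, 5), (-1, 3)]
sheared = [shear(p) for p in polygon]

print(shoelace_area(polygon), shoelace_area(sheared))
```

The two printed areas agree because the shear is linear with determinant 1; the content of 255A(d) is that the same invariance holds for every Lebesgue measurable set.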

255B Corollary (a) If $a\in\mathbb R$, then for any complex-valued function $f$ defined on a subset of $\mathbb R$
$$\int f(x)\,dx=\int f(a+x)\,dx=\int f(-x)\,dx=\int f(a-x)\,dx$$
in the sense that if one of the integrals exists so do the others, and they are then all equal.
(b) If $f$ is a complex-valued function defined on a subset of $\mathbb R^2$, then
$$\int f(x+y,y)\,d(x,y)=\int f(x-y,y)\,d(x,y)=\int f(x,y)\,d(x,y)$$
in the sense that if one of the integrals exists and is finite so do the others, and they are then all equal.

255C Remarks (a) I am not sure whether it ought to be 'obvious' that if $(X,\Sigma,\mu)$, $(Y,\mathrm T,\nu)$ are
measure spaces and $\phi:X\to Y$ is an isomorphism, then for any function $f$ defined on a subset of $Y$
$$\int f(\phi(x))\,\mu(dx)=\int f(y)\,\nu(dy)$$
in the sense that if one is defined so is the other, and they are then equal. If it is obvious then the obviousness
must be contingent on the nature of the definition of integration: integrability with respect to the measure
$\mu$ is something which depends on the structure $(X,\Sigma,\mu)$ and on no other properties of $X$. If it is not obvious
then it is an easy deduction from Theorem 235A above, applied in turn to $\phi$ and $\phi^{-1}$ and to the real and
imaginary parts of $f$. In any case the isomorphisms of 255A are just those needed to prove 255B.
(b) Note that in 255Bb I write $\int f(x,y)\,d(x,y)$ to emphasize that I am considering the integral of $f$ with
respect to two-dimensional Lebesgue measure. The fact that
$$\iint f(x,y)\,dx\,dy=\iint f(x+y,y)\,dx\,dy=\iint f(x-y,y)\,dx\,dy$$
is actually easier, being an immediate consequence of the equality $\int f(a+x)\,dx=\int f(x)\,dx$. But applications
of this result often depend essentially on the fact that the functions $(x,y)\mapsto f(x+y,y)$, $(x,y)\mapsto f(x-y,y)$
are measurable as functions of two variables.

(c) I have moved directly to complex-valued functions because these are necessary for the applications
in Chapter 28. If however they give you any discomfort, either technically or aesthetically, all the measure-
theoretic ideas of this section are already to be found in the real case, and you may wish at first to read it
as if only real numbers were involved.

255D A further corollary of 255A will be useful.

Corollary Let f be a complex-valued function defined on a subset of R.


(a) If $f$ is measurable, then the functions $(x,y)\mapsto f(x+y)$, $(x,y)\mapsto f(x-y)$ are measurable.
(b) If $f$ is defined almost everywhere on $\mathbb R$, then the functions $(x,y)\mapsto f(x+y)$, $(x,y)\mapsto f(x-y)$ are
defined almost everywhere on $\mathbb R^2$.

proof Writing $g_1(x,y)=f(x+y)$, $g_2(x,y)=f(x-y)$ whenever these are defined, we have
$$g_1(x,y)=(f\otimes\mathbf 1)(\phi(x,y)),\quad g_2(x,y)=(f\otimes\mathbf 1)(\phi^{-1}(x,y)),$$
writing $\phi(x,y)=(x+y,y)$ as in 255A(d-e), and $(f\otimes\mathbf 1)(x,y)=f(x)$, following the notation of 253B. By
253C, $f\otimes\mathbf 1$ is measurable if $f$ is, and defined almost everywhere if $f$ is. Because $\phi$ is a measure space
automorphism, $(f\otimes\mathbf 1)\phi=g_1$ and $(f\otimes\mathbf 1)\phi^{-1}=g_2$ are measurable, or defined almost everywhere, if $f$ is.

255E The basic formula Let $f$ and $g$ be measurable complex-valued functions defined almost everywhere
in $\mathbb R$. Write $f*g$ for the function defined by the formula
$$(f*g)(x)=\int f(x-y)g(y)\,dy$$
whenever the integral exists (with respect to Lebesgue measure, naturally) as a complex number. Then $f*g$
is the convolution of the functions $f$ and $g$.
Observe that $\operatorname{dom}(|f|*|g|)=\operatorname{dom}(f*g)$, and that $|f*g|\le|f|*|g|$ everywhere on their common domain,
for all $f$ and $g$.
Remark Note that I am here prepared to contemplate the convolution of $f$ and $g$ for arbitrary members of
$\mathcal L^0_{\mathbb C}$, the space of almost-everywhere-defined measurable complex-valued functions, even though the domain
of $f*g$ may be empty.

255F Basic properties (a) Because integration is linear, we surely have
$$((f_1+f_2)*g)(x)=(f_1*g)(x)+(f_2*g)(x),$$
$$(f*(g_1+g_2))(x)=(f*g_1)(x)+(f*g_2)(x),$$
$$(cf*g)(x)=(f*cg)(x)=c(f*g)(x)$$
whenever the right-hand sides of the formulae are defined.

(b) If $f$, $g$ are measurable complex-valued functions defined almost everywhere in $\mathbb R$, then $f*g=g*f$, in
the strict sense that they have the same domain and the same value at each point of that common domain.
$\mathbf P$ Take $x\in\mathbb R$ and apply 255Ba to see that
$$\begin{aligned}
(f*g)(x)&=\int f(x-y)g(y)\,dy=\int f(x-(x-y))g(x-y)\,dy\\
&=\int f(y)g(x-y)\,dy=(g*f)(x)
\end{aligned}$$
if either is defined. $\mathbf Q$

(c) If $f_1$, $f_2$, $g_1$, $g_2$ are measurable complex-valued functions defined almost everywhere in $\mathbb R$, and $f_1=_{\text{a.e.}}f_2$
and $g_1=_{\text{a.e.}}g_2$, then for every $x\in\mathbb R$ we shall have $f_1(x-y)=f_2(x-y)$ for almost every $y\in\mathbb R$, by
255Ac. Consequently $f_1(x-y)g_1(y)=f_2(x-y)g_2(y)$ for almost every $y$, and $(f_1*g_1)(x)=(f_2*g_2)(x)$ in
the sense that if one of these is defined so is the other, and they are then equal.
Accordingly we may regard convolution as a binary operator on $L^0_{\mathbb C}$; if $u,v\in L^0_{\mathbb C}$, we can define $u*v$
as being equal to $f*g$ whenever $f^{\bullet}=u$ and $g^{\bullet}=v$. We need to remember, of course, that for general $u$,
$v\in L^0$ the domain of $u*v$ may vanish.

255G I have grouped 255Fa-255Fc together because they depend only on ideas up to and including
255Ac, 255Ba. Using the second halves of 255A and 255B we get much deeper. I begin with what seems to
be the fundamental result.
Theorem Let $f$, $g$ and $h$ be measurable complex-valued functions defined almost everywhere in $\mathbb R$.
(a)
$$\int h(x)(f*g)(x)\,dx=\int h(x+y)f(x)g(y)\,d(x,y)$$
whenever the right-hand side exists in $\mathbb C$, provided that in the expression $h(x)(f*g)(x)$ we interpret the
product as $0$ if $h(x)=0$ and $(f*g)(x)$ is undefined.
(b) If, on the same interpretation of $|h(x)|(|f|*|g|)(x)$, the integral $\int|h(x)|(|f|*|g|)(x)\,dx$ is finite, then
$\int h(x+y)f(x)g(y)\,d(x,y)$ exists in $\mathbb C$, so again we shall have
$$\begin{aligned}
\int h(x)(f*g)(x)\,dx&=\int h(x+y)f(x)g(y)\,d(x,y)\\
&=\iint h(x+y)f(x)g(y)\,dx\,dy=\iint h(x+y)f(x)g(y)\,dy\,dx.
\end{aligned}$$

proof Consider the functions
$$k_1(x,y)=h(x)f(x-y)g(y),\quad k_2(x,y)=h(x+y)f(x)g(y)$$
wherever these are defined. 255D tells us that $k_1$ and $k_2$ are measurable and defined almost everywhere.
Now setting $\phi(x,y)=(x+y,y)$, we have $k_2=k_1\phi$, so that
$$\int k_1(x,y)\,d(x,y)=\int k_2(x,y)\,d(x,y)$$
if either exists, by 255Bb.
If
$$\int h(x+y)f(x)g(y)\,d(x,y)=\int k_2$$
exists, then by Fubini's theorem we have
$$\int k_2=\int k_1(x,y)\,d(x,y)=\int\Bigl(\int h(x)f(x-y)g(y)\,dy\Bigr)dx,$$
so $\int h(x)f(x-y)g(y)\,dy$ exists almost everywhere, that is, $(f*g)(x)$ exists for almost every $x$ such that
$h(x)\ne 0$; on the interpretation I am using here, $h(x)(f*g)(x)$ exists almost everywhere, and
$$\begin{aligned}
\int h(x)(f*g)(x)\,dx&=\int\Bigl(\int h(x)f(x-y)g(y)\,dy\Bigr)dx=\int k_1\\
&=\int k_2=\int h(x+y)f(x)g(y)\,d(x,y).
\end{aligned}$$
If (on the same interpretation) $|h|\times(|f|*|g|)$ is integrable,
$$|k_1(x,y)|=|h(x)||f(x-y)||g(y)|$$
is measurable, and
$$\iint|h(x)||f(x-y)||g(y)|\,dy\,dx=\int|h(x)|(|f|*|g|)(x)\,dx$$
is finite, so by Tonelli's theorem (252G, 252H) $k_1$ and $k_2$ are integrable, and once again
$$\begin{aligned}
\int h(x)(f*g)(x)\,dx&=\int h(x+y)f(x)g(y)\,d(x,y)\\
&=\iint h(x+y)f(x)g(y)\,dx\,dy=\iint h(x+y)f(x)g(y)\,dy\,dx.
\end{aligned}$$

255H Certain standard results are now easy.


Corollary If $f$, $g$ are complex-valued functions which are integrable over $\mathbb R$, then $f*g$ is integrable, with
$$\int f*g=\int f\int g,\quad\int|f*g|\le\int|f|\int|g|.$$

proof In 255G, set $h(x)=1$ for every $x\in\mathbb R$; then
$$\int h(x+y)f(x)g(y)\,d(x,y)=\int f(x)g(y)\,d(x,y)=\int f\int g$$
by 253D, so
$$\int f*g=\int h(x)(f*g)(x)\,dx=\int h(x+y)f(x)g(y)\,d(x,y)=\int f\int g,$$
as claimed. Now
$$\int|f*g|\le\int|f|*|g|=\int|f|\int|g|.$$
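
A concrete case may help fix the formulae of 255E, 255Fb and 255H. The following is a minimal sketch, assuming $f=\chi[a,b]$ and $g=\chi[c,d]$ are indicator functions of bounded intervals, so that $(f*g)(x)$ is simply the length of $\{y:y\in[c,d],\ x-y\in[a,b]\}$ and $f*g$ is piecewise linear. The helper names `conv_indicators` and `integral_piecewise_linear` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction


def conv_indicators(a, b, c, d, x):
    """(chi[a,b] * chi[c,d])(x): the length of {y in [c,d] : x - y in [a,b]}."""
    lo = max(Fraction(c), Fraction(x) - Fraction(b))
    hi = min(Fraction(d), Fraction(x) - Fraction(a))
    return max(Fraction(0), hi - lo)


def integral_piecewise_linear(func, breakpoints):
    """Exact integral of a function which is linear between consecutive breakpoints."""
    total = Fraction(0)
    for left, right in zip(breakpoints, breakpoints[1:]):
        total += (func(left) + func(right)) * (right - left) / 2
    return total


a, b, c, d = 0, 1, 0, 3
# f * g vanishes outside [a + c, b + d] and changes slope only at these points.
breaks = sorted({Fraction(a + c), Fraction(a + d), Fraction(b + c), Fraction(b + d)})

total = integral_piecewise_linear(lambda x: conv_indicators(a, b, c, d, x), breaks)
print(total, (b - a) * (d - c))
```

Here the integral of $f*g$ is computed exactly by the trapezium rule on each linear piece, and agrees with $\int f\int g=(b-a)(d-c)$, as 255H requires; swapping the two intervals leaves every value unchanged, as in 255Fb.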

255I Corollary For any measurable complex-valued functions $f$, $g$ defined almost everywhere in $\mathbb R$,
$f*g$ is measurable and has measurable domain.
proof Set $f_n(x)=f(x)$ if $x\in\operatorname{dom}f$, $|x|\le n$, $|f(x)|\le n$, and $0$ elsewhere in $\mathbb R$; define $g_n$ similarly
from $g$. Then $f_n$ and $g_n$ are integrable, $|f_n|\le|f|$ and $|g_n|\le|g|$ almost everywhere, and $f=_{\text{a.e.}}\lim_{n\to\infty}f_n$,
$g=_{\text{a.e.}}\lim_{n\to\infty}g_n$. Consequently, by Lebesgue's Dominated Convergence Theorem,

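$$\begin{aligned}
(f*g)(x)&=\int f(x-y)g(y)\,dy=\int\lim_{n\to\infty}f_n(x-y)g_n(y)\,dy\\
&=\lim_{n\to\infty}\int f_n(x-y)g_n(y)\,dy=\lim_{n\to\infty}(f_n*g_n)(x)
\end{aligned}$$
for every $x\in\operatorname{dom}f*g$. But $f_n*g_n$ is integrable, therefore measurable, for every $n$, so that $f*g$ must be
measurable.
As for the domain of $f*g$,
$$\begin{aligned}
x\in\operatorname{dom}(f*g)&\iff\int f(x-y)g(y)\,dy\text{ is defined in }\mathbb C\\
&\iff\int|f(x-y)||g(y)|\,dy\text{ is defined in }\mathbb R\\
&\iff\int|f_n(x-y)||g_n(y)|\,dy\text{ is defined in }\mathbb R\text{ for every }n\\
&\qquad\text{and }\sup_{n\in\mathbb N}\int|f_n(x-y)||g_n(y)|\,dy<\infty.
\end{aligned}$$
Because every $|f_n|*|g_n|$ is integrable, therefore measurable and with measurable domain,
$$\operatorname{dom}(f*g)=\{x:x\in\bigcap_{n\in\mathbb N}\operatorname{dom}(|f_n|*|g_n|),\ \sup_{n\in\mathbb N}(|f_n|*|g_n|)(x)<\infty\}$$
is measurable.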

255J Theorem Let $f$, $g$, $h$ be complex-valued measurable functions defined almost everywhere in $\mathbb R$,
such that $f*g$ and $g*h$ are also defined a.e. Suppose that $x\in\mathbb R$ is such that one of $(|f|*(|g|*|h|))(x)$,
$((|f|*|g|)*|h|)(x)$ is defined in $\mathbb R$. Then $f*(g*h)$ and $(f*g)*h$ are defined and equal at $x$.
proof Set $k(y)=f(x-y)$ when this is defined, so that $k$ is measurable and defined almost everywhere.
Now
$$(|f|*(|g|*|h|))(x)=\int|f(x-y)|(|g|*|h|)(y)\,dy=\iint|k(y)||g(y-z)||h(z)|\,dz\,dy,$$
$$\begin{aligned}
((|f|*|g|)*|h|)(x)&=\int(|f|*|g|)(x-y)|h(y)|\,dy=\iint|f(x-y-z)||g(z)||h(y)|\,dz\,dy\\
&=\iint|k(y+z)||g(z)||h(y)|\,dz\,dy=\iint|k(y+z)||g(y)||h(z)|\,dy\,dz.
\end{aligned}$$
So if either of these is finite, the conditions of 255Gb are satisfied, with $k$, $g$, $h$ in the place of $h$, $f$ and $g$,
and
$$\int k(y)(g*h)(y)\,dy=\int k(y+z)g(y)h(z)\,d(y,z),$$
that is,
$$\begin{aligned}
(f*(g*h))(x)&=\int f(x-y)(g*h)(y)\,dy=\int k(y)(g*h)(y)\,dy\\
&=\int k(y+z)g(y)h(z)\,d(y,z)=\iint k(y+z)g(y)h(z)\,dy\,dz\\
&=\iint f(x-y-z)g(y)h(z)\,dy\,dz=\int(f*g)(x-z)h(z)\,dz\\
&=((f*g)*h)(x).
\end{aligned}$$
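
The counting-measure analogue on $\mathbb Z$ (the setting of exercise 255Xe) lets the identities of 255Fb, 255H and 255J be checked exactly, because for finitely supported sequences every sum is finite and all the integrability hypotheses hold automatically. The following is a minimal sketch under that assumption; the dictionary representation and the names `convolve`, `l1_norm` and `same` are illustrative choices, not notation from the text.

```python
from itertools import product


def convolve(a, b):
    """(a * b)(n) = sum_i a(n - i) b(i) for finitely supported sequences on Z.

    Sequences are stored as {index: value} dictionaries; absent indices are 0.
    """
    out = {}
    for (i, a_i), (j, b_j) in product(a.items(), b.items()):
        out[i + j] = out.get(i + j, 0) + a_i * b_j
    return out


def l1_norm(a):
    """Sum of |a(n)| over the (finite) support."""
    return sum(abs(v) for v in a.values())


def same(a, b):
    """Equality as functions on Z, ignoring explicitly stored zeros."""
    return all(a.get(k, 0) == b.get(k, 0) for k in set(a) | set(b))


# Sample sequences chosen for illustration only.
a = {-2: 1, 0: -3, 5: 2}
b = {-1: 4, 1: 1}
c = {0: 2, 3: -1, 4: 5}

print(same(convolve(a, b), convolve(b, a)))
print(same(convolve(a, convolve(b, c)), convolve(convolve(a, b), c)))
print(l1_norm(convolve(a, b)) <= l1_norm(a) * l1_norm(b))
```

On $\mathbb R$ the corresponding statements need the hypotheses of 255G and 255J; 255Ya shows that associativity can fail without them.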

255K I do not think we shall need an exhaustive discussion of the question of just when $(f*g)(x)$ is
defined; this seems to be complicated. However there is a fundamental case in which we can be sure that
$(f*g)(x)$ is defined everywhere.

Proposition Suppose that $f$, $g$ are measurable complex-valued functions defined almost everywhere in $\mathbb R$,
and that $f\in\mathcal L^p_{\mathbb C}$, $g\in\mathcal L^q_{\mathbb C}$ where $p,q\in[1,\infty]$ and $\frac1p+\frac1q=1$ (writing $\frac1\infty=0$ as usual). Then $f*g$ is defined
everywhere in $\mathbb R$, is uniformly continuous, and $\sup_{x\in\mathbb R}|(f*g)(x)|\le\|f\|_p\|g\|_q$.
proof (a) (For an introduction to $\mathcal L^p$ spaces, see §244.) For any $x\in\mathbb R$, the function $f_x$, defined by setting
$f_x(y)=f(x-y)$ whenever $x-y\in\operatorname{dom}f$, must also belong to $\mathcal L^p$, because $f_x=f\phi$ for an automorphism $\phi$
of the measure space. Consequently $(f*g)(x)=\int f_x\times g$ is defined, and of modulus at most $\|f\|_p\|g\|_q$,
by 243Fa/243K and 244Eb/244Ob.
(b) To see that $f*g$ is uniformly continuous, argue as follows. Suppose first that $p<\infty$. Let $\epsilon>0$. Let
$\eta>0$ be such that $(2+2^{1/p})\|g\|_q\eta\le\epsilon$. Then there is a bounded continuous function $h:\mathbb R\to\mathbb C$ such that
$\{x:h(x)\ne 0\}$ is bounded and $\|f-h\|_p\le\eta$ (244H, 244Ob); let $M\ge 1$ be such that $h(x)=0$ whenever
$|x|\ge M-1$. Next, $h$ is uniformly continuous, so there is a $\delta\in\,]0,1]$ such that $|h(x)-h(x')|\le M^{-1/p}\eta$
whenever $|x-x'|\le\delta$.
Suppose that $|x-x'|\le\delta$. Defining $h_x(y)=h(x-y)$, as before, we have
$$\int|h_x-h_{x'}|^p=\int|h(x-y)-h(x'-y)|^p\,dy=\int|h(t)-h(x'-x+t)|^p\,dt$$
(substituting $t=x-y$)
$$=\int_{-M}^{M}|h(t)-h(x'-x+t)|^p\,dt$$
(because $h(t)=h(x'-x+t)=0$ if $|t|\ge M$)
$$\le 2M(M^{-1/p}\eta)^p$$
(because $|h(t)-h(x'-x+t)|\le M^{-1/p}\eta$ for every $t$)
$$=2\eta^p.$$
So $\|h_x-h_{x'}\|_p\le 2^{1/p}\eta$. On the other hand,
$$\int|h_x-f_x|^p=\int|h(x-y)-f(x-y)|^p\,dy=\int|h(y)-f(y)|^p\,dy,$$
so $\|h_x-f_x\|_p=\|h-f\|_p\le\eta$, and similarly $\|h_{x'}-f_{x'}\|_p\le\eta$. So
$$\|f_x-f_{x'}\|_p\le\|f_x-h_x\|_p+\|h_x-h_{x'}\|_p+\|h_{x'}-f_{x'}\|_p\le(2+2^{1/p})\eta.$$
This means that
$$\begin{aligned}
|(f*g)(x)-(f*g)(x')|&=\Bigl|\int f_x\times g-\int f_{x'}\times g\Bigr|=\Bigl|\int(f_x-f_{x'})\times g\Bigr|\\
&\le\|f_x-f_{x'}\|_p\|g\|_q\le(2+2^{1/p})\|g\|_q\eta\le\epsilon.
\end{aligned}$$
As $\epsilon$ is arbitrary, $f*g$ is uniformly continuous.
The argument here supposes that $p$ is finite. But if $p=\infty$ then $q=1$ is finite, so we can apply the
method with $g$ in place of $f$ to show that $g*f$ is uniformly continuous, and $f*g=g*f$ by 255Fb.

255L The $r$-dimensional case I have written 255A-255K out as theorems about Lebesgue measure
on $\mathbb R$. However they all apply equally well to Lebesgue measure on $\mathbb R^r$ for any $r\ge 1$, and the modifications
required are so small that I think I need do no more than ask you to read through the arguments again,
turning every $\mathbb R$ into an $\mathbb R^r$, and every $\mathbb R^2$ into an $(\mathbb R^r)^2$. In 255A and elsewhere, the measure $\mu_2$ should
be read either as Lebesgue measure on $\mathbb R^{2r}$ or as the product measure on $(\mathbb R^r)^2$; by 251M the two may be
identified. There is a trivial modification required in part (b) of the proof; if $I_n=[a_n,b_n[$ then
$$\mu I_n=\mu(-I_n)=\prod_{i\le r}\max(0,\beta_{ni}-\alpha_{ni}),$$
writing $a_n=(\alpha_{n1},\ldots,\alpha_{nr})$ and $b_n=(\beta_{n1},\ldots,\beta_{nr})$. In the proof of 255I, the functions $f_n$ should be defined by saying that
$f_n(x)=f(x)$ if $|f(x)|\le n$ and $\|x\|\le n$, $0$ otherwise.
In quoting these results, therefore, I shall be uninhibited in referring to the paragraphs 255A-255K as if
they were actually written out for general $r\ge 1$.

255M The case of $]-\pi,\pi]$ The same ideas also apply to the circle group $S^1$ and to the interval $]-\pi,\pi]$,
but here perhaps rather more explanation is in order.

(a) The first thing to establish is the appropriate group operation. If we think of $S^1$ as the set $\{z:z\in\mathbb C,\ |z|=1\}$,
then the group operation is complex multiplication, and in the formulae above $x+y$ must be
rendered as $xy$, while $x-y$ must be rendered as $xy^{-1}$. On the interval $]-\pi,\pi]$, the group operation is $+_{2\pi}$,
where for $x,y\in\,]-\pi,\pi]$ I write $x+_{2\pi}y$ for whichever of $x+y$, $x+y+2\pi$, $x+y-2\pi$ belongs to $]-\pi,\pi]$.
To see that this is indeed a group operation, one method is to note that it corresponds to multiplication
on $S^1$ if we use the canonical bijection $x\mapsto e^{ix}:\,]-\pi,\pi]\to S^1$; another, to note that it corresponds to the
operation on the quotient group $\mathbb R/2\pi\mathbb Z$. Thus in this interpretation of the ideas of 255A-255K, we shall
wish to replace $x+y$ by $x+_{2\pi}y$, $-x$ by $-_{2\pi}x$, and $x-y$ by $x-_{2\pi}y$, where
$$-_{2\pi}x=-x\text{ if }x\in\,]-\pi,\pi[,\quad -_{2\pi}\pi=\pi,$$
and $x-_{2\pi}y$ is whichever of $x-y$, $x-y+2\pi$, $x-y-2\pi$ belongs to $]-\pi,\pi]$.
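
The three operations just described are easy to write down explicitly. The following is a minimal sketch in floating-point arithmetic, so that boundary cases are exact only for inputs such as `math.pi` itself; the names `add_mod`, `neg_mod`, `sub_mod` and `to_circle` are hypothetical, chosen for this illustration. The function `to_circle` is the canonical bijection $x\mapsto e^{ix}$, so comparing `to_circle(add_mod(x, y))` with `to_circle(x) * to_circle(y)` is the first of the two checks suggested above.

```python
import cmath
import math

TWO_PI = 2 * math.pi


def add_mod(x, y):
    """x +_{2pi} y: whichever of x+y, x+y+2pi, x+y-2pi lies in ]-pi, pi]."""
    s = x + y
    if s > math.pi:
        return s - TWO_PI
    if s <= -math.pi:
        return s + TWO_PI
    return s


def neg_mod(x):
    """-_{2pi} x: equal to -x for x in ]-pi, pi[, and to pi when x = pi."""
    return math.pi if x == math.pi else -x


def sub_mod(x, y):
    """x -_{2pi} y: whichever of x-y, x-y+2pi, x-y-2pi lies in ]-pi, pi]."""
    d = x - y
    if d > math.pi:
        return d - TWO_PI
    if d <= -math.pi:
        return d + TWO_PI
    return d


def to_circle(x):
    """The canonical bijection x -> e^{ix} from ]-pi, pi] onto S^1."""
    return cmath.exp(1j * x)


# Sample points of ]-pi, pi], including the endpoint pi.
points = [-3.0, -1.0, 0.0, 1.0, 3.0, math.pi]
print(add_mod(math.pi, math.pi), neg_mod(math.pi), add_mod(3.0, 1.0))
```

Note that $\pi+_{2\pi}\pi=0$ and $-_{2\pi}\pi=\pi$, so that $\pi$ is its own inverse, corresponding to $(-1)^2=1$ in $S^1$.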

(b) As for the measure, the measure to use on $]-\pi,\pi]$ is just Lebesgue measure. Note that because $]-\pi,\pi]$
is Lebesgue measurable, there will be no confusion concerning the meaning of 'measurable subset', as the
relatively measurable subsets of $]-\pi,\pi]$ are actually measurable for Lebesgue measure on $\mathbb R$. Also we can
identify the product measure on $]-\pi,\pi]\times]-\pi,\pi]$ with the subspace measure induced by Lebesgue measure
on $\mathbb R^2$ (251Q).
On $S^1$, we need the corresponding measure induced by the canonical bijection between $S^1$ and $]-\pi,\pi]$,
which indeed is often called 'Lebesgue measure on $S^1$'. (We shall see in 265E that it is also equal to Hausdorff
one-dimensional measure on $S^1$.) We are very close to the level at which it would become reasonable to
move to $S^1$ and this measure (or its normalized version, in which it is reduced by a factor of $2\pi$, so as to
make $S^1$ a probability space). However, the elementary theory of Fourier series, which will be the principal
application of this work in the present volume, is generally done on intervals in $\mathbb R$, so that formulae based
on $]-\pi,\pi]$ are closer to the standard expressions. Henceforth, therefore, I will express all the work in terms
of $]-\pi,\pi]$.

(c) The result corresponding to 255A now takes a slightly different form, so I spell it out.

255N Theorem Let $\mu$ be Lebesgue measure on $]-\pi,\pi]$ and $\mu_2$ Lebesgue measure on $]-\pi,\pi]\times]-\pi,\pi]$;
write $\Sigma$, $\Sigma_2$ for their domains.
(a) For any $a\in\,]-\pi,\pi]$, the map $x\mapsto a+_{2\pi}x:\,]-\pi,\pi]\to\,]-\pi,\pi]$ is a measure space automorphism of
$(]-\pi,\pi],\Sigma,\mu)$.
(b) The map $x\mapsto -_{2\pi}x:\,]-\pi,\pi]\to\,]-\pi,\pi]$ is a measure space automorphism of $(]-\pi,\pi],\Sigma,\mu)$.
(c) For any $a\in\,]-\pi,\pi]$, the map $x\mapsto a-_{2\pi}x:\,]-\pi,\pi]\to\,]-\pi,\pi]$ is a measure space automorphism of
$(]-\pi,\pi],\Sigma,\mu)$.
(d) The map $(x,y)\mapsto(x+_{2\pi}y,y):\,]-\pi,\pi]^2\to\,]-\pi,\pi]^2$ is a measure space automorphism of $(]-\pi,\pi]^2,\Sigma_2,\mu_2)$.
(e) The map $(x,y)\mapsto(x-_{2\pi}y,y):\,]-\pi,\pi]^2\to\,]-\pi,\pi]^2$ is a measure space automorphism of $(]-\pi,\pi]^2,\Sigma_2,\mu_2)$.
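proof (a) Set $\phi(x)=a+_{2\pi}x$. Then for any $E\subseteq\,]-\pi,\pi]$,
$$\phi[E]=((E+a)\cap\,]-\pi,\pi])\cup(((E+a)\cap\,]\pi,3\pi])-2\pi)\cup(((E+a)\cap\,]-3\pi,-\pi])+2\pi),$$
and these three sets are disjoint, so that, for $E\in\Sigma$,
$$\begin{aligned}
\mu\phi[E]&=\mu((E+a)\cap\,]-\pi,\pi])+\mu(((E+a)\cap\,]\pi,3\pi])-2\pi)\\
&\qquad+\mu(((E+a)\cap\,]-3\pi,-\pi])+2\pi)\\
&=\mu_L((E+a)\cap\,]-\pi,\pi])+\mu_L(((E+a)\cap\,]\pi,3\pi])-2\pi)\\
&\qquad+\mu_L(((E+a)\cap\,]-3\pi,-\pi])+2\pi)
\end{aligned}$$
(writing $\mu_L$ for Lebesgue measure on $\mathbb R$)
$$\begin{aligned}
&=\mu_L((E+a)\cap\,]-\pi,\pi])+\mu_L((E+a)\cap\,]\pi,3\pi])+\mu_L((E+a)\cap\,]-3\pi,-\pi])\\
&=\mu_L(E+a)=\mu_LE=\mu E.
\end{aligned}$$
Similarly, $\mu\phi^{-1}[E]$ is defined and equal to $\mu E$ for every $E\in\Sigma$, so that $\phi$ is an automorphism of $(]-\pi,\pi],\Sigma,\mu)$.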
(b) Of course this is quicker. Setting $\phi(x)=-_{2\pi}x$ for $x\in\,]-\pi,\pi]$, we have
$$\begin{aligned}
\mu(\phi[E])&=\mu(\phi[E]\cap\,]-\pi,\pi[)=\mu(-(E\cap\,]-\pi,\pi[))\\
&=\mu_L(-(E\cap\,]-\pi,\pi[))=\mu_L(E\cap\,]-\pi,\pi[)\\
&=\mu(E\cap\,]-\pi,\pi[)=\mu E
\end{aligned}$$
for every $E\in\Sigma$.
(c) This is just a matter of putting (a) and (b) together, as in 255A.
(d) We can argue as in (a), but with a little more elaboration. Set $\phi(x,y)=(x+_{2\pi}y,y)$ for
$x,y\in\,]-\pi,\pi]$ and $\psi(x,y)=(x+y,y)$ for $x,y\in\mathbb R$, and write $c=(2\pi,0)\in\mathbb R^2$, $H=\,]-\pi,\pi]^2$, $H'=H+c$,
$H''=H-c$. Then for any $E\in\Sigma_2$,
$$\phi[E]=(\psi[E]\cap H)\cup((\psi[E]\cap H')-c)\cup((\psi[E]\cap H'')+c),$$
so
$$\begin{aligned}
\mu_2\phi[E]&=\mu_2(\psi[E]\cap H)+\mu_2((\psi[E]\cap H')-c)+\mu_2((\psi[E]\cap H'')+c)\\
&=\mu_L(\psi[E]\cap H)+\mu_L((\psi[E]\cap H')-c)+\mu_L((\psi[E]\cap H'')+c)
\end{aligned}$$
(this time writing $\mu_L$ for Lebesgue measure on $\mathbb R^2$)
$$\begin{aligned}
&=\mu_L(\psi[E]\cap H)+\mu_L(\psi[E]\cap H')+\mu_L(\psi[E]\cap H'')\\
&=\mu_L\psi[E]=\mu_LE=\mu_2E.
\end{aligned}$$
In the same way, $\mu_2(\phi^{-1}[E])=\mu_2E$ for every $E\in\Sigma_2$, so $\phi$ is an automorphism of $(]-\pi,\pi]^2,\Sigma_2,\mu_2)$, as
required.
(e) Finally, (e) is just a restatement of (d), as before.

255O Convolutions on $]-\pi,\pi]$ With the fundamental result established, the same arguments as in
255B-255K now yield the following. Write $\mu$ for Lebesgue measure on $]-\pi,\pi]$.

(a) Let $f$ and $g$ be measurable complex-valued functions defined almost everywhere in $]-\pi,\pi]$. Write $f*g$
for the function defined by the formula
$$(f*g)(x)=\int_{-\pi}^{\pi}f(x-_{2\pi}y)g(y)\,dy$$
whenever $x\in\,]-\pi,\pi]$ and the integral exists as a complex number. Then $f*g$ is the convolution of the
functions $f$ and $g$.

(b) If $f$, $g$ are measurable complex-valued functions defined almost everywhere in $]-\pi,\pi]$, then $f*g=g*f$,
in the strict sense that they have the same domain and the same value at each point of that common domain.

(c) We may regard convolution as a binary operator on $L^0_{\mathbb C}=L^0_{\mathbb C}(\mu)$; if $u,v\in L^0_{\mathbb C}(\mu)$, we can define $u*v$ as being
equal to $f*g$ whenever $f^{\bullet}=u$ and $g^{\bullet}=v$.

(d) Let $f$, $g$ and $h$ be measurable complex-valued functions defined almost everywhere in $]-\pi,\pi]$. Then
(i)
$$\int_{-\pi}^{\pi}h(x)(f*g)(x)\,dx=\int_{]-\pi,\pi]^2}h(x+_{2\pi}y)f(x)g(y)\,d(x,y)$$

whenever the right-hand side exists and is finite, provided that in the expression $h(x)(f*g)(x)$ we interpret
the product as $0$ if $h(x)=0$ and $(f*g)(x)$ is undefined.
(ii) If, on the same interpretation of $|h(x)|(|f|*|g|)(x)$, the integral $\int_{-\pi}^{\pi}|h(x)|(|f|*|g|)(x)\,dx$ is finite,
then $\int_{]-\pi,\pi]^2}h(x+_{2\pi}y)f(x)g(y)\,d(x,y)$ exists in $\mathbb C$, so again we shall have
$$\int_{-\pi}^{\pi}h(x)(f*g)(x)\,dx=\int_{]-\pi,\pi]^2}h(x+_{2\pi}y)f(x)g(y)\,d(x,y).$$

(e) If $f$, $g$ are complex-valued functions which are integrable over $]-\pi,\pi]$, then $f*g$ is integrable, with
$$\int_{-\pi}^{\pi}f*g=\int_{-\pi}^{\pi}f\int_{-\pi}^{\pi}g,\quad\int_{-\pi}^{\pi}|f*g|\le\int_{-\pi}^{\pi}|f|\int_{-\pi}^{\pi}|g|.$$

(f) Let $f$, $g$, $h$ be complex-valued measurable functions defined almost everywhere in $]-\pi,\pi]$. Suppose
that $x\in\,]-\pi,\pi]$ is such that one of $(|f|*(|g|*|h|))(x)$, $((|f|*|g|)*|h|)(x)$ is defined in $\mathbb R$. Then $f*(g*h)$
and $(f*g)*h$ are defined and equal at $x$.
(g) Suppose that $f\in\mathcal L^p_{\mathbb C}(\mu)$, $g\in\mathcal L^q_{\mathbb C}(\mu)$ where $p,q\in[1,\infty]$ and $\frac1p+\frac1q=1$. Then $f*g$ is defined
everywhere in $]-\pi,\pi]$, and $\sup_{x\in]-\pi,\pi]}|(f*g)(x)|\le\|f\|_p\|g\|_q$.

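255X Basic exercises > (a) Let $f$, $g$ be complex-valued functions defined almost everywhere in $\mathbb R$.
Show that for any $x\in\mathbb R$, $(f*g)(x)=\int f(x+y)g(-y)\,dy$ if either is defined.
> (b) Let $f$ and $g$ be complex-valued functions defined almost everywhere in $\mathbb R$. (i) Show that if $f$ and $g$
are even functions, so is $f*g$. (ii) Show that if $f$ is even and $g$ is odd then $f*g$ is odd. (iii) Show that if $f$
and $g$ are odd then $f*g$ is even.
> (c) Let $\mu$ be Lebesgue measure on $\mathbb R$. Show that we have a function $*:L^1_{\mathbb C}(\mu)\times L^1_{\mathbb C}(\mu)\to L^1_{\mathbb C}(\mu)$ given
by setting $f^{\bullet}*g^{\bullet}=(f*g)^{\bullet}$ for all $f,g\in\mathcal L^1_{\mathbb C}(\mu)$. Show that $L^1_{\mathbb C}$ is a commutative Banach algebra under $*$
(definition: 2A4J).
(d) (i) Show that if $h$ is an integrable function on $\mathbb R^2$, then $(Th)(x)=\int h(x-y,y)\,dy$ exists for almost
every $x\in\mathbb R$, and that $\int(Th)(x)\,dx=\int h(x,y)\,d(x,y)$. (ii) Write $\mu_2$ for Lebesgue measure on $\mathbb R^2$, $\mu$ for
Lebesgue measure on $\mathbb R$. Show that there is a linear operator $\tilde T:L^1(\mu_2)\to L^1(\mu)$ defined by setting
$\tilde T(h^{\bullet})=(Th)^{\bullet}$ for every integrable function $h$ on $\mathbb R^2$. (iii) Show that in the language of 253E and (c) above,
$\tilde T(u\otimes v)=u*v$ for all $u,v\in L^1(\mu)$.
> (e) For $\boldsymbol a,\boldsymbol b\in\mathbb C^{\mathbb Z}$ set $(\boldsymbol a*\boldsymbol b)(n)=\sum_{i\in\mathbb Z}\boldsymbol a(n-i)\boldsymbol b(i)$ whenever $\sum_{i\in\mathbb Z}|\boldsymbol a(n-i)\boldsymbol b(i)|<\infty$. Show that
(i) $\boldsymbol a*\boldsymbol b=\boldsymbol b*\boldsymbol a$;
(ii) $\sum_{i\in\mathbb Z}\boldsymbol c(i)(\boldsymbol a*\boldsymbol b)(i)=\sum_{i,j\in\mathbb Z}\boldsymbol c(i+j)\boldsymbol a(i)\boldsymbol b(j)$ if $\sum_{i,j\in\mathbb Z}|\boldsymbol c(i+j)\boldsymbol a(i)\boldsymbol b(j)|<\infty$;
(iii) if $\boldsymbol a,\boldsymbol b\in\ell^1(\mathbb Z)$ then $\boldsymbol a*\boldsymbol b\in\ell^1(\mathbb Z)$ and $\|\boldsymbol a*\boldsymbol b\|_1\le\|\boldsymbol a\|_1\|\boldsymbol b\|_1$;
(iv) if $\boldsymbol a,\boldsymbol b\in\ell^2(\mathbb Z)$ then $\boldsymbol a*\boldsymbol b\in\ell^\infty(\mathbb Z)$ and $\|\boldsymbol a*\boldsymbol b\|_\infty\le\|\boldsymbol a\|_2\|\boldsymbol b\|_2$;
(v) if $\boldsymbol a,\boldsymbol b,\boldsymbol c\in\mathbb C^{\mathbb Z}$ and $(|\boldsymbol a|*(|\boldsymbol b|*|\boldsymbol c|))(n)$ is well-defined, then $(\boldsymbol a*(\boldsymbol b*\boldsymbol c))(n)=((\boldsymbol a*\boldsymbol b)*\boldsymbol c)(n)$.
(f) Suppose that $f$, $g$ are real-valued measurable functions defined almost everywhere in $\mathbb R^r$ and such
that $f>0$ a.e., $g\ge 0$ a.e. and $\{x:g(x)>0\}$ is not negligible. Show that $f*g>0$ everywhere in $\operatorname{dom}(f*g)$.
> (g) Suppose that $f:\mathbb R\to\mathbb C$ is a bounded differentiable function and that $f'$ is bounded. Show that
for any integrable complex-valued function $g$ on $\mathbb R$, $f*g$ is differentiable and $(f*g)'=f'*g$ everywhere.
(Hint: 123D.)
(h) A complex-valued function $g$ defined almost everywhere in $\mathbb R$ is locally integrable if $\int_a^bg$ is defined
in $\mathbb C$ whenever $a<b$ in $\mathbb R$. Suppose that $g$ is such a function and that $f:\mathbb R\to\mathbb C$ is a differentiable function,
with continuous derivative, such that $\{x:f(x)\ne 0\}$ is bounded. Show that $(f*g)'=f'*g$ everywhere.
> (i) Set $\phi_\delta(x)=\exp\bigl(-\frac{1}{\delta^2-x^2}\bigr)$ if $|x|<\delta$, $0$ if $|x|\ge\delta$, as in 242Xi. Set $\alpha_\delta=\int\phi_\delta$, $\psi_\delta=\alpha_\delta^{-1}\phi_\delta$. Let
$f$ be a locally integrable complex-valued function on $\mathbb R$. (i) Show that $f*\psi_\delta$ is a smooth function defined
everywhere on $\mathbb R$ for every $\delta>0$. (ii) Show that $\lim_{\delta\downarrow 0}(f*\psi_\delta)(x)=f(x)$ for almost every $x\in\mathbb R$. (Hint:
223Yg.) (iii) Show that if $f$ is integrable then $\lim_{\delta\downarrow 0}\int|f-f*\psi_\delta|=0$. (Hint: use (ii) and 245H(a-ii) or look
first at the case $f=\chi[a,b]$ and use 242O, noting that $\int|f*\psi_\delta|\le\int|f|$.) (iv) Show that if $f$ is uniformly
continuous and defined everywhere on $\mathbb R$ then $\lim_{\delta\downarrow 0}\sup_{x\in\mathbb R}|f(x)-(f*\psi_\delta)(x)|=0$.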

> (j) For $\alpha>0$, set $g_\alpha(t)=\frac{1}{\Gamma(\alpha)}t^{\alpha-1}$ for $t>0$, $0$ for $t\le 0$. Show that $g_\alpha*g_\beta=g_{\alpha+\beta}$ for all $\alpha,\beta>0$.
(Hint: 252Yk.)

255Y Further exercises (a) Set $f(x)=1$ for all $x\in\mathbb R$, $g(x)=\frac{x}{|x|}$ for $0<|x|\le 1$ and $0$ otherwise,
$h(x)=\tanh x$ for all $x\in\mathbb R$. Show that $f*(g*h)$ and $(f*g)*h$ are both defined (and constant) everywhere,
and are different.

(b) Discuss what can happen if, in the context of 255J, we know that $(|f|*(|g|*|h|))(x)$ is defined, but
have no information on the domain of $f*g$.

(c) Suppose that $p\in[1,\infty[$ and that $f\in\mathcal L^p_{\mathbb C}(\mu)$, where $\mu$ is Lebesgue measure on $\mathbb R^r$. For $a\in\mathbb R^r$ set
$(S_af)(x)=f(a+x)$ whenever $a+x\in\operatorname{dom}f$. Show that $S_af\in\mathcal L^p_{\mathbb C}(\mu)$, and that for every $\epsilon>0$ there is a
$\delta>0$ such that $\|S_af-f\|_p\le\epsilon$ whenever $|a|\le\delta$.

(d) Suppose that $p,q\in\,]1,\infty[$ and $\frac1p+\frac1q=1$. Let $f\in\mathcal L^p_{\mathbb C}(\mu)$, $g\in\mathcal L^q_{\mathbb C}(\mu)$, where $\mu$ is Lebesgue measure
on $\mathbb R^r$. Show that $\lim_{\|x\|\to\infty}(f*g)(x)=0$. (Hint: use 244Hb.)

(e) Repeat 255Yc and 255K, this time taking $\mu$ to be Lebesgue measure on $]-\pi,\pi]$, and setting $(S_af)(x)=f(a+_{2\pi}x)$
for $a\in\,]-\pi,\pi]$; show that in the new version of 255K, $(f*g)(\pi)=\lim_{x\downarrow-\pi}(f*g)(x)$.

(f) Let $\mu$ be Lebesgue measure on $\mathbb R$. For $a\in\mathbb R$, $f\in\mathcal L^0=\mathcal L^0(\mu)$ set $(S_af)(x)=f(a+x)$ whenever
$a+x\in\operatorname{dom}f$.
(i) Show that $S_af\in\mathcal L^0$ for every $f\in\mathcal L^0$.
(ii) Show that we have a map $\tilde S_a:L^0\to L^0$ defined by setting $\tilde S_a(f^{\bullet})=(S_af)^{\bullet}$ for every $f\in\mathcal L^0$.
(iii) Show that $\tilde S_a$ is a Riesz space isomorphism and is a homeomorphism for the topology of convergence
in measure; moreover, that $\tilde S_a(u\times v)=\tilde S_au\times\tilde S_av$ for all $u,v\in L^0$.
(iv) Show that $\tilde S_{a+b}=\tilde S_a\tilde S_b$ for all $a,b\in\mathbb R$.
(v) Show that $\lim_{a\to 0}\tilde S_au=u$ for the topology of convergence in measure, for every $u\in L^0$.
(vi) Show that if $1\le p\le\infty$ then $\tilde S_a\restriction L^p$ is an isometric isomorphism of the Banach lattice $L^p$.
(vii) Show that if $p\in[1,\infty[$ then $\lim_{a\to 0}\|\tilde S_au-u\|_p=0$ for every $u\in L^p$.
(viii) Show that if $A\subseteq L^1$ is uniformly integrable and $M\ge 0$, then $\{\tilde S_au:u\in A,\ |a|\le M\}$ is uniformly
integrable.
(ix) Show that if $u,v\in L^0$ are such that $u*v$ is defined in $L^0$, then $\tilde S_a(u*v)=(\tilde S_au)*v=u*(\tilde S_av)$
for every $a\in\mathbb R$.

(g) Prove 255Nd from 255Na by the method used to prove 255Ad from 255Aa, rather than by quoting
255Ad.

(h) Repeat the results of this chapter for the group $(S^1)^r$, where $r\ge 2$, given its product measure.

(i) Let $f$ be a complex-valued function which is integrable over $\mathbb R$. (i) Let $x$ be any point of the Lebesgue
set of $f$. Show that for any $\epsilon>0$ there is a $\delta>0$ such that $|f(x)-(f*g)(x)|\le\epsilon$ whenever $g:\mathbb R\to[0,\infty[$ is
a function which is non-decreasing on $]-\infty,0]$, non-increasing on $[0,\infty[$, and has $\int g=1$ and $\int_{-\delta}^{\delta}g\ge 1-\delta$.
(ii) Show that for any $\epsilon>0$ there is a $\delta>0$ such that $\|f-f*g\|_1\le\epsilon$ whenever $g:\mathbb R\to[0,\infty[$ is a function
which is non-decreasing on $]-\infty,0]$, non-increasing on $[0,\infty[$, and has $\int g=1$ and $\int_{-\delta}^{\delta}g\ge 1-\delta$.

(j) Let f be a complex-valued function which is integrable over R. Show that, for almost every x R,
a R f (y) 1 R
lima
dy, lima x f (y)ea(yx) dy,
(xy) +a
2 2 a

1 R 2
/2 2
lim0
f (y)e(yx) dy
2
all exist and are equal to f (x). (Hint: 263G.)

(k) Let $\mu$ be Lebesgue measure on $\mathbb R$, and $\phi:\mathbb R\to\mathbb R$ a convex function such that $\phi(0)=0$; let
$\bar\phi:L^0\to L^0=L^0(\mu)$ be the associated operator (see 241I). Show that if $u\in L^1=L^1(\mu)$, $v\in L^0$ are such
that $u,v\ge 0$, $\int u=1$ and $u*\bar\phi(v)$ is defined in $L^0$, then $\bar\phi(u*v)\le u*\bar\phi(v)$. (Hint: 233I.)

(l) Let $\mu$ be Lebesgue measure on $\mathbb R$, and $p\in[1,\infty]$. Let $f\in\mathcal L^1_{\mathbb C}(\mu)$, $g\in\mathcal L^p_{\mathbb C}(\mu)$. Show that $f*g\in\mathcal L^p_{\mathbb C}(\mu)$
and that $\|f*g\|_p\le\|f\|_1\|g\|_p$. (Hint: argue from 255Yk, as in 244M.)

(m) Suppose that $p,q,r\in\,]1,\infty[$ and that $\frac1p+\frac1q=1+\frac1r$. Let $\mu$ be Lebesgue measure on $\mathbb R$. (i) Show
that
$$\int f\times g\le\|f\|_p^{1-p/r}\|g\|_q^{1-q/r}\Bigl(\int f^p\times g^q\Bigr)^{1/r}$$
whenever $f,g\ge 0$ and $f\in\mathcal L^p(\mu)$, $g\in\mathcal L^q(\mu)$. (Hint: set $p'=p/(p-1)$, etc.; $f_1=f^{p/q'}$, $g_1=g^{q/p'}$,
$h=(f^p\times g^q)^{1/r}$. Use 244Xd to see that $\|f_1\times g_1\|_{r'}\le\|f_1\|_{q'}\|g_1\|_{p'}$, so that $\int f_1\times g_1\times h\le\|f_1\|_{q'}\|g_1\|_{p'}\|h\|_r$.)
(ii) Show that $\|f*g\|_r\le\|f\|_p\|g\|_q$ for all $f\in\mathcal L^p(\mu)$, $g\in\mathcal L^q(\mu)$. (Hint: take $f,g\ge 0$. Use (i) to see that
$(f*g)(x)^r\le\|f\|_p^{r-p}\|g\|_q^{r-q}\int f(y)^pg(x-y)^q\,dy$, so that $\|f*g\|_r^r\le\|f\|_p^{r-p}\|g\|_q^{r-q}\int f(y)^p\|g\|_q^q\,dy$.) (This is
Young's inequality.)

(n) Let $G$ be a group and $\mu$ a $\sigma$-finite measure on $G$ such that ($\alpha$) for every $a\in G$, the map $x\mapsto ax$ is an
automorphism of $(G,\mu)$ ($\beta$) the map $(x,y)\mapsto(x,xy)$ is an automorphism of $(G^2,\mu_2)$, where $\mu_2$ is the c.l.d.
product measure on $G\times G$. For $f,g\in\mathcal L^0_{\mathbb C}(\mu)$ write $(f*g)(x)=\int f(y)g(y^{-1}x)\,dy$ whenever this is defined.
Show that
(i) if $f,g,h\in\mathcal L^0_{\mathbb C}(\mu)$ and $\int h(xy)f(x)g(y)\,d(x,y)$ is defined in $\mathbb C$, then $\int h(x)(f*g)(x)\,dx$ exists and is
equal to $\int h(xy)f(x)g(y)\,d(x,y)$, provided that in the expression $h(x)(f*g)(x)$ we interpret the product as
$0$ if $h(x)=0$ and $(f*g)(x)$ is undefined;
(ii) if $f,g\in\mathcal L^1_{\mathbb C}(\mu)$ then $f*g\in\mathcal L^1_{\mathbb C}(\mu)$ and $\int f*g=\int f\int g$, $\|f*g\|_1\le\|f\|_1\|g\|_1$;
(iii) if $f,g,h\in\mathcal L^1_{\mathbb C}(\mu)$ then $f*(g*h)=_{\text{a.e.}}(f*g)*h$.
(See Halmos 50, §59.)

(o) Repeat 255Yn for counting measure on any group $G$.

255 Notes and comments I have tried to set this section out in such a way that it will be clear that the
basis of all the work here is 255A, and the crucial application is 255G. I hope that if and when you come
to look at general topological groups (for instance, in Chapter 44), you will find it easy to trace through
the ideas in any abelian topological group for which you can prove a version of 255A. For non-abelian
groups, of course, rather more care is necessary, especially as in some important examples we no longer
have $\mu\{x^{-1}:x\in E\}=\mu E$ for every $E$; see 255Yn-255Yo for a little of what can be done without using
topological ideas.
The critical point in 255A is the move from the one-dimensional results in 255Aa-255Ac, which are just the
translation- and reflection-invariance of Lebesgue measure, to the two-dimensional results in 255Ad-255Ae.
And the living centre of the argument, as I present it, is the fact that the shear transformation $\phi$ is an
automorphism of the structure $(\mathbb R^2,\Sigma_2)$. The actual calculation of $\mu_2\phi[E]$, assuming that it is measurable,
is an easy application of Fubini's and Tonelli's theorems and the translation-invariance of $\mu$. It is for the
measurability step that we absolutely need the topological properties of Lebesgue measure. I should perhaps remind you
that the fact that $\phi$ is a homeomorphism is not sufficient; in 134I I described a homeomorphism of the unit
interval which does not preserve measurability, and it is easy to adapt this to produce a homeomorphism
$\psi:\mathbb R^2\to\mathbb R^2$ such that $\psi[E]$ is not always measurable for measurable $E$. The argument of 255A is dependent
on the special relationships between all three of the measure, topology and group structure of $\mathbb R$.
I have already indulged in a few remarks on what ought, or ought not, to be obvious (255C). But perhaps
I can add that such results as 255B and the later claim, in the proof of 255K, that a reflected version of
a function in $\mathcal L^p$ is also in $\mathcal L^p$, can only be trivial consequences of results like 255A if every step in the
construction of the integral is done in the abstract context of general measure spaces. Even though we are
here working exclusively with the Lebesgue integral, the argument will become untrustworthy if we have
at any stage in the definition of the integral even mentioned that we are thinking of Lebesgue measure. I
advance this as a solid reason for defining integration on abstract measure spaces from the beginning, as
I did in Volume 1. Indeed, I suggest that generally in pure mathematics there are good reasons for casting
arguments into the forms appropriate to the arguments themselves.
I am writing this book for readers who are interested in proofs, and as elsewhere I have written the proofs
of this section out in detail. But most of us find it useful to go through some material in advanced calculus
mode, by which I mean starting with a formula such as
$$(f * g)(x) = \int f(x - y)g(y)\,dy,$$
and then working out consequences by formal manipulations, for instance
$$\int h(x)(f * g)(x)\,dx = \iint h(x)f(x - y)g(y)\,dy\,dx = \iint h(x + y)f(x)g(y)\,dy\,dx,$$
without troubling about the precise applicability of the formulae to begin with. In some ways this formula-
driven approach can be more truthful to the structure of the subject than the careful analysis I habitually
present. The exact hypotheses necessary to make the theorems strictly true are surely secondary, in such
contexts as this section, to the pattern formed by the ensemble of the theorems, which can be adequately
and elegantly expressed in straightforward formulae. Of course I do still insist that we cannot properly
appreciate the structure, nor safely use it, without mastering the ideas of the proofs; and, as I have said
elsewhere, I believe that mastery of ideas necessarily includes mastery of the formal details, at least in the
sense of being able to reconstruct them fairly fluently on demand.
Throughout the main exposition of this section, I have worked with functions rather than equivalence
classes of functions. But all the results here have interpretations of great importance for the theory of the
function spaces of Chapter 24. It is an interesting point that if $u$, $v \in L^0$ then $u * v$ is most naturally
interpreted as a function, not as a member of $L^0$, even if it is defined almost everywhere. Thus 255H can be
regarded as saying that $u * v \in \mathcal{L}^1$ for $u$, $v \in L^1$. We cannot quite say that convolution is a bilinear operator
from $L^1 \times L^1$ to $\mathcal{L}^1$, because $\mathcal{L}^1$ is not strictly speaking a linear space. If we want a bilinear functional,
then we have to replace the function $u * v$ by its equivalence class, so that convolution becomes a bilinear
map from $L^1 \times L^1$ to $L^1$. But when we look at convolution as a function on $L^2 \times L^2$, for instance, then
our functions $u * v$ are defined everywhere (255K), and indeed are continuous functions vanishing at $\infty$
(255Yc-255Yd). So in this case it seems more appropriate to regard convolution as a bilinear operator from
$L^2 \times L^2$ to some space of continuous functions, and not as an operator from $L^2 \times L^2$ to $L^\infty$. For an example
of an interesting convolution which is not naturally representable in terms of an operator on $L^p$ spaces, see
255Xj.
Because convolution acts as a continuous bilinear operator from $L^1(\mu) \times L^1(\mu)$ to $L^1(\mu)$, where $\mu$ is
Lebesgue measure on $\mathbb{R}$, Theorem 253F tells us that it must correspond to a linear operator from $L^1(\mu_2)$ to
$L^1(\mu)$, where $\mu_2$ is Lebesgue measure on $\mathbb{R}^2$. This is the operator $T$ of 255Xd.
So far in these notes I have written as though we were concerned only with Lebesgue measure on $\mathbb{R}$.
However many applications of the ideas involve $\mathbb{R}^r$ or $]-\pi, \pi]$ or $S^1$. The move to $\mathbb{R}^r$ should be elementary.
The move to $S^1$ does require a re-formulation of the basic result 255A/255N. It should also be clear that
there will be no new difficulties in moving to $]-\pi, \pi]^r$ or $(S^1)^r$. Moreover, we can also go through the
whole theory for the groups $\mathbb{Z}$ and $\mathbb{Z}^r$, where the appropriate measure is now counting measure, so that $L^0_{\mathbb{C}}$
becomes identified with $\mathbb{C}^{\mathbb{Z}}$ or $\mathbb{C}^{\mathbb{Z}^r}$ (255Xe, 255Yo).
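
The formal identity $\int h\,(f * g) = \iint h(x + y)f(x)g(y)$ quoted earlier in these notes can be checked mechanically in the simplest of the settings just mentioned, the group $\mathbb{Z}$ with counting measure, where integrals are finite sums and no question of measurability arises. The following Python sketch is an illustration under that assumption only: the names `convolve`, `lhs`, `rhs` and the sample data are mine, and finitely supported functions on $\mathbb{Z}$ are represented as dictionaries.

```python
def convolve(f, g):
    """(f*g)(x) = sum_y f(x-y) g(y) for finitely supported f, g on Z."""
    out = {}
    for u, fu in f.items():
        for y, gy in g.items():
            # the pair (u, y) contributes to x = u + y, that is, u = x - y
            out[u + y] = out.get(u + y, 0) + fu * gy
    return out


def lhs(h, f, g):
    """sum_x h(x) (f*g)(x)."""
    return sum(h(x) * value for x, value in convolve(f, g).items())


def rhs(h, f, g):
    """sum_x sum_y h(x+y) f(x) g(y)."""
    return sum(h(x + y) * fx * gy for x, fx in f.items() for y, gy in g.items())


# sample data (arbitrary integer values, so the comparison is exact)
f = {-1: 2, 0: 1, 3: -4}
g = {0: 5, 2: 7}
h = lambda x: x * x + 1

print(lhs(h, f, g), rhs(h, f, g))
```

Integer values are used so that the two sides can be compared exactly; with floating-point data one would compare up to rounding error.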

256 Radon measures on $\mathbb{R}^r$


In the next section, and again in Chapters 27 and 28, we need to consider the principal class of measures
on Euclidean spaces. For a proper discussion of this class, and the interrelationships between the measures
and the topologies involved, we must wait until Volume 4. For the moment, therefore, I present definitions
adapted to the case in hand, warning you that the correct generalizations are not quite obvious. I give the
definition (256A) and a characterization (256C) of Radon measures on Euclidean spaces, and theorems on
the construction of Radon measures as indefinite integrals (256E, 256J), as image measures (256G) and as
product measures (256K). In passing I give a version of Lusin's theorem concerning measurable functions
on Radon measure spaces (256F).

256A Definitions Let $\nu$ be a measure on $\mathbb{R}^r$, where $r \ge 1$, and $\Sigma$ its domain.

(a) $\nu$ is a topological measure if every open set belongs to $\Sigma$. Note that in this case every Borel set,
and in particular every closed set, belongs to $\Sigma$.

(b) $\nu$ is locally finite if every bounded set has finite outer measure.

(c) If $\nu$ is a topological measure, it is inner regular with respect to the compact sets if
$$\nu E = \sup\{\nu K : K \subseteq E \text{ is compact}\}$$
for every $E \in \Sigma$. (Because $\nu$ is a topological measure, and compact sets are closed (2A2Ec), $\nu K$ is defined
for every compact set $K$.)

(d) $\nu$ is a Radon measure if it is a complete locally finite topological measure which is inner regular
with respect to the compact sets.

256B It will be convenient to be able to call on the following elementary facts.


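Lemma Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $\Sigma$ its domain.
(a) $\nu$ is $\sigma$-finite.
(b) For any $E \in \Sigma$ and any $\epsilon > 0$ there are a closed set $F \subseteq E$ and an open set $G \supseteq E$ such that
$\nu(G \setminus F) \le \epsilon$.
(c) For every $E \in \Sigma$ there is a set $H \subseteq E$, expressible as the union of a sequence of compact sets, such
that $\nu(E \setminus H) = 0$.
(d) Every continuous real-valued function on $\mathbb{R}^r$ is $\Sigma$-measurable.
(e) If $h : \mathbb{R}^r \to \mathbb{R}$ is continuous and has bounded support, then $h$ is $\nu$-integrable.
proof (a) For each $n \in \mathbb{N}$, $B(0, n) = \{x : \|x\| \le n\}$ is a closed bounded set, therefore Borel. So if $\nu$ is a
Radon measure on $\mathbb{R}^r$, $\langle B(0, n)\rangle_{n\in\mathbb{N}}$ is a cover of $\mathbb{R}^r$ by a sequence of sets of finite measure.
(b) Set $E_n = \{x : x \in E,\ n \le \|x\| < n + 1\}$ for each $n$. Then $\nu E_n < \infty$, so there is a compact set
$K_n \subseteq E_n$ such that $\nu K_n \ge \nu E_n - 2^{-n-2}\epsilon$. Set $F = \bigcup_{n\in\mathbb{N}} K_n$; then
$$\nu(E \setminus F) = \sum_{n=0}^{\infty} \nu(E_n \setminus K_n) \le \tfrac12\epsilon.$$
Also $F \subseteq E$ and $F$ is closed because
$$F \cap B(0, n) = \bigcup_{i\le n} K_i \cap B(0, n)$$
is closed for each $n$.
In the same way, there is a closed set $F' \subseteq \mathbb{R}^r \setminus E$ such that $\nu((\mathbb{R}^r \setminus E) \setminus F') \le \tfrac12\epsilon$. Setting $G = \mathbb{R}^r \setminus F'$,
we see that $G$ is open, that $G \supseteq E$ and that $\nu(G \setminus E) \le \tfrac12\epsilon$, so that $\nu(G \setminus F) \le \epsilon$, as required.
(c) By (b), we can choose for each $n \in \mathbb{N}$ a closed set $F_n \subseteq E$ such that $\nu(E \setminus F_n) \le 2^{-n}$. Set
$H = \bigcup_{n\in\mathbb{N}} F_n$; then $H \subseteq E$ and $\nu(E \setminus H) = 0$, and also $H = \bigcup_{m,n\in\mathbb{N}} B(0, m) \cap F_n$ is a countable union of
compact sets.
(d) If $h : \mathbb{R}^r \to \mathbb{R}$ is continuous, all the sets $\{x : h(x) > a\}$ are open, so belong to $\Sigma$.
(e) By (d), $h$ is measurable. Now we are supposing that there is some $n \in \mathbb{N}$ such that $h(x) = 0$
whenever $x \notin B(0, n)$. Since $B(0, n)$ is compact (2A2F), $h$ is bounded on $B(0, n)$ (2A2G), and we have
$|h| \le \gamma\chi B(0, n)$ for some $\gamma$; since $\nu B(0, n)$ is finite, $h$ is $\nu$-integrable.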

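256C Theorem A measure $\nu$ on $\mathbb{R}^r$ is a Radon measure iff it is the completion of a locally finite
measure defined on the $\sigma$-algebra $\mathcal{B}$ of Borel subsets of $\mathbb{R}^r$.
proof (a) Suppose first that $\nu$ is a Radon measure. Write $\Sigma$ for its domain.
(i) Set $\nu_0 = \nu\restriction\mathcal{B}$. Then $\nu_0$ is a measure with domain $\mathcal{B}$, and it is locally finite because $\nu_0 B(0, n) =
\nu B(0, n)$ is finite for every $n$. Let $\hat\nu_0$ be the completion of $\nu_0$ (212C).

(ii) If $\hat\nu_0$ measures $E$, there are $E_1$, $E_2 \in \mathcal{B}$ such that $E_1 \subseteq E \subseteq E_2$ and $\nu_0(E_2 \setminus E_1) = 0$. Now
$E \setminus E_1 \subseteq E_2 \setminus E_1$ must be $\nu$-negligible; as $\nu$ is complete, $E \in \Sigma$ and
$$\nu E = \nu E_1 = \nu_0 E_1 = \hat\nu_0 E.$$

(iii) If $E \in \Sigma$, then by 256Bc there is a Borel set $H \subseteq E$ such that $\nu(E \setminus H) = 0$. Equally, there is a
Borel set $H' \subseteq \mathbb{R}^r \setminus E$ such that $\nu((\mathbb{R}^r \setminus E) \setminus H') = 0$, so that we have $H \subseteq E \subseteq \mathbb{R}^r \setminus H'$ and
$$\nu_0((\mathbb{R}^r \setminus H') \setminus H) = \nu((\mathbb{R}^r \setminus H') \setminus H) = 0.$$
So $\hat\nu_0 E$ is defined and equal to $\nu_0 H = \nu E$.
This shows that $\nu = \hat\nu_0$ is the completion of the locally finite Borel measure $\nu\restriction\mathcal{B}$. And this is true for any
Radon measure $\nu$ on $\mathbb{R}^r$.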
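(b) For the rest of the proof, I suppose that $\nu_0$ is a locally finite measure on $\mathbb{R}^r$ with domain $\mathcal{B}$ and $\nu$ is its completion.
Write $\Sigma$ for the domain of $\nu$. We say that a subset of $\mathbb{R}^r$ is a $K_\sigma$ set if it is expressible as the union of a
sequence of compact sets. Note that every $K_\sigma$ set is a Borel set, so belongs to $\Sigma$. Set
$$\mathcal{A} = \{E : E \in \Sigma, \text{ there is a } K_\sigma \text{ set } H \subseteq E \text{ such that } \nu(E \setminus H) = 0\},$$

$$\Sigma' = \{E : E \in \mathcal{A},\ \mathbb{R}^r \setminus E \in \mathcal{A}\}.$$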

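(c)(i) Every open set is itself a $K_\sigma$ set, so belongs to $\mathcal{A}$. $\mathbf{P}$ Let $G \subseteq \mathbb{R}^r$ be open. If $G = \emptyset$ then $G$ is
compact and the result is trivial. Otherwise, let $\mathcal{I}$ be the set of closed intervals of the form $[q, q']$, where $q$,
$q' \in \mathbb{Q}^r$, which are included in $G$. Then all the members of $\mathcal{I}$ are closed and bounded, therefore compact.
If $x \in G$, there is a $\delta > 0$ such that $B(x, \delta) = \{y : \|y - x\| \le \delta\} \subseteq G$; now there is an $I \in \mathcal{I}$ such that
$x \in I \subseteq B(x, \delta)$. Thus $G = \bigcup\mathcal{I}$. But $\mathcal{I}$ is countable, so $G$ is $K_\sigma$. $\mathbf{Q}$
(ii) Every closed subset of $\mathbb{R}^r$ is $K_\sigma$, so belongs to $\mathcal{A}$. $\mathbf{P}$ If $F \subseteq \mathbb{R}^r$ is closed, then $F = \bigcup_{n\in\mathbb{N}} F \cap B(0, n)$;
but every $F \cap B(0, n)$ is closed and bounded, therefore compact. $\mathbf{Q}$
(iii) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is any sequence in $\mathcal{A}$, then $E = \bigcup_{n\in\mathbb{N}} E_n$ belongs to $\mathcal{A}$. $\mathbf{P}$ For each $n \in \mathbb{N}$ we have a
countable family $\mathcal{K}_n$ of compact subsets of $E_n$ such that $\nu(E_n \setminus \bigcup\mathcal{K}_n) = 0$; now $\mathcal{K} = \bigcup_{n\in\mathbb{N}} \mathcal{K}_n$ is a countable
family of compact subsets of $E$, and $E \setminus \bigcup\mathcal{K} \subseteq \bigcup_{n\in\mathbb{N}} (E_n \setminus \bigcup\mathcal{K}_n)$ is $\nu$-negligible. $\mathbf{Q}$
(iv) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is any sequence in $\mathcal{A}$, then $F = \bigcap_{n\in\mathbb{N}} E_n \in \mathcal{A}$. $\mathbf{P}$ For each $n \in \mathbb{N}$, let $\langle K_{ni}\rangle_{i\in\mathbb{N}}$ be a
sequence of compact subsets of $E_n$ such that $\nu(E_n \setminus \bigcup_{i\in\mathbb{N}} K_{ni}) = 0$. Set $K'_{nj} = \bigcup_{i\le j} K_{ni}$ for each $j$, so that
$$\nu(E_n \cap H) = \lim_{j\to\infty} \nu(K'_{nj} \cap H)$$
for every $H \in \Sigma$. Now, for each $m$, $n \in \mathbb{N}$, choose $j(m, n)$ such that
$$\nu(E_n \cap B(0, m) \cap K'_{n,j(m,n)}) \ge \nu(E_n \cap B(0, m)) - 2^{-(m+n)}.$$
Set $K_m = \bigcap_{n\in\mathbb{N}} K'_{n,j(m,n)}$; then $K_m$ is closed (being an intersection of closed sets) and bounded (being a
subset of $K'_{0,j(m,0)}$), therefore compact. Also $K_m \subseteq F$, because $K'_{n,j(m,n)} \subseteq E_n$ for each $n$, and
$$\nu(F \cap B(0, m) \setminus K_m) \le \sum_{n=0}^{\infty} \nu(E_n \cap B(0, m) \setminus K'_{n,j(m,n)}) \le \sum_{n=0}^{\infty} 2^{-(m+n)} = 2^{-m+1}.$$
Consequently $H = \bigcup_{m\in\mathbb{N}} K_m$ is a $K_\sigma$ subset of $F$ and
$$\nu(F \cap B(0, m) \setminus H) \le \inf_{k\ge m} \nu(F \cap B(0, k) \setminus K_k) = 0$$
for every $m$, so $\nu(F \setminus H) = 0$ and $F \in \mathcal{A}$. $\mathbf{Q}$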
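(d) $\Sigma'$ is a $\sigma$-algebra of subsets of $\mathbb{R}^r$. $\mathbf{P}$ (i) $\emptyset$ and its complement are open, so belong to $\mathcal{A}$ and therefore
to $\Sigma'$. (ii) If $E \in \Sigma'$ then both $\mathbb{R}^r \setminus E$ and $\mathbb{R}^r \setminus (\mathbb{R}^r \setminus E) = E$ belong to $\mathcal{A}$, so $\mathbb{R}^r \setminus E \in \Sigma'$. (iii) Let $\langle E_n\rangle_{n\in\mathbb{N}}$
be a sequence in $\Sigma'$ with union $E$. By (c-iii) and (c-iv),
$$E \in \mathcal{A}, \quad \mathbb{R}^r \setminus E = \bigcap_{n\in\mathbb{N}} (\mathbb{R}^r \setminus E_n) \in \mathcal{A},$$
so $E \in \Sigma'$. $\mathbf{Q}$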
(e) By (c-i) and (c-ii), every open set belongs to $\Sigma'$; consequently every Borel set belongs to $\Sigma'$ and
therefore to $\mathcal{A}$. Now if $E$ is any member of $\Sigma$, there is a Borel set $E_1 \subseteq E$ such that $\nu(E \setminus E_1) = 0$ and a
$K_\sigma$ set $H \subseteq E_1$ such that $\nu(E_1 \setminus H) = 0$. Express $H$ as $\bigcup_{n\in\mathbb{N}} K_n$ where every $K_n$ is compact; then
$$\nu E = \nu H = \lim_{n\to\infty} \nu\Bigl(\bigcup_{i\le n} K_i\Bigr) \le \sup_{K \subseteq E \text{ is compact}} \nu K \le \nu E$$
because $\bigcup_{i\le n} K_i$ is a compact subset of $E$ for every $n$.
(f) Thus $\nu$ is inner regular with respect to the compact sets. But of course it is complete (being the
completion of $\nu_0$) and a locally finite topological measure (because $\nu_0$ is); so it is a Radon measure. This
completes the proof.

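256D Proposition If $\nu$ and $\nu'$ are two Radon measures on $\mathbb{R}^r$, the following are equiveridical:
(i) $\nu = \nu'$;
(ii) $\nu K = \nu' K$ for every compact set $K \subseteq \mathbb{R}^r$;
(iii) $\nu G = \nu' G$ for every open set $G \subseteq \mathbb{R}^r$;
(iv) $\int h\,d\nu = \int h\,d\nu'$ for every continuous function $h : \mathbb{R}^r \to \mathbb{R}$ with bounded support.
proof (a)(i)$\Rightarrow$(iv) is trivial.
(b)(iv)$\Rightarrow$(iii) If (iv) is true, and $G \subseteq \mathbb{R}^r$ is an open set, then for each $n \in \mathbb{N}$ set
$$h_n(x) = \min\bigl(1,\ 2^n \inf_{y \in \mathbb{R}^r \setminus (G \cap B(0,n))} \|y - x\|\bigr)$$
for $x \in \mathbb{R}^r$. Then $h_n$ is continuous (in fact $|h_n(x) - h_n(x')| \le 2^n\|x - x'\|$ for all $x$, $x' \in \mathbb{R}^r$) and zero outside
$B(0, n)$, so $\int h_n\,d\nu = \int h_n\,d\nu'$. Next, $\langle h_n(x)\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence converging to $\chi G(x)$ for every
$x \in \mathbb{R}^r$. So
$$\nu G = \lim_{n\to\infty} \int h_n\,d\nu = \lim_{n\to\infty} \int h_n\,d\nu' = \nu' G,$$
by 135Ga. As $G$ is arbitrary, (iii) is true.
(c)(iii)$\Rightarrow$(ii) If (iii) is true, and $K \subseteq \mathbb{R}^r$ is compact, let $n$ be so large that $\|x\| < n$ for every $x \in K$. Set
$G = \{x : \|x\| < n\}$, $H = G \setminus K$. Then $G$ and $H$ are open and $G$ is bounded, so $\nu G = \nu' G$ is finite, and
$$\nu K = \nu G - \nu H = \nu' G - \nu' H = \nu' K.$$
As $K$ is arbitrary, (ii) is true.
(d)(ii)$\Rightarrow$(i) If $\nu$, $\nu'$ agree on the compact sets, then
$$\nu E = \sup_{K \subseteq E \text{ is compact}} \nu K = \sup_{K \subseteq E \text{ is compact}} \nu' K = \nu' E$$
for every Borel set $E$. So $\nu\restriction\mathcal{B} = \nu'\restriction\mathcal{B}$, where $\mathcal{B}$ is the algebra of Borel sets. But since $\nu$ and $\nu'$ are both the
completions of their restrictions to $\mathcal{B}$, they are identical.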
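
The approximating functions $h_n$ used in part (b) of this proof are easy to compute when $r = 1$ and $G$ is an open interval. The Python sketch below is an illustration only, assuming $G = \,]a, b[$; the function name `h` and the sample points are my own choices. It shows $h_n(x)$ increasing with $n$ to the value of the indicator function of $G$ at $x$.

```python
def h(n, x, a=0.0, b=1.0):
    """h_n(x) = min(1, 2^n * dist(x, R \\ (G ∩ B(0, n)))) for G = ]a, b[ in R."""
    lo, hi = max(a, -n), min(b, n)   # G ∩ B(0, n) lies between lo and hi
    if lo >= hi or not (lo < x < hi):
        return 0.0                   # x is at distance 0 from the complement
    return min(1.0, 2 ** n * min(x - lo, hi - x))


# one point near the edge of G, one in the middle, one outside
for x in (0.01, 0.5, 1.5):
    print(x, [h(n, x) for n in range(8)])
```

Each printed row is non-decreasing in $n$; the rows for points of $G$ reach 1 after finitely many steps, and the row for the point outside $G$ stays at 0.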

256E It is I suppose time I gave some examples of Radon measures. However it will save a few lines if
I first establish some basic constructions. You may wish to glance ahead to 256H at this point.
Theorem Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and $f$ a non-negative $\Sigma$-measurable function
defined on a $\nu$-conegligible subset of $\mathbb{R}^r$. Suppose that $f$ is locally integrable in the sense that $\int_E f\,d\nu < \infty$
for every bounded set $E$. Then the indefinite-integral measure $\nu'$ on $\mathbb{R}^r$ defined by saying that
$$\nu' E = \int_E f\,d\nu \text{ whenever } E \cap \{x : x \in \operatorname{dom} f,\ f(x) > 0\} \in \Sigma$$
is a Radon measure on $\mathbb{R}^r$.
proof For the construction of $\nu'$, see 234A-234D. It is a topological measure because every open set belongs
to $\Sigma$ and therefore to the domain $\Sigma'$ of $\nu'$. $\nu'$ is locally finite because $f$ is locally integrable. To see that $\nu'$ is
inner regular with respect to the compact sets, take any set $E \in \Sigma'$, and set $E' = \{x : x \in E \cap \operatorname{dom} f,\ f(x) >
0\}$. Then $E' \in \Sigma$, so there is a set $H \subseteq E'$, expressible as the union of a sequence of compact sets, such that
$\nu(E' \setminus H) = 0$. In this case
$$\nu'(E \setminus H) = \int_{E \setminus H} f\,d\nu = 0.$$
Let $\langle K_n\rangle_{n\in\mathbb{N}}$ be a sequence of compact sets with union $H$; then
$$\nu' E = \nu' H = \lim_{n\to\infty} \nu'\Bigl(\bigcup_{i\le n} K_i\Bigr) \le \sup_{K \subseteq E \text{ is compact}} \nu' K \le \nu' E.$$
As $E$ is arbitrary, $\nu'$ is inner regular with respect to the compact sets.

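256F Theorem Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $\Sigma$ its domain. Let $f : D \to \mathbb{R}$ be a $\Sigma$-measurable
function, where $D \subseteq \mathbb{R}^r$. Then for every $\epsilon > 0$ there is a closed set $F \subseteq \mathbb{R}^r$ such that $\nu(\mathbb{R}^r \setminus F) \le \epsilon$ and
$f\restriction F$ is continuous.
proof By 121I, there is a $\Sigma$-measurable function $h : \mathbb{R}^r \to \mathbb{R}$ extending $f$. Enumerate $\mathbb{Q}$ as $\langle q_n\rangle_{n\in\mathbb{N}}$. For
each $n \in \mathbb{N}$ set $E_n = \{x : h(x) \le q_n\}$, $E'_n = \{x : h(x) > q_n\}$ and use 256Bb to choose closed sets $F_n \subseteq E_n$,
$F'_n \subseteq E'_n$ such that $\nu(E_n \setminus F_n) \le 2^{-n-2}\epsilon$, $\nu(E'_n \setminus F'_n) \le 2^{-n-2}\epsilon$. Set $F = \bigcap_{n\in\mathbb{N}} (F_n \cup F'_n)$; then $F$ is closed
and
$$\nu(\mathbb{R}^r \setminus F) \le \sum_{n=0}^{\infty} \nu(\mathbb{R}^r \setminus (F_n \cup F'_n)) \le \sum_{n=0}^{\infty} \bigl(\nu(E_n \setminus F_n) + \nu(E'_n \setminus F'_n)\bigr) \le \epsilon.$$
I claim that $h\restriction F$ is continuous. $\mathbf{P}$ Suppose that $x \in F$ and $\delta > 0$. Then there are $m$, $n \in \mathbb{N}$ such that
$$h(x) - \delta \le q_m < h(x) \le q_n \le h(x) + \delta.$$
This means that $x \in E'_m \cap E_n$; consequently $x \notin F_m \cup F'_n$. Because $F_m \cup F'_n$ is closed, there is an $\eta > 0$
such that $y \notin F_m \cup F'_n$ whenever $\|y - x\| \le \eta$. Now suppose that $y \in F$ and $\|y - x\| \le \eta$. Then
$y \in (F_m \cup F'_m) \cap (F_n \cup F'_n)$ and $y \notin F_m \cup F'_n$, so $y \in F'_m \cap F_n \subseteq E'_m \cap E_n$ and $q_m < h(y) \le q_n$. Consequently
$|h(y) - h(x)| \le \delta$. As $x$ and $\delta$ are arbitrary, $h\restriction F$ is continuous. $\mathbf{Q}$ Consequently $f\restriction F = (h\restriction F)\restriction D$ is
continuous, as required.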

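256G Theorem Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and suppose that $\phi : \mathbb{R}^r \to \mathbb{R}^s$ is
measurable in the sense that all its coordinates are $\Sigma$-measurable. If the image measure $\nu' = \nu\phi^{-1}$ (112F)
is locally finite, it is a Radon measure.
proof Write $\Sigma$ for the domain of $\nu$ and $\Sigma'$ for the domain of $\nu'$. If $\phi = (\phi_1, \ldots, \phi_s)$, then
$$\phi^{-1}[\{y : \eta_j \le \alpha\}] = \{x : \phi_j(x) \le \alpha\} \in \Sigma,$$
so $\{y : \eta_j \le \alpha\} \in \Sigma'$ for every $j \le s$, $\alpha \in \mathbb{R}$, where I write $y = (\eta_1, \ldots, \eta_s)$ for $y \in \mathbb{R}^s$. Consequently every
Borel subset of $\mathbb{R}^s$ belongs to $\Sigma'$ (121J), and $\nu'$ is a topological measure. It is complete because if $F$ is
$\nu'$-negligible, and $H \subseteq F$, then $\phi^{-1}[H] \subseteq \phi^{-1}[F]$ is $\nu$-negligible, therefore belongs to $\Sigma$ (cf. 211Xd).
The point is of course that $\nu'$ is inner regular with respect to the compact sets. $\mathbf{P}$ Suppose that $F \in \Sigma'$
and that $\gamma < \nu' F$. For each $j \le s$, there is a closed set $H_j \subseteq \mathbb{R}^r$ such that $\phi_j\restriction H_j$ is continuous and
$\nu(\mathbb{R}^r \setminus H_j) < \frac1s(\nu' F - \gamma)$, by 256F. Set $H = \bigcap_{j\le s} H_j$; then $H$ is closed and $\phi\restriction H$ is continuous and
$$\nu(\mathbb{R}^r \setminus H) < \nu' F - \gamma = \nu\phi^{-1}[F] - \gamma,$$
so that $\nu(\phi^{-1}[F] \cap H) > \gamma$. Let $K \subseteq \phi^{-1}[F] \cap H$ be a compact set such that $\nu K \ge \gamma$, and set $L = \phi[K]$.
Because $K \subseteq H$ and $\phi\restriction H$ is continuous, $L$ is compact (2A2Eb). Of course $L \subseteq F$, and
$$\nu' L = \nu\phi^{-1}[L] \ge \nu K \ge \gamma.$$
As $F$ and $\gamma$ are arbitrary, $\nu'$ is inner regular with respect to the compact sets. $\mathbf{Q}$
Since $\nu'$ is locally finite by the hypothesis of the theorem, it is a Radon measure.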

256H Examples I come at last to the promised examples.

(a) Lebesgue measure on $\mathbb{R}^r$ is a Radon measure. (It is a topological measure by 115G, and inner regular
with respect to the compact sets by 134Fb.)

(b) Let $\langle t_n\rangle_{n\in\mathbb{N}}$ be any sequence in $\mathbb{R}^r$, and $\langle a_n\rangle_{n\in\mathbb{N}}$ any summable sequence in $[0, \infty[$. For every $E \subseteq \mathbb{R}^r$
set
$$\nu E = \sum\{a_n : t_n \in E\},$$
so that $\nu$ is a totally finite point-supported measure. Then $\nu$ is a (totally finite) Radon measure on $\mathbb{R}^r$. $\mathbf{P}$
Clearly $\nu$ is complete and defined on every Borel set and gives finite measure to bounded sets. To see that
it is inner regular with respect to the compact sets, observe that for any $E \subseteq \mathbb{R}^r$ the sets
$$K_n = E \cap \{t_i : i \le n\}$$
are compact and $\nu E = \lim_{n\to\infty} \nu K_n$. $\mathbf{Q}$
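
The approximation of a set by the finite sets $K_n$ in 256Hb can be seen numerically. The Python sketch below is an illustration only: it truncates the sequences to finitely many terms, takes $r = 1$, $t_n = 1/(n+1)$ and $a_n = 2^{-n-1}$, and represents a set $E$ by a membership predicate. All of these choices, and the names `nu` and `nu_K`, are assumptions of mine rather than part of the text.

```python
from fractions import Fraction

N = 30                                              # truncation level (assumption)
t = [Fraction(1, n + 1) for n in range(N)]          # the points t_n
a = [Fraction(1, 2 ** (n + 1)) for n in range(N)]   # summable weights a_n


def nu(E):
    """nu E = sum of a_n over those n with t_n in E (E given as a predicate)."""
    return sum((a[n] for n in range(N) if E(t[n])), Fraction(0))


def nu_K(E, n):
    """nu K_n, where K_n = E ∩ {t_i : i <= n} is a finite, hence compact, set."""
    return sum((a[i] for i in range(min(n + 1, N)) if E(t[i])), Fraction(0))


E = lambda x: x < Fraction(1, 3)                    # E = ]-oo, 1/3[
approximations = [nu_K(E, n) for n in range(N)]
print(float(approximations[5]), float(nu(E)))
```

The list `approximations` is non-decreasing and its last entry equals `nu(E)` for the truncated sequences, which is the finite analogue of $\nu E = \lim_{n\to\infty}\nu K_n$.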

(c) Now we come to a new idea. Recall that the Cantor set $C$ (134G) is a closed negligible subset of
$[0, 1]$, and that the Cantor function (134H) is a non-decreasing continuous function $f : [0, 1] \to [0, 1]$ such
that $f(0) = 0$, $f(1) = 1$ and $f$ is constant on each of the intervals composing $[0, 1] \setminus C$. It follows that if we
set $g(x) = \frac12(x + f(x))$ for $x \in [0, 1]$, then $g : [0, 1] \to [0, 1]$ is a continuous bijection such that the Lebesgue
measure of $g[C]$ is $\frac12$ (134I); consequently $g^{-1} : [0, 1] \to [0, 1]$ is continuous. Now extend $g$ to a bijection
$h : \mathbb{R} \to \mathbb{R}$ by setting $h(x) = x$ for $x \in \mathbb{R} \setminus [0, 1]$. Then $h$ and $h^{-1}$ are continuous. Note that $h[C] = g[C]$ has
Lebesgue measure $\frac12$.
Let $\nu_1$ be the indefinite-integral measure defined from Lebesgue measure $\mu$ on $\mathbb{R}$ and the function $2\chi(h[C])$;
that is, $\nu_1 E = 2\mu(E \cap h[C])$ whenever this is defined. By 256E, $\nu_1$ is a Radon measure, and $\nu_1 h[C] = \nu_1\mathbb{R} = 1$.
Let $\nu$ be the measure $\nu_1 h$, that is, $\nu E = \nu_1 h[E]$ for just those $E \subseteq \mathbb{R}$ such that $h[E] \in \operatorname{dom}\nu_1$. Then $\nu$ is a
Radon probability measure on $\mathbb{R}$, by 256G, and $\nu C = 1$, $\nu(\mathbb{R} \setminus C) = \mu C = 0$.
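
For the measure $\nu$ just constructed one can check that $\nu[0, x] = f(x)$ for $x \in [0, 1]$, where $f$ is the Cantor function: $h$ has slope $\frac12$ on each interval of $[0, 1] \setminus C$ and $C$ is negligible, so $\mu(h[[0, x] \setminus C]) = \frac12 x$ and $\mu(h[[0, x] \cap C]) = g(x) - \frac12 x = \frac12 f(x)$. The Python sketch below uses this to evaluate $\nu$ on intervals. The function names and the ternary-digit algorithm are my own illustration, not taken from 134H; the evaluation is exact at points with terminating ternary expansions and truncated otherwise.

```python
from fractions import Fraction


def cantor_function(x, max_digits=64):
    """Cantor function f at a rational x in [0, 1], read off the ternary digits of x."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(max_digits):
        x *= 3
        digit = int(x)            # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:            # x is in (the closure of) a removed middle third
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2
        if x == 0:                # the expansion has terminated
            break
    return value


def stage_intervals(n):
    """The 2^n closed intervals making up the n-th stage of the Cantor set construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = [piece
                     for (lo, hi) in intervals
                     for piece in ((lo, lo + (hi - lo) / 3), (hi - (hi - lo) / 3, hi))]
    return intervals


def cantor_measure(lo, hi):
    """nu [lo, hi] = f(hi) - f(lo), using nu [0, x] = f(x) and the continuity of f."""
    return cantor_function(hi) - cantor_function(lo)


for n in range(4):
    print(n, {cantor_measure(lo, hi) for (lo, hi) in stage_intervals(n)})
```

At each stage $n$ the set printed has a single element, the common mass of the $2^n$ intervals of that stage.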

256I Remarks (a) The measure $\nu$ of 256Hc, sometimes called Cantor measure, is a classic example,
and as such has many constructions, some rather more natural than the one I use here (see 256Xk, and also
264Ym below). But I choose the method above because it yields directly, without further investigation or
any appeal to more advanced general theory, the fact that $\nu$ is a Radon measure.

(b) The examples above are chosen to represent the extremes under the Lebesgue decomposition described
in 232I. If $\nu$ is a (totally finite) Radon measure on $\mathbb{R}^r$, we can use 232Ib to express its restriction
$\nu\restriction\mathcal{B}$ to the Borel $\sigma$-algebra as $\nu_p + \nu_{ac} + \nu_{cs}$, where $\nu_p$ is the point-mass or atomic part of $\nu\restriction\mathcal{B}$, $\nu_{ac}$ is the
absolutely continuous part (with respect to Lebesgue measure), and $\nu_{cs}$ is the atomless singular part. In
the example of 256Hb, we have $\nu\restriction\mathcal{B} = \nu_p$; in 256E, if we start from Lebesgue measure, we have $\nu\restriction\mathcal{B} = \nu_{ac}$;
and in 256Hc we have $\nu\restriction\mathcal{B} = \nu_{cs}$.

256J Absolutely continuous Radon measures It is worth pausing a moment over the indefinite-
integral measures described in 256E.
Proposition Let $\nu$ be a Radon measure on $\mathbb{R}^r$, where $r \ge 1$, and write $\mu$ for Lebesgue measure on $\mathbb{R}^r$.
Then the following are equiveridical:
(i) $\nu$ is an indefinite-integral measure over $\mu$;
(ii) $\nu E = 0$ whenever $E$ is a Borel subset of $\mathbb{R}^r$ and $\mu E = 0$.
In this case, if $g \in \mathcal{L}^0(\mu)$ and $\int_E g\,d\mu = \nu E$ for every Borel set $E \subseteq \mathbb{R}^r$, then $g$ is a Radon-Nikodým
derivative of $\nu$ with respect to $\mu$ in the sense of 234B.
proof (a)(i)$\Rightarrow$(ii) If $f$ is a Radon-Nikodým derivative of $\nu$ with respect to $\mu$, then of course
$$\nu E = \int_E f\,d\mu = 0$$
whenever $\mu E = 0$.
(ii)$\Rightarrow$(i) If $\nu E = 0$ for every $\mu$-negligible Borel set $E$, then $\nu E$ is defined and equal to $0$ for every
$\mu$-negligible set $E$, because $\nu$ is complete and any $\mu$-negligible set is included in a $\mu$-negligible Borel set.
Consequently $\operatorname{dom}\nu$ includes the domain $\Sigma$ of $\mu$, since every Lebesgue measurable set is expressible as the
union of a Borel set and a negligible set.
For each $n \in \mathbb{N}$ set $E_n = \{x : n \le \|x\| < n + 1\}$, so that $\langle E_n\rangle_{n\in\mathbb{N}}$ is a partition of $\mathbb{R}^r$ into bounded Borel
sets. Set $\nu_n E = \nu(E \cap E_n)$ for every Lebesgue measurable set $E$ and every $n \in \mathbb{N}$. Now $\nu_n$ is absolutely
continuous with respect to $\mu$ (232Ba), so by the Radon-Nikodým theorem (232F) there is a $\mu$-integrable
function $f_n$ such that $\int_E f_n\,d\mu = \nu_n E$ for every Lebesgue measurable set $E$. Because $\nu_n E \ge 0$ for every
$E \in \Sigma$, $f_n \ge 0$ a.e.; because $\nu_n(\mathbb{R}^r \setminus E_n) = 0$, $f_n = 0$ a.e. on $\mathbb{R}^r \setminus E_n$. Now if we set
$$f = \max\Bigl(0, \sum_{n=0}^{\infty} f_n\Bigr),$$
$f$ will be defined $\mu$-a.e. and we shall have
$$\int_E f\,d\mu = \sum_{n=0}^{\infty} \int_E f_n\,d\mu = \sum_{n=0}^{\infty} \nu(E \cap E_n) = \nu E$$
for every Borel set $E$, so that the indefinite-integral measure $\nu'$ defined by $f$ and $\mu$ agrees with $\nu$ on the
Borel sets. Since this ensures that $\nu'$ is locally finite, $\nu'$ is a Radon measure, by 256E, and is equal to $\nu$, by
256D. Accordingly $\nu$ is an indefinite-integral measure over $\mu$.

(b) As in (a-ii) above, $g$ must be locally integrable and the indefinite-integral measure defined by $g$ agrees
with $\nu$ on the Borel sets, so is identical with $\nu$.

256K Products The class of Radon measures on Euclidean spaces is stable under a wide variety of
operations, as we have already seen; in particular, we have the following.
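Theorem Let $\nu_1$, $\nu_2$ be Radon measures on $\mathbb{R}^r$ and $\mathbb{R}^s$ respectively, where $r$, $s \ge 1$. Let $\lambda$ be their c.l.d.
product measure on $\mathbb{R}^r \times \mathbb{R}^s$. Then $\lambda$ is a Radon measure.
Remark When I say that $\lambda$ is Radon according to the definition in 256A, I am of course identifying $\mathbb{R}^r \times \mathbb{R}^s$
with $\mathbb{R}^{r+s}$, as in 251L-251M.
proof (a) I hope the following rather voluminous notation will seem natural. Write $\Sigma_1$, $\Sigma_2$ for the domains
of $\nu_1$, $\nu_2$; $\mathcal{B}_r$, $\mathcal{B}_s$ for the Borel $\sigma$-algebras of $\mathbb{R}^r$, $\mathbb{R}^s$; $\Lambda$ for the domain of $\lambda$; and $\mathcal{B}$ for the Borel $\sigma$-algebra of
$\mathbb{R}^{r+s}$.
Because each $\nu_i$ is the completion of its restriction to the Borel sets (256C), $\lambda$ is the product of $\nu_1\restriction\mathcal{B}_r$
and $\nu_2\restriction\mathcal{B}_s$ (251S). Because $\nu_1\restriction\mathcal{B}_r$ and $\nu_2\restriction\mathcal{B}_s$ are $\sigma$-finite (256Ba, 212Ga), $\lambda$ must be the completion of its
restriction to $\mathcal{B}_r \widehat{\otimes} \mathcal{B}_s$, which by 251L is identified with $\mathcal{B}$. Setting $Q_n = \{(x, y) : \|x\| \le n,\ \|y\| \le n\}$ we have
$$\lambda Q_n = \nu_1\{x : \|x\| \le n\} \cdot \nu_2\{y : \|y\| \le n\} < \infty$$
for every $n$, while every bounded subset of $\mathbb{R}^{r+s}$ is included in some $Q_n$. So $\lambda\restriction\mathcal{B}$ is locally finite, and its
completion $\lambda$ is a Radon measure, by 256C.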

256L Remark We see from 253I that if $\nu_1$ and $\nu_2$ are Radon measures on $\mathbb{R}^r$ and $\mathbb{R}^s$ respectively, and
are both indefinite-integral measures over Lebesgue measure, then their product measure on $\mathbb{R}^{r+s}$ is also an
indefinite-integral measure over Lebesgue measure.

*256M For the sake of applications in §286 below, I include another result, which is in fact one of the
fundamental properties of Radon measures, as will appear in §414.
Proposition Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $D$ any subset of $\mathbb{R}^r$. Let $\Phi$ be a non-empty upwards-directed
family of non-negative continuous functions from $D$ to $\mathbb{R}$. For $x \in D$ set $g(x) = \sup_{f\in\Phi} f(x)$ in
$[0, \infty]$. Then
(a) $g : D \to [0, \infty]$ is lower semi-continuous, therefore Borel measurable;
(b) $\int_D g\,d\nu = \sup_{f\in\Phi} \int_D f\,d\nu$.
proof (a) For any $u \in [-\infty, \infty]$,
$$\{x : x \in D,\ g(x) > u\} = \bigcup_{f\in\Phi} \{x : x \in D,\ f(x) > u\}$$
is an open set for the subspace topology on $D$ (2A3C), so is the intersection of $D$ with a Borel subset of $\mathbb{R}^r$.
This is enough to show that $g$ is Borel measurable (121B-121C).
(b) Accordingly $\int_D g\,d\nu$ will be defined in $[0, \infty]$, and of course $\int_D g\,d\nu \ge \sup_{f\in\Phi} \int_D f\,d\nu$.
For the reverse inequality, observe that there is a countable set $\Psi \subseteq \Phi$ such that $g(x) = \sup_{f\in\Psi} f(x)$ for
every $x \in D$. $\mathbf{P}$ For $a \in \mathbb{Q}$, $q$, $q' \in \mathbb{Q}^r$ set
$$\Phi_{aqq'} = \{f : f \in \Phi,\ f(y) > a \text{ whenever } y \in D \cap [q, q']\},$$
interpreting $[q, q']$ as in 115G. Choose $f_{aqq'} \in \Phi_{aqq'}$ if $\Phi_{aqq'}$ is not empty, and arbitrarily in $\Phi$ otherwise;
and set $\Psi = \{f_{aqq'} : a \in \mathbb{Q},\ q, q' \in \mathbb{Q}^r\}$, so that $\Psi$ is a countable subset of $\Phi$. If $x \in D$ and $b < g(x)$, there
is an $a \in \mathbb{Q}$ such that $b \le a < g(x)$; there is an $\hat f \in \Phi$ such that $\hat f(x) > a$; because $\hat f$ is continuous, there
are $q$, $q' \in \mathbb{Q}^r$ such that $q \le x \le q'$ and $\hat f(y) > a$ whenever $y \in D \cap [q, q']$; so that $\hat f \in \Phi_{aqq'}$, $\Phi_{aqq'} \ne \emptyset$,
$f_{aqq'} \in \Phi_{aqq'}$ and $\sup_{f\in\Psi} f(x) \ge f_{aqq'}(x) \ge b$. As $b$ is arbitrary, $g(x) = \sup_{f\in\Psi} f(x)$. $\mathbf{Q}$
Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a sequence running over $\Psi$. Because $\Phi$ is upwards-directed, we can choose $\langle f'_n\rangle_{n\in\mathbb{N}}$ in $\Phi$
inductively in such a way that $f'_{n+1} \ge \max(f'_n, f_n)$ for every $n \in \mathbb{N}$. So $\langle f'_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence
in $\Phi$ and $\sup_{n\in\mathbb{N}} f'_n(x) \ge \sup_{f\in\Psi} f(x) = g(x)$ for every $x \in D$. By B.Levi's theorem,
$$\int_D g\,d\nu \le \sup_{n\in\mathbb{N}} \int_D f'_n\,d\nu \le \sup_{f\in\Phi} \int_D f\,d\nu,$$
and we have the required inequality.

256X Basic exercises > (a) Let $\nu$ be a measure on $\mathbb{R}^r$. (i) Show that it is locally finite, in the sense of
256Ab, iff for every $x \in \mathbb{R}^r$ there is a $\delta > 0$ such that $\nu^*B(x, \delta) < \infty$. (Hint: the sets $B(0, n)$ are compact.)
(ii) Show that in this case $\nu$ is $\sigma$-finite.
> (b) Let $\nu$ be a Radon measure on $\mathbb{R}^r$ and $\mathcal{G}$ a non-empty upwards-directed family of open sets in $\mathbb{R}^r$.
(i) Show that $\nu(\bigcup\mathcal{G}) = \sup_{G\in\mathcal{G}} \nu G$. (Hint: observe that if $K \subseteq \bigcup\mathcal{G}$ is compact, then $K \subseteq G$ for some
$G \in \mathcal{G}$.) (ii) Show that $\nu(E \cap \bigcup\mathcal{G}) = \sup_{G\in\mathcal{G}} \nu(E \cap G)$ for every set $E$ which is measured by $\nu$.
> (c) Let $\nu$ be a Radon measure on $\mathbb{R}^r$ and $\mathcal{F}$ a non-empty downwards-directed family of closed sets in
$\mathbb{R}^r$ such that $\inf_{F\in\mathcal{F}} \nu F < \infty$. (i) Show that $\nu(\bigcap\mathcal{F}) = \inf_{F\in\mathcal{F}} \nu F$. (Hint: apply 256Xb(ii) to $\mathcal{G} = \{\mathbb{R}^r \setminus F :
F \in \mathcal{F}\}$.) (ii) Show that $\nu(E \cap \bigcap\mathcal{F}) = \inf_{F\in\mathcal{F}} \nu(E \cap F)$ for every $E$ in the domain of $\nu$.
> (d) Show that a Radon measure $\nu$ on $\mathbb{R}^r$ is atomless iff $\nu\{x\} = 0$ for every $x \in \mathbb{R}^r$. (Hint: apply 256Xc
with $\mathcal{F} = \{F : F \subseteq E$ is closed, not negligible$\}$.)

(e) Let $\nu_1$, $\nu_2$ be Radon measures on $\mathbb{R}^r$, and $\alpha_1$, $\alpha_2 \in \,]0, \infty[$. Set $\Sigma = \operatorname{dom}\nu_1 \cap \operatorname{dom}\nu_2$, and for $E \in \Sigma$
set $\nu E = \alpha_1\nu_1 E + \alpha_2\nu_2 E$. Show that $\nu$ is a Radon measure on $\mathbb{R}^r$. Show that $\nu$ is an indefinite-integral
measure over Lebesgue measure iff $\nu_1$, $\nu_2$ are, and that in this case a linear combination of Radon-Nikodým
derivatives of $\nu_1$ and $\nu_2$ is a Radon-Nikodým derivative of $\nu$.
> (f) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. (i) Show that there is a unique closed set $F \subseteq \mathbb{R}^r$ such that, for
open sets $G \subseteq \mathbb{R}^r$, $\nu G > 0$ iff $G \cap F \ne \emptyset$. ($F$ is called the support of $\nu$.) (ii) Generally, a set $A \subseteq \mathbb{R}^r$ is
called self-supporting if $\nu^*(A \cap G) > 0$ whenever $G \subseteq \mathbb{R}^r$ is an open set meeting $A$. Show that for every
closed set $F \subseteq \mathbb{R}^r$ there is a unique self-supporting closed set $F' \subseteq F$ such that $\nu(F \setminus F') = 0$.
> (g) Show that a measure $\nu$ on $\mathbb{R}$ is a Radon measure iff it is a Lebesgue-Stieltjes measure as described
in 114Xa. Show that in this case $\nu$ is an indefinite-integral measure over Lebesgue measure iff the function
$x \mapsto \nu\,]-\infty, x]$ is absolutely continuous on every bounded interval.
(h) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. Let $C_k$ be the space of continuous real-valued functions on $\mathbb{R}^r$ with
bounded supports. Show that for every $\nu$-integrable function $f$ and every $\epsilon > 0$ there is a $g \in C_k$ such that
$\int |f - g|\,d\nu \le \epsilon$. (Hint: use arguments from 242O, but in (a-i) of the proof there start with closed intervals
$I$.)

(i) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. Show that $\nu E = \inf\{\nu G : G \supseteq E$ is open$\}$ for every set $E$ in the
domain of $\nu$.

(j) Let $\nu$, $\nu'$ be two Radon measures on $\mathbb{R}^r$, and suppose that $\nu I = \nu' I$ for every half-open interval $I \subseteq \mathbb{R}^r$
(definition: 115Ab). Show that $\nu = \nu'$.
(k) Let $\nu$ be Cantor measure (256Hc). (i) Show that if $C_n$ is the $n$th set used in the construction of
the Cantor set, so that $C_n$ consists of $2^n$ intervals of length $3^{-n}$, then $\nu I = 2^{-n}$ for each of the intervals
$I$ composing $C_n$. (ii) Let $\lambda$ be the usual measure on $\{0, 1\}^{\mathbb{N}}$ (254J). Define $\phi : \{0, 1\}^{\mathbb{N}} \to \mathbb{R}$ by setting
$\phi(x) = \frac23 \sum_{n=0}^{\infty} 3^{-n}x(n)$ for each $x \in \{0, 1\}^{\mathbb{N}}$. Show that $\phi$ is a bijection between $\{0, 1\}^{\mathbb{N}}$ and $C$. (iii) Show
that if $\mathcal{B}$ is the Borel $\sigma$-algebra of $\mathbb{R}$, then $\{\phi^{-1}[E] : E \in \mathcal{B}\}$ is precisely the $\sigma$-algebra of subsets of $\{0, 1\}^{\mathbb{N}}$
generated by the sets $\{x : x(n) = i\}$ for $n \in \mathbb{N}$, $i \in \{0, 1\}$. (iv) Show that $\phi$ is an isomorphism between
$(\{0, 1\}^{\mathbb{N}}, \lambda)$ and $(C, \nu_C)$, where $\nu_C$ is the subspace measure on $C$ induced by $\nu$.
(l) Let $\nu$ and $\nu'$ be two Radon measures on $\mathbb{R}^r$. Show that $\nu'$ is an indefinite-integral measure over $\nu$ iff
$\nu' E = 0$ whenever $\nu E = 0$, and in this case a function $f$ is a Radon-Nikodým derivative of $\nu'$ with respect
to $\nu$ iff $\int_E f\,d\nu = \nu' E$ for every Borel set $E$.

256Y Further exercises (a) Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $X$ any subset of $\mathbb{R}^r$; let $\nu_X$ be
the subspace measure on $X$ and $\Sigma_X$ its domain, and give $X$ its subspace topology (2A3C). Show that $\nu_X$
has the following properties: (i) $\nu_X$ is complete and locally determined; (ii) every open subset of $X$ belongs
to $\Sigma_X$; (iii) $\nu_X E = \sup\{\nu_X F : F \subseteq E$ is closed in $X\}$ for every $E \in \Sigma_X$; (iv) whenever $\mathcal{G}$ is a non-empty
upwards-directed family of open subsets of $X$, $\nu_X(\bigcup\mathcal{G}) = \sup_{G\in\mathcal{G}} \nu_X G$; (v) every point of $X$ belongs to an
open set of finite measure.

(b) Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and $f : \mathbb{R}^r \to \mathbb{R}$ a function. Show that the
following are equiveridical: (i) $f$ is $\Sigma$-measurable; (ii) for every non-negligible set $E \in \Sigma$ there is a non-negligible
$F \in \Sigma$ such that $F \subseteq E$ and $f\restriction F$ is continuous; (iii) for every set $E \in \Sigma$, $\nu E = \sup_{K\in\mathcal{K}_f, K\subseteq E} \nu K$,
where $\mathcal{K}_f = \{K : K \subseteq \mathbb{R}^r$ is compact, $f\restriction K$ is continuous$\}$. (Hint: for (ii)$\Rightarrow$(i), take a maximal disjoint
family $\mathcal{E} \subseteq \{K : K \in \mathcal{K}_f,\ \nu K > 0\}$; show that $\mathcal{E}$ is countable and that $\bigcup\mathcal{E}$ is conegligible.)

(c) Take $\nu$, $X$, $\nu_X$ and $\Sigma_X$ as in 256Ya. Suppose that $f : X \to \mathbb{R}$ is a function. Show that $f$ is $\Sigma_X$-measurable
iff for every non-negligible measurable set $E \subseteq X$ there is a non-negligible measurable $F \subseteq E$
such that $f\restriction F$ is continuous.

(d) Let $\langle\nu_n\rangle_{n\in\mathbb{N}}$ be a sequence of Radon measures on $\mathbb{R}^r$. Show that there is a Radon measure $\nu$ on
$\mathbb{R}^r$ such that every $\nu_n$ is an indefinite-integral measure over $\nu$. (Hint: find a sequence $\langle\alpha_n\rangle_{n\in\mathbb{N}}$ of strictly
positive numbers such that $\sum_{n=0}^{\infty} \alpha_n\nu_n B(0, k) < \infty$ for every $k$, and set $\nu = \sum_{n=0}^{\infty} \alpha_n\nu_n$, using the idea of
256Xe.)

(e) A set $G \subseteq \mathbb{R}^{\mathbb{N}}$ is open if for every $x \in G$ there are $n \in \mathbb{N}$, $\delta > 0$ such that
$$\{y : y \in \mathbb{R}^{\mathbb{N}},\ |y(i) - x(i)| < \delta \text{ for every } i \le n\} \subseteq G.$$
The Borel $\sigma$-algebra of $\mathbb{R}^{\mathbb{N}}$ is the $\sigma$-algebra $\mathcal{B}$ of subsets of $\mathbb{R}^{\mathbb{N}}$ generated, in the sense of 111Gb, by the
family $\mathfrak{T}$ of open sets. (i) Show that $\mathfrak{T}$ is a topology (2A3A). (ii) Show that a filter $\mathcal{F}$ on $\mathbb{R}^{\mathbb{N}}$ converges to
$x \in \mathbb{R}^{\mathbb{N}}$ iff $\pi_i[[\mathcal{F}]] \to x(i)$ for every $i \in \mathbb{N}$, where $\pi_i(y) = y(i)$ for $i \in \mathbb{N}$, $y \in \mathbb{R}^{\mathbb{N}}$. (iii) Show that $\mathcal{B}$ is the
$\sigma$-algebra generated by sets of the form $\{x : x \in \mathbb{R}^{\mathbb{N}},\ x(i) \le a\}$, where $i$ runs through $\mathbb{N}$ and $a$ runs through
$\mathbb{R}$. (iv) Show that if $\alpha_i \ge 0$ for every $i \in \mathbb{N}$, then $\{x : |x(i)| \le \alpha_i\ \forall\, i \in \mathbb{N}\}$ is compact. (Hint: 2A3R.)
(v) Show that any open set in $\mathbb{R}^{\mathbb{N}}$ is the union of a sequence of closed sets. (Hint: look at sets of the form
$\{x : q_i \le x(i) \le q'_i\ \forall\, i \le n\}$, where $q_i$, $q'_i \in \mathbb{Q}$ for $i \le n$.) (vi) Show that if $\nu_0$ is any probability measure
with domain $\mathcal{B}$, then its completion $\nu$ is inner regular with respect to the compact sets, and therefore may
be called a Radon measure on $\mathbb{R}^{\mathbb{N}}$. (Hint: show that there are compact sets of measure arbitrarily close to
1, and therefore that every open set, and every closed set, includes a $K_\sigma$ set of the same measure.)

256 Notes and comments Radon measures on Euclidean spaces are very special, and the results of this
section do not give clear pointers to the direction the theory takes when applied to other kinds of topological
space. With the material here you could make a stab at developing a theory of Radon measures on separable
complete metric spaces, provided you use 256Xa as the basis for your definition of 'locally finite'. These
are the spaces for which a version of 256C is true. (See 256Ye.) But for generalizations to other types of
topological space, and for the more interesting parts of the theory on $\mathbb{R}^r$, I must ask you to wait for Volume
4. My purpose in introducing Radon measures here is strictly limited; I wish only to give a basis for §257
and §271 sufficiently solid not to need later revision. In fact I think that all we really need are the Radon
probability measures.
The chief technical difficulty in the definition of 'Radon measure' here lies in the insistence on completeness.
It may well be that for everything studied in this volume, it would be simpler to look at locally finite
measures with domain the algebra of Borel sets. This would involve us in a number of circumlocutions
when dealing with Lebesgue measure itself and its derivates, since Lebesgue measure is defined on a larger
$\sigma$-algebra; but the serious objection arises in the more advanced theory, when non-Borel sets of various
kinds become central. Since my aim in this book is to provide secure foundations for the study of all aspects
of measure theory, I ask you to take a little extra trouble now in order to avoid the possibility of having
to re-work all your ideas later. The extra trouble arises, for instance, in 256D, 256Xe and 256Xj; since
different Radon measures are defined on different $\sigma$-algebras, we have to check that two Radon measures
which agree on the compact sets, or on the open sets, have the same domains. On the credit side, some of
the power of 256G arises from the fact that the Radon image measure $\nu\phi^{-1}$ is defined on the whole $\sigma$-algebra
$\{F : \phi^{-1}[F] \in \operatorname{dom}(\nu)\}$, not just on the Borel sets.
The further technical point that Radon measures are expected to be locally finite gives less difficulty;
its effect is that from most points of view there is little difference between a general Radon measure and
a totally finite Radon measure. The extra condition which obviously has to be put into the hypotheses of
such results as 256E and 256G is no burden on either intuition or memory.

In effect, we have two definitions of Radon measures on Euclidean spaces: they are the inner regular
locally finite topological measures, and they are also the completions of the locally finite Borel measures.
The equivalence of these definitions is Theorem 256C. The latter definition is the better adapted to 256K,
and the former to 256G. The inner regularity of the basic definition refers to compact sets; we also have
forms of inner regularity with respect to closed sets (256Bb) and $K_\sigma$ sets (256Bc), and a complementary
notion of outer regularity with respect to open sets (256Xi).

257 Convolutions of measures


The ideas of this chapter can be brought together in a satisfying way in the theory of convolutions of
Radon measures, which will be useful in §272 and again in §285. I give just the definition (257A) and the
central property (257B) of the convolution of totally finite Radon measures, with a few corollaries and a
note on the relation between convolution of functions and convolution of measures (257F).

257A Definition Let $r \ge 1$ be an integer and $\nu_1$, $\nu_2$ two totally finite Radon measures on $\mathbb{R}^r$. Let $\lambda$
be the product measure on $\mathbb{R}^r \times \mathbb{R}^r$; then $\lambda$ is also a (totally finite) Radon measure, by 256K. Define
$\phi : \mathbb{R}^r \times \mathbb{R}^r \to \mathbb{R}^r$ by setting $\phi(x, y) = x + y$; then $\phi$ is continuous, therefore measurable in the sense of
256G. The convolution of $\nu_1$ and $\nu_2$, $\nu_1 * \nu_2$, is the image measure $\lambda\phi^{-1}$; by 256G, this is a Radon measure.
Note that if $\nu_1$ and $\nu_2$ are Radon probability measures, then $\lambda$ and $\nu_1 * \nu_2$ are also probability measures.
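
For measures concentrated on finitely many points of $\mathbb{R}$ the definition can be carried out by hand, which may help to fix ideas. The following is only a sketch under that finiteness assumption; the names `convolve`, `total_mass` and `coin` are illustrative and do not come from the text. The product measure gives the pair $(x, y)$ the product of the two point masses, and the image measure under $(x, y) \mapsto x + y$ moves that mass to $x + y$.

```python
from collections import defaultdict
from fractions import Fraction


def convolve(nu1, nu2):
    """Image of the product measure nu1 x nu2 under (x, y) -> x + y.

    Each measure is a dict {point: mass}, i.e. a finite sum of point masses.
    """
    result = defaultdict(Fraction)
    for x, mass_x in nu1.items():
        for y, mass_y in nu2.items():
            # the product measure gives (x, y) the mass mass_x * mass_y;
            # the image measure transports that mass to the point x + y
            result[x + y] += mass_x * mass_y
    return dict(result)


def total_mass(nu):
    """Measure of the whole space."""
    return sum(nu.values())


half = Fraction(1, 2)
coin = {0: half, 1: half}      # a probability measure on {0, 1}
nu = convolve(coin, coin)      # distribution of the sum of two independent fair bits
```

Here `total_mass(nu)` is again 1, in line with the remark that the convolution of probability measures is a probability measure.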

257B Theorem Let $r \ge 1$ be an integer, and $\nu_1$ and $\nu_2$ two totally finite Radon measures on $\mathbb{R}^r$; let
$\nu = \nu_1 * \nu_2$ be their convolution, and $\lambda$ their product on $\mathbb{R}^r \times \mathbb{R}^r$. Then for any real-valued function $h$
defined on a subset of $\mathbb{R}^r$,
$$\int h(x + y)\,\lambda(d(x, y)) \text{ exists} = \int h(x)\,\nu(dx)$$
if either integral is defined in $[-\infty, \infty]$.
proof Apply 235L with $J(x, y) = 1$, $\phi(x, y) = x + y$ for all $x, y \in \mathbb{R}^r$.

257C Corollary Let $r \ge 1$ be an integer, and $\nu_1$, $\nu_2$ two totally finite Radon measures on $\mathbb{R}^r$; let
$\nu = \nu_1 * \nu_2$ be their convolution, and $\lambda$ their product on $\mathbb{R}^r \times \mathbb{R}^r$; write $\Sigma$ for the domain of $\nu$. Let $h$ be a
$\Sigma$-measurable function defined $\nu$-almost everywhere in $\mathbb{R}^r$. Suppose that any one of the integrals
$$\iint |h(x + y)|\,\nu_1(dx)\,\nu_2(dy), \quad \iint |h(x + y)|\,\nu_2(dy)\,\nu_1(dx), \quad \int h(x + y)\,\lambda(d(x, y))$$
exists and is finite. Then $h$ is $\nu$-integrable and
$$\int h(x)\,\nu(dx) = \iint h(x + y)\,\nu_1(dx)\,\nu_2(dy) = \iint h(x + y)\,\nu_2(dy)\,\nu_1(dx).$$

proof Put 257B together with Fubini's and Tonelli's theorems (252H).

257D Corollary If $\nu_1$ and $\nu_2$ are totally finite Radon measures on $\mathbb{R}^r$, then $\nu_1 * \nu_2 = \nu_2 * \nu_1$.

proof For any Borel set $E \subseteq \mathbb{R}^r$, apply 257C to $h = \chi E$ to see that
$$(\nu_1 * \nu_2)(E) = \iint \chi E(x + y)\,\nu_1(dx)\,\nu_2(dy) = \iint \chi E(x + y)\,\nu_2(dy)\,\nu_1(dx)$$
$$= \iint \chi E(y + x)\,\nu_2(dy)\,\nu_1(dx) = (\nu_2 * \nu_1)(E).$$

Thus $\nu_1 * \nu_2$ and $\nu_2 * \nu_1$ agree on the Borel sets of $\mathbb{R}^r$; because they are both Radon measures, they must
be identical (256D).

257E Corollary If $\nu_1$, $\nu_2$ and $\nu_3$ are totally finite Radon measures on $\mathbb{R}^r$, then $(\nu_1 * \nu_2) * \nu_3 = \nu_1 * (\nu_2 * \nu_3)$.

proof For any Borel set $E \subseteq \mathbb{R}^r$, apply 257B to $h = \chi E$ to see that
$$((\nu_1 * \nu_2) * \nu_3)(E) = \iint \chi E(x + z)\,(\nu_1 * \nu_2)(dx)\,\nu_3(dz)$$
$$= \iiint \chi E(x + y + z)\,\nu_1(dx)\,\nu_2(dy)\,\nu_3(dz)$$
(because $x \mapsto \chi E(x + z)$ is Borel measurable for every $z$)
$$= \iint \chi E(x + y)\,\nu_1(dx)\,(\nu_2 * \nu_3)(dy)$$
(because $(x, y) \mapsto \chi E(x + y)$ is Borel measurable, so $y \mapsto \int \chi E(x + y)\,\nu_1(dx)$ is $(\nu_2 * \nu_3)$-integrable)
$$= (\nu_1 * (\nu_2 * \nu_3))(E).$$

Thus $(\nu_1 * \nu_2) * \nu_3$ and $\nu_1 * (\nu_2 * \nu_3)$ agree on the Borel sets of $\mathbb{R}^r$; because they are both Radon measures,
they must be identical.

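257F Theorem Suppose that $\nu_1$ and $\nu_2$ are totally finite Radon measures on $\mathbb{R}^r$ which are indefinite-integral
measures over Lebesgue measure $\mu$. Then $\nu_1 * \nu_2$ is also an indefinite-integral measure over $\mu$; if $f_1$
and $f_2$ are Radon-Nikodým derivatives of $\nu_1$, $\nu_2$ respectively, then $f_1 * f_2$ is a Radon-Nikodým derivative of
$\nu_1 * \nu_2$.

proof By 255H (see the remark in 255L), $f_1 * f_2$ is integrable with respect to $\mu$, with
$\int f_1 * f_2\,d\mu = \int f_1\,d\mu \cdot \int f_2\,d\mu$ (which is 1 when $\nu_1$ and $\nu_2$ are probability measures), and
of course $f_1 * f_2$ is non-negative. If $E \subseteq \mathbb{R}^r$ is a Borel set,
$$\int_E f_1 * f_2\,d\mu = \iint \chi E(x + y) f_1(x) f_2(y)\,\mu(dx)\,\mu(dy)$$
(by 255G)
$$= \iint \chi E(x + y) f_2(y)\,\nu_1(dx)\,\mu(dy)$$
(because $x \mapsto \chi E(x + y)$ is Borel measurable)
$$= \iint \chi E(x + y)\,\nu_1(dx)\,\nu_2(dy)$$
(because $(x, y) \mapsto \chi E(x + y)$ is Borel measurable, so $y \mapsto \int \chi E(x + y)\,\nu_1(dx)$ is $\nu_2$-integrable)
$$= (\nu_1 * \nu_2)(E).$$

So $f_1 * f_2$ is a Radon-Nikodým derivative of $\nu = \nu_1 * \nu_2$ with respect to $\mu$, by 256J.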
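
A quick numerical sanity check of the theorem in one special case may be useful. This is a sketch, not part of the argument: it assumes $r = 1$ and $f_1 = f_2 = \chi[0, 1]$, for which the convolution $f_1 * f_2$ is the triangular density $x \mapsto \max(0, 1 - |x - 1|)$, and it compares $\int_E f_1 * f_2\,d\mu$ with $(\nu_1 * \nu_2)(E)$ for $E = [0, t]$. The function names and grid sizes are illustrative choices.

```python
def tri(x):
    """Convolution of two indicator functions of [0, 1]: the triangular density on [0, 2]."""
    return max(0.0, 1.0 - abs(x - 1.0))


def integral_of_density(t, steps=4000):
    """Midpoint-rule approximation to the integral of tri over [0, t]."""
    h = t / steps
    return sum(tri((i + 0.5) * h) for i in range(steps)) * h


def convolution_measure(t, n=400):
    """Approximate (nu1 * nu2)([0, t]) as the product measure of {(x, y) : x + y <= t},
    sampling the unit square on an n-by-n grid of midpoints."""
    hits = sum(1 for i in range(n) for j in range(n)
               if (i + 0.5) / n + (j + 0.5) / n <= t)
    return hits / n ** 2
```

For $t = 0.5$, $1$ and $1.5$ the two functions agree to within the grid error, as the identity $\int_E f_1 * f_2\,d\mu = (\nu_1 * \nu_2)(E)$ predicts.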

257X Basic exercises > (a) Let $r \ge 1$ be an integer. Let $\delta_0$ be the Radon probability measure on $\mathbb{R}^r$
such that $\delta_0\{0\} = 1$. Show that $\delta_0 * \nu = \nu$ for every totally finite Radon measure $\nu$ on $\mathbb{R}^r$.

(b) Let $\mu$ and $\nu$ be totally finite Radon measures on $\mathbb{R}^r$, and $E$ any set measured by their convolution
$\mu * \nu$. Show that $\int \mu(E - y)\,\nu(dy)$ is defined in $[0, \infty]$ and equal to $(\mu * \nu)(E)$.

(c) Let $\nu_1, \ldots, \nu_n$ be totally finite Radon measures on $\mathbb{R}^r$, and let $\nu$ be the convolution $\nu_1 * \ldots * \nu_n$ (using
257E to see that such a bracketless expression is legitimate). Show that
$$\int h(x)\,\nu(dx) = \int \ldots \int h(x_1 + \ldots + x_n)\,\nu_1(dx_1) \ldots \nu_n(dx_n)$$
for every $\nu$-integrable function $h$.

(d) Let $\nu_1$ and $\nu_2$ be totally finite Radon measures on $\mathbb{R}^r$, with supports $F_1$, $F_2$ (256Xf). Show that the
support of $\nu_1 * \nu_2$ is $\overline{\{x + y : x \in F_1, y \in F_2\}}$.

>(e) Let $\nu_1$ and $\nu_2$ be totally finite Radon measures on $\mathbb{R}^r$, and suppose that $\nu_1$ has a Radon-Nikodým
derivative $f$ with respect to Lebesgue measure $\mu$. Show that $\nu_1 * \nu_2$ has a Radon-Nikodým derivative $g$,
where $g(x) = \int f(x - y)\,\nu_2(dy)$ for $\mu$-almost every $x \in \mathbb{R}^r$.

(f) Suppose that $\nu_1$, $\nu_2$, $\nu_1'$ and $\nu_2'$ are totally finite Radon measures on $\mathbb{R}^r$, and that $\nu_1'$, $\nu_2'$ are absolutely
continuous with respect to $\nu_1$, $\nu_2$ respectively. Show that $\nu_1' * \nu_2'$ is absolutely continuous with respect to
$\nu_1 * \nu_2$.

257Y Further exercises (a) Let $M$ be the space of countably additive functionals defined on the
algebra $\mathcal{B}$ of Borel subsets of $\mathbb{R}$, with its norm $\|\nu\| = |\nu|(\mathbb{R})$ (see 231Yh). (i) Show that we have a unique
bilinear operator $* : M \times M \to M$ such that $(\nu_1 \upharpoonright \mathcal{B}) * (\nu_2 \upharpoonright \mathcal{B}) = (\nu_1 * \nu_2) \upharpoonright \mathcal{B}$ for all totally finite Radon
measures $\nu_1$, $\nu_2$ on $\mathbb{R}$. (ii) Show that $*$ is commutative and associative. (iii) Show that $\|\nu_1 * \nu_2\| \le \|\nu_1\|\|\nu_2\|$
for all $\nu_1, \nu_2 \in M$, so that $M$ is a Banach algebra under this multiplication. (iv) Show that $M$ has a
multiplicative identity. (v) Show that $L^1(\mu)$ can be regarded as a closed subalgebra of $M$, where $\mu$ is
Lebesgue measure on $\mathbb{R}$ (cf. 255Xc).
(b) Let us say that a Radon measure on $]-\pi, \pi]$ is a measure $\nu$, with domain $\Sigma$, on $]-\pi, \pi]$ such that
(i) every Borel subset of $]-\pi, \pi]$ belongs to $\Sigma$ (ii) for every $E \in \Sigma$ there are Borel sets $E_1$, $E_2$ such that
$E_1 \subseteq E \subseteq E_2$ and $\nu(E_2 \setminus E_1) = 0$ (iii) every compact subset of $]-\pi, \pi]$ has finite measure. Show that for
any two totally finite Radon measures $\nu_1$, $\nu_2$ on $]-\pi, \pi]$ there is a unique totally finite Radon measure $\nu$ on
$]-\pi, \pi]$ such that
$$\int h(x)\,\nu(dx) = \int h(x +_{2\pi} y)\,\nu_1(dx)\,\nu_2(dy)$$
for every $\nu$-integrable function $h$, where $+_{2\pi}$ is defined as in 255Ma.

257 Notes and comments Of course convolution of functions and convolution of measures are very closely
connected; the obvious link being 257F, but the correspondence between 255G and 257B is also very marked.
In effect, they give us the same notion of convolution $u * v$ when $u$, $v$ are positive members of $L^1$ and $u * v$
is interpreted in $L^1$ rather than as a function (257Ya). But we should have to go rather deeper than the
arguments here to find ideas in the theory of convolution of measures to correspond to such results as 255K.
I will return to questions of this type in §444 in Volume 4.
All the theorems of this section can be extended to general abelian locally compact Hausdorff topological
groups; but for such generality we need much more advanced ideas (see §444), and for the moment I leave
only the suggestion in 257Yb that you should try to adapt the ideas here to $]-\pi, \pi]$ or $S^1$.

Chapter 26
Change of Variable in the Integral
I suppose most courses on basic calculus still devote a substantial amount of time to practice in the
techniques of integrating standard functions. Surely the most powerful single technique is that of substitution:
replacing $\int g(y)\,dy$ by $\int g(\phi(x))\phi'(x)\,dx$ for an appropriate function $\phi$. At this level one usually concentrates
on the skills of guessing at appropriate $\phi$ and getting the formulae right. I will not address such questions
here, except for rare special cases; in this book I am concerned rather with validating the process. For
functions of one variable, it can usually be justified by an appeal to the fundamental theorem of calculus,
and for any particular case I would normally go first to §225 in the hope that the results there would cover
it. But for functions of two or more variables some much deeper ideas are necessary.
I have already treated the general problem of integration-by-substitution in abstract measure spaces in
§235. There I described conditions under which $\int g(y)\,dy = \int g(\phi(x))J(x)\,dx$ for an appropriate function
$J$. The context there gave very little scope for suggestions as to how to compute $J$; at best, it could be
presented as a Radon-Nikodým derivative (235O). In this chapter I give a form of the fundamental theorem
for the case of Lebesgue measure, in which $\phi$ is a more or less differentiable function between Euclidean
spaces, and $J$ is a 'Jacobian', the modulus of the determinant of the derivative of $\phi$ (263D). This necessarily
depends on a serious investigation of the relationship between Lebesgue measure and geometry. The first
step is to establish a form of Vitali's theorem for $r$-dimensional space, together with $r$-dimensional density
theorems; I do this in §261, following closely the scheme of §§221 and 223 above. We need to know quite a
lot about differentiable functions between Euclidean spaces, and it turns out that the theory is intertwined
with that of 'Lipschitz' functions; I treat these in §262.
In the last two sections of the chapter, I turn to a separate problem for which some of the same techniques
turn out to be appropriate: the description of surface measure on (smooth) surfaces in Euclidean space,
like the surface of a cone or sphere. I suppose there is no difficulty in forming a robust intuition as to
what is meant by the area of such a surface and of suitably simple regions within it, and there is a very
strong presumption that there ought to be an expression for this intuition in terms of measure theory as
presented in this book; but the details are not I think straightforward. The first point to note is that for
any calculation of the area of a region G in a surface S, one would always turn at once to a parametrization
of the region, that is, a bijection $\phi : D \to G$ from some subset $D$ of Euclidean space. But obviously one
needs to be sure that the result of the calculation is independent of the parametrization chosen, and while it
would be possible to base the theory on results showing such independence directly, that does not seem to
me to be a true reflection of the underlying intuition, which is that the area of simple surfaces, at least, is
something intrinsic to their geometry. I therefore see no acceptable alternative to a theory of r-dimensional
measure which can be described in purely geometric terms. This is the burden of §264, in which I give the
definition and most fundamental properties of Hausdorff $r$-dimensional measure in Euclidean spaces. With
this established, we find that the techniques of §§261-263 are sufficient to relate it to calculations through
parametrizations, which is what I do in §265.

261 Vitali's theorem in $\mathbb{R}^r$

The main aim of this section is to give $r$-dimensional versions of Vitali's theorem and Lebesgue's Density
Theorem, following ideas already presented in §§221 and 223.

261A Notation For most of this chapter, we shall be dealing with the geometry and measure of
Euclidean space; it will save space to fix some notation.
Throughout this section and the two following, $r \ge 1$ will be an integer. I will use Roman letters for
members of $\mathbb{R}^r$ and Greek letters for their coordinates, so that $a = (\alpha_1, \ldots, \alpha_r)$, etc.; if you see any Greek
letter with a subscript you should look first for a nearby vector of which it might be a coordinate. The
measure under consideration will nearly always be Lebesgue measure on $\mathbb{R}^r$; so unless otherwise indicated
$\mu$ should be interpreted as Lebesgue measure, and $\mu^*$ as Lebesgue outer measure. Similarly, $\int \ldots dx$ will
always be integration with respect to Lebesgue measure (in a dimension determined by the context).
For $x = (\xi_1, \ldots, \xi_r) \in \mathbb{R}^r$, write $\|x\| = \sqrt{\xi_1^2 + \ldots + \xi_r^2}$. Recall that $\|x + y\| \le \|x\| + \|y\|$ (1A2C) and that
$\|\alpha x\| = |\alpha|\|x\|$ for any vectors $x$, $y$ and scalar $\alpha$.
I will use the same notation as in §115 for 'intervals', so that, in particular,
$$[a, b[ \; = \{x : \alpha_i \le \xi_i < \beta_i \ \forall\, i \le r\},$$
$$]a, b[ \; = \{x : \alpha_i < \xi_i < \beta_i \ \forall\, i \le r\},$$
$$[a, b] = \{x : \alpha_i \le \xi_i \le \beta_i \ \forall\, i \le r\}$$
whenever $a, b \in \mathbb{R}^r$.
$\mathbf{0} = (0, \ldots, 0)$ will be the zero vector in $\mathbb{R}^r$, and $\mathbf{1}$ will be $(1, \ldots, 1)$. If $x \in \mathbb{R}^r$ and $\delta > 0$, $B(x, \delta)$ will be
the closed ball with centre $x$ and radius $\delta$, that is, $\{y : y \in \mathbb{R}^r, \|y - x\| \le \delta\}$. Note that $B(x, \delta) = x + B(\mathbf{0}, \delta)$;
so that by the translation-invariance of Lebesgue measure we have
$$\mu B(x, \delta) = \mu B(\mathbf{0}, \delta) = \beta_r \delta^r,$$
where
$$\beta_r = \frac{1}{k!}\pi^k \text{ if } r = 2k \text{ is even},$$
$$\beta_r = \frac{2^{2k+1} k!}{(2k+1)!}\pi^k \text{ if } r = 2k + 1 \text{ is odd}$$
(252Q).
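
The two-case formula for $\beta_r$ is easy to evaluate, and the first few values are the familiar ones ($2$, $\pi$, $4\pi/3$). The sketch below computes it and, as a cross-check, compares it with the closed form $\pi^{r/2}/\Gamma(r/2 + 1)$; that closed form is a standard identity which is not quoted in the text, and the function names are illustrative.

```python
import math


def ball_volume(r):
    """beta_r, the measure of the unit ball in R^r, from the two-case formula (r >= 1)."""
    k, odd = divmod(r, 2)
    if odd:
        # r = 2k + 1
        return 2 ** (2 * k + 1) * math.factorial(k) * math.pi ** k / math.factorial(2 * k + 1)
    # r = 2k
    return math.pi ** k / math.factorial(k)


def ball_volume_gamma(r):
    """The same constant via pi^(r/2) / Gamma(r/2 + 1) (a standard identity, used as a check)."""
    return math.pi ** (r / 2) / math.gamma(r / 2 + 1)
```

The measure of a ball of radius $\delta$ is then `ball_volume(r) * delta ** r`.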

261B Vitali's theorem in $\mathbb{R}^r$ Let $A \subseteq \mathbb{R}^r$ be any set, and $\mathcal{I}$ a family of closed non-trivial (that
is, non-singleton, or, equivalently, non-negligible) balls in $\mathbb{R}^r$ such that every point of $A$ is contained in
arbitrarily small members of $\mathcal{I}$. Then there is a countable disjoint set $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $\mu(A \setminus \bigcup \mathcal{I}_0) = 0$.
proof (a) To begin with (down to the end of (f) below), suppose that $\|x\| < M$ for every $x \in A$, and set
$$\mathcal{I}' = \{I : I \in \mathcal{I}, I \subseteq B(\mathbf{0}, M)\}.$$
If there is a finite disjoint set $\mathcal{I}_0 \subseteq \mathcal{I}'$ such that $A \subseteq \bigcup \mathcal{I}_0$ (including the possibility that $A = \mathcal{I}_0 = \emptyset$), we
can stop. So let us suppose henceforth that there is no such $\mathcal{I}_0$.
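(b) In this case, if $\mathcal{I}_0$ is any finite disjoint subset of $\mathcal{I}'$, there is a $J \in \mathcal{I}'$ which is disjoint from any
member of $\mathcal{I}_0$. PP Take $x \in A \setminus \bigcup \mathcal{I}_0$. Because every member of $\mathcal{I}_0$ is closed, there is a $\delta > 0$ such that
$B(x, \delta)$ does not meet any member of $\mathcal{I}_0$, and as $\|x\| < M$ we can suppose that $B(x, \delta) \subseteq B(\mathbf{0}, M)$. Let $J$
be a member of $\mathcal{I}$, containing $x$, and of diameter at most $\delta$; then $J \in \mathcal{I}'$ and $J \cap \bigcup \mathcal{I}_0 = \emptyset$. QQ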
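(c) We can therefore choose a sequence $\langle \gamma_n \rangle_{n \in \mathbb{N}}$ of real numbers and a disjoint sequence $\langle I_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{I}'$
inductively, as follows. Given $\langle I_j \rangle_{j<n}$ (if $n = 0$, this is the empty sequence, with no members), with $I_j \in \mathcal{I}'$
for each $j < n$, and $I_j \cap I_k = \emptyset$ for $j < k < n$, set $\mathcal{J}_n = \{I : I \in \mathcal{I}', I \cap I_j = \emptyset$ for every $j < n\}$. We know
from (b) that $\mathcal{J}_n \neq \emptyset$. Set
$$\gamma_n = \sup\{\operatorname{diam} I : I \in \mathcal{J}_n\};$$
then $\gamma_n \le 2M$, because every member of $\mathcal{J}_n$ is included in $B(\mathbf{0}, M)$. We can therefore find a set $I_n \in \mathcal{J}_n$
such that $\operatorname{diam} I_n \ge \frac{1}{2}\gamma_n$, and this continues the induction.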
(e) Because the $I_n$ are disjoint measurable subsets of the bounded set $B(\mathbf{0}, M)$, we have
$$\sum_{n=0}^{\infty} \mu I_n \le \mu B(\mathbf{0}, M) < \infty,$$
and $\lim_{n\to\infty} \mu I_n = 0$. Also $\mu I_n \ge \beta_r (\frac{1}{4}\gamma_n)^r$ for each $n$, so $\lim_{n\to\infty} \gamma_n = 0$.

Now define $I_n'$ to be the closed ball with the same centre as $I_n$ but five times the diameter, so that it
contains every point within a distance $\gamma_n$ of $I_n$. I claim that, for any $n$, $A \subseteq \bigcup_{j<n} I_j \cup \bigcup_{j \ge n} I_j'$. PP ??
Suppose, if possible, otherwise. Take any $x \in A \setminus (\bigcup_{j<n} I_j \cup \bigcup_{j \ge n} I_j')$. Let $\delta > 0$ be such that
$$B(x, \delta) \subseteq B(\mathbf{0}, M) \setminus \bigcup_{j<n} I_j,$$
and let $J \in \mathcal{I}$ be such that $x \in J \subseteq B(x, \delta)$. Then
$$\lim_{m\to\infty} \gamma_m = 0 < \operatorname{diam} J$$
(this is where we use the hypothesis that all the balls in $\mathcal{I}$ are non-trivial); let $m$ be the least integer greater
than or equal to $n$ such that $\gamma_m < \operatorname{diam} J$. In this case $J$ cannot belong to $\mathcal{J}_m$, so there must be some
$k < m$ such that $J \cap I_k \neq \emptyset$, because certainly $J \in \mathcal{I}'$. By the choice of $\delta$, $k$ cannot be less than $n$, so
$n \le k < m$, and $\gamma_k \ge \operatorname{diam} J$. So the distance from $x$ to the nearest point of $I_k$ is at most $\operatorname{diam} J \le \gamma_k$.
But this means that $x \in I_k'$; which contradicts the choice of $x$. XX QQ
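(f) It follows that
$$\mu^*(A \setminus \bigcup_{j<n} I_j) \le \mu(\bigcup_{j \ge n} I_j') \le \sum_{j=n}^{\infty} \mu I_j' \le 5^r \sum_{j=n}^{\infty} \mu I_j.$$
As
$$\sum_{j=0}^{\infty} \mu I_j \le \mu B(\mathbf{0}, M) < \infty,$$
$\lim_{n\to\infty} \mu^*(A \setminus \bigcup_{j<n} I_j) = 0$ and
$$\mu(A \setminus \bigcup_{j \in \mathbb{N}} I_j) = \mu^*(A \setminus \bigcup_{j \in \mathbb{N}} I_j) = 0.$$
Thus in this case we may set $\mathcal{I}_0 = \{I_n : n \in \mathbb{N}\}$ to obtain a countable disjoint family in $\mathcal{I}$ with
$\mu(A \setminus \bigcup \mathcal{I}_0) = 0$.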
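(g) This completes the proof if $A$ is bounded. In general, set
$$U_n = \{x : x \in \mathbb{R}^r, n < \|x\| < n + 1\}, \quad A_n = A \cap U_n, \quad \mathcal{J}_n = \{I : I \in \mathcal{I}, I \subseteq U_n\},$$
for each $n \in \mathbb{N}$. Then for each $n$ we see that every point of $A_n$ belongs to arbitrarily small members of $\mathcal{J}_n$,
so there is a countable disjoint $\mathcal{J}_n' \subseteq \mathcal{J}_n$ such that $A_n \setminus \bigcup \mathcal{J}_n'$ is negligible. Now (because the $U_n$ are disjoint)
$\mathcal{I}_0 = \bigcup_{n \in \mathbb{N}} \mathcal{J}_n'$ is disjoint, and of course $\mathcal{I}_0$ is a countable subset of $\mathcal{I}$; moreover,
$$A \setminus \bigcup \mathcal{I}_0 \subseteq (\mathbb{R}^r \setminus \bigcup_{n \in \mathbb{N}} U_n) \cup \bigcup_{n \in \mathbb{N}} (A_n \setminus \bigcup \mathcal{J}_n')$$
is negligible. (To see that $\mathbb{R}^r \setminus \bigcup_{n \in \mathbb{N}} U_n = \{x : \|x\| \in \mathbb{N}\}$ is negligible, note that for any $n \in \mathbb{N}$ the set
$$\{x : \|x\| = n\} \subseteq B(\mathbf{0}, n) \setminus B(\mathbf{0}, \delta n)$$
has measure at most $\beta_r n^r - \beta_r (\delta n)^r$ for every $\delta \in [0, 1[$, so must be negligible.)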
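
The selection in part (c) and the enlargement in part (e) can be tried out on a finite family of intervals in $\mathbb{R}$. The following is only a sketch of that finite, one-dimensional special case: it always takes the longest available interval (which certainly has diameter at least half the supremum), and the names `greedy_disjoint`, `enlarge` and the sample `family` are illustrative, not taken from the text.

```python
def greedy_disjoint(intervals):
    """Pick pairwise disjoint closed intervals from a finite family, longest first."""
    chosen = []
    for a, b in sorted(intervals, key=lambda ab: ab[1] - ab[0], reverse=True):
        # keep [a, b] only if it misses every interval already chosen
        if all(b < c or d < a for c, d in chosen):
            chosen.append((a, b))
    return chosen


def enlarge(interval, factor):
    """Closed interval with the same centre and `factor` times the diameter."""
    a, b = interval
    centre, half = (a + b) / 2, (b - a) / 2
    return (centre - factor * half, centre + factor * half)


family = [(0.0, 1.0), (0.8, 1.2), (1.1, 3.1), (2.9, 3.3), (3.2, 3.4), (5.0, 5.5)]
chosen = greedy_disjoint(family)
```

Every interval of `family` that was rejected meets a chosen interval at least as long as itself, so it lies inside the five-fold enlargement of that chosen interval; this is the finite analogue of the claim $A \subseteq \bigcup_{j<n} I_j \cup \bigcup_{j \ge n} I_j'$.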

261C Just as in §223, we can use the $r$-dimensional Vitali theorem to prove theorems on the
approximation of functions by their local mean values.
Density Theorem in $\mathbb{R}^r$: integral form Let $D$ be a subset of $\mathbb{R}^r$, and $f$ a real-valued function which is
integrable over $D$. Then
$$f(x) = \lim_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{D \cap B(x, \delta)} f\,d\mu$$
for almost every $x \in D$.
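proof (a) To begin with (down to the end of (b)), let us suppose that $D = \operatorname{dom} f = \mathbb{R}^r$.
Take $n \in \mathbb{N}$ and $q, q' \in \mathbb{Q}$ with $q < q'$, and set
$$A = A_{nqq'} = \{x : \|x\| \le n,\ f(x) \le q,\ \limsup_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{B(x, \delta)} f\,d\mu > q'\}.$$
?? Suppose, if possible, that $\mu^* A > 0$. Let $\epsilon > 0$ be such that $\epsilon(1 + |q|) < (q' - q)\mu^* A$, and let $\eta \in \,]0, \epsilon]$
be such that $\int_E |f| \le \epsilon$ whenever $\mu E \le \eta$ (225A). Let $G \supseteq A$ be an open set of measure at most $\mu^* A + \eta$
(134Fa). Let $\mathcal{I}$ be the set of non-trivial closed balls $B \subseteq G$ such that $\frac{1}{\mu B} \int_B f\,d\mu \ge q'$. Then every point of
$A$ is contained in (indeed, is the centre of) arbitrarily small members of $\mathcal{I}$. So there is a countable disjoint
set $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $\mu(A \setminus \bigcup \mathcal{I}_0) = 0$, by 261B; set $H = \bigcup \mathcal{I}_0$.
Because $\int_I f\,d\mu \ge q' \mu I$ for each $I \in \mathcal{I}_0$, we have
$$\int_H f\,d\mu = \sum_{I \in \mathcal{I}_0} \int_I f\,d\mu \ge q' \sum_{I \in \mathcal{I}_0} \mu I = q' \mu H \ge q' \mu^* A.$$
Set
$$E = \{x : x \in G, f(x) \le q\}.$$
Then $E$ is measurable, and $A \subseteq E \subseteq G$; so
$$\mu^* A \le \mu E \le \mu G \le \mu^* A + \eta \le \mu^* A + \epsilon.$$
Also
$$\mu(H \setminus E) \le \mu G - \mu E \le \eta,$$
so by the choice of $\eta$, $\int_{H \setminus E} f \le \epsilon$ and
$$\int_H f \le \epsilon + \int_{H \cap E} f \le \epsilon + q\mu(H \cap E)$$
$$\le \epsilon + q\mu^* A + |q|(\mu(H \cap E) - \mu^* A) \le q\mu^* A + \epsilon(1 + |q|)$$
(because $\mu^* A = \mu^*(A \cap H) \le \mu(H \cap E)$)
$$< q' \mu^* A \le \int_H f,$$
which is impossible. XX
Thus $A_{nqq'}$ is negligible. This is true for all $q < q'$ and all $n$, so
$$A^* = \bigcup_{q, q' \in \mathbb{Q}, q < q'} \bigcup_{n \in \mathbb{N}} A_{nqq'}$$
is negligible. But
$$f(x) \ge \limsup_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{B(x, \delta)} f$$
for every $x \in \mathbb{R}^r \setminus A^*$, that is, for almost all $x \in \mathbb{R}^r$.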
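(b) Similarly, or applying this result to $-f$,
$$f(x) \le \liminf_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{B(x, \delta)} f$$
for almost every $x$, so
$$f(x) = \lim_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{B(x, \delta)} f$$
for almost every $x$.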
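(c) For the (superficially) more general case enunciated in the theorem, let $\tilde f$ be a $\mu$-integrable function
extending $f \upharpoonright D$, defined everywhere on $\mathbb{R}^r$, and such that $\int_F \tilde f = \int_{D \cap F} f$ for every measurable $F \subseteq \mathbb{R}^r$
(applying 214Eb to $f \upharpoonright D$). Then
$$f(x) = \tilde f(x) = \lim_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{B(x, \delta)} \tilde f = \lim_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{D \cap B(x, \delta)} f$$
for almost every $x \in D$.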
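
To see the statement at work in the simplest possible case, take $r = 1$ and $f(t) = t^2$, for which the mean of $f$ over $B(x, \delta) = [x - \delta, x + \delta]$ is exactly $x^2 + \delta^2/3$. The sketch below approximates these local means numerically; the function names and the choice of $f$ are illustrative assumptions, and of course a continuous $f$ does not exercise the 'almost every' in the theorem.

```python
def local_mean(f, x, delta, steps=2000):
    """Midpoint-rule approximation to the mean of f over [x - delta, x + delta]."""
    h = 2 * delta / steps
    total = sum(f(x - delta + (i + 0.5) * h) for i in range(steps))
    return total * h / (2 * delta)


def square(t):
    return t * t


# distance between the local mean and f(x) at x = 1 for shrinking radii
errors = [abs(local_mean(square, 1.0, d) - square(1.0)) for d in (0.5, 0.1, 0.02)]
```

The entries of `errors` are close to $\delta^2/3$ and decrease to $0$ as $\delta \downarrow 0$.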

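261D Corollary (a) If $D \subseteq \mathbb{R}^r$ is any set, then
$$\lim_{\delta \downarrow 0} \frac{\mu^*(D \cap B(x, \delta))}{\mu B(x, \delta)} = 1$$
for almost every $x \in D$.
(b) If $E \subseteq \mathbb{R}^r$ is a measurable set, then
$$\lim_{\delta \downarrow 0} \frac{\mu(E \cap B(x, \delta))}{\mu B(x, \delta)} = \chi E(x)$$
for almost every $x \in \mathbb{R}^r$.
(c) If $D \subseteq \mathbb{R}^r$ and $f : D \to \mathbb{R}$ is any function, then for almost every $x \in D$,
$$\lim_{\delta \downarrow 0} \frac{\mu^*(\{y : y \in D, |f(y) - f(x)| \le \epsilon\} \cap B(x, \delta))}{\mu B(x, \delta)} = 1$$
for every $\epsilon > 0$.
(d) If $D \subseteq \mathbb{R}^r$ and $f : D \to \mathbb{R}$ is measurable, then for almost every $x \in D$,
$$\lim_{\delta \downarrow 0} \frac{\mu^*(\{y : y \in D, |f(y) - f(x)| \ge \epsilon\} \cap B(x, \delta))}{\mu B(x, \delta)} = 0$$
for every $\epsilon > 0$.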
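proof (a) Apply 261C with $f = \chi B(\mathbf{0}, n)$ to see that, for any $n \in \mathbb{N}$,
$$\lim_{\delta \downarrow 0} \frac{\mu^*(D \cap B(x, \delta))}{\mu B(x, \delta)} = 1$$
for almost every $x \in D$ with $\|x\| < n$.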


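(b) Apply (a) to $E$ to see that
$$\liminf_{\delta \downarrow 0} \frac{\mu(E \cap B(x, \delta))}{\mu B(x, \delta)} \ge \chi E(x)$$
for almost every $x \in \mathbb{R}^r$, and to $E' = \mathbb{R}^r \setminus E$ to see that
$$\limsup_{\delta \downarrow 0} \frac{\mu(E \cap B(x, \delta))}{\mu B(x, \delta)} = 1 - \liminf_{\delta \downarrow 0} \frac{\mu(E' \cap B(x, \delta))}{\mu B(x, \delta)} \le 1 - \chi E'(x) = \chi E(x)$$
for almost every $x$.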
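(c) For $q, q' \in \mathbb{Q}$, set
$$D_{qq'} = \{x : x \in D, q \le f(x) \le q'\},$$
$$C_{qq'} = \{x : x \in D_{qq'}, \lim_{\delta \downarrow 0} \frac{\mu^*(D_{qq'} \cap B(x, \delta))}{\mu B(x, \delta)} = 1\};$$
now set
$$C = D \setminus \bigcup_{q, q' \in \mathbb{Q}} (D_{qq'} \setminus C_{qq'}),$$
so that $D \setminus C$ is negligible. If $x \in C$ and $\epsilon > 0$, then there are $q, q' \in \mathbb{Q}$ such that $f(x) - \epsilon \le q \le f(x) \le
q' \le f(x) + \epsilon$, and now
$$\liminf_{\delta \downarrow 0} \frac{\mu^*\{y : y \in D \cap B(x, \delta), |f(y) - f(x)| \le \epsilon\}}{\mu B(x, \delta)} \ge \liminf_{\delta \downarrow 0} \frac{\mu^*(D_{qq'} \cap B(x, \delta))}{\mu B(x, \delta)} = 1,$$
so
$$\lim_{\delta \downarrow 0} \frac{\mu^*\{y : y \in D \cap B(x, \delta), |f(y) - f(x)| \le \epsilon\}}{\mu B(x, \delta)} = 1.$$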

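(d) Define $C$ as in (c). We know from (a) that $\mu(D \setminus C') = 0$, where
$$C' = \{x : x \in D, \lim_{\delta \downarrow 0} \frac{\mu^*(D \cap B(x, \delta))}{\mu B(x, \delta)} = 1\}.$$
If $x \in C \cap C'$ and $\epsilon > 0$, we know from (c) that
$$\lim_{\delta \downarrow 0} \frac{\mu^*\{y : y \in D \cap B(x, \delta), |f(y) - f(x)| \le \epsilon/2\}}{\mu B(x, \delta)} = 1.$$
But because $f$ is measurable, we have
$$\mu^*\{y : y \in D \cap B(x, \delta), |f(y) - f(x)| \ge \epsilon\} + \mu^*\{y : y \in D \cap B(x, \delta), |f(y) - f(x)| \le \tfrac{1}{2}\epsilon\} \le \mu^*(D \cap B(x, \delta))$$
for every $\delta > 0$. Accordingly
$$\limsup_{\delta \downarrow 0} \frac{\mu^*\{y : y \in D \cap B(x, \delta), |f(y) - f(x)| \ge \epsilon\}}{\mu B(x, \delta)}$$
$$\le \lim_{\delta \downarrow 0} \frac{\mu^*(D \cap B(x, \delta))}{\mu B(x, \delta)} - \lim_{\delta \downarrow 0} \frac{\mu^*\{y : y \in D \cap B(x, \delta), |f(y) - f(x)| \le \epsilon/2\}}{\mu B(x, \delta)} = 0,$$
and
$$\lim_{\delta \downarrow 0} \frac{\mu^*\{y : y \in D \cap B(x, \delta), |f(y) - f(x)| \ge \epsilon\}}{\mu B(x, \delta)} = 0$$
for every $x \in C \cap C'$, that is, for almost every $x \in D$.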

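261E Theorem Let $D$ be a subset of $\mathbb{R}^r$, and $f$ a real-valued function which is integrable over $D$. Then
$$\lim_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{D \cap B(x, \delta)} |f(y) - f(x)|\,dy = 0$$
for almost every $x \in D$.
proof (Compare 223D.)
(a) Suppose first that $D$ is bounded. For each $q \in \mathbb{Q}$, set $g_q(x) = |f(x) - q|$ for $x \in D \cap \operatorname{dom} f$; then $g_q$ is
integrable over $D$, and
$$\lim_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{D \cap B(x, \delta)} g_q = g_q(x)$$
for almost every $x \in D$, by 261C. Setting
$$E_q = \{x : x \in D \cap \operatorname{dom} f, \lim_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{D \cap B(x, \delta)} g_q = g_q(x)\},$$
we have $D \setminus E_q$ negligible for every $q$, so $D \setminus E$ is negligible, where $E = \bigcap_{q \in \mathbb{Q}} E_q$. Now
$$\lim_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{D \cap B(x, \delta)} |f(y) - f(x)|\,dy = 0$$
for every $x \in E$. PP Take $x \in E$ and $\epsilon > 0$. Then there is a $q \in \mathbb{Q}$ such that $|f(x) - q| \le \epsilon$, so that
$$|f(y) - f(x)| \le |f(y) - q| + \epsilon = g_q(y) + \epsilon$$
for every $y \in D \cap \operatorname{dom} f$, and
$$\limsup_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{D \cap B(x, \delta)} |f(y) - f(x)|\,dy \le \limsup_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{D \cap B(x, \delta)} g_q(y) + \epsilon\,dy$$
$$\le \epsilon + g_q(x) \le 2\epsilon.$$
As $\epsilon$ is arbitrary,
$$\lim_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{D \cap B(x, \delta)} |f(y) - f(x)|\,dy = 0,$$
as required. QQ
(b) For unbounded sets $D$, apply (a) to $D \cap B(\mathbf{0}, n)$ for each $n \in \mathbb{N}$.
Remark The set
$$\{x : x \in \operatorname{dom} f, \lim_{\delta \downarrow 0} \frac{1}{\mu B(x, \delta)} \int_{D \cap B(x, \delta)} |f(y) - f(x)|\,dy = 0\}$$
is sometimes called the Lebesgue set of $f$.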

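261F Another very useful consequence of 261B is the following.

Proposition Let $A \subseteq \mathbb{R}^r$ be any set, and $\epsilon > 0$. Then there is a sequence $\langle B_n \rangle_{n \in \mathbb{N}}$ of closed balls in $\mathbb{R}^r$,
all of radius at most $\epsilon$, such that $A \subseteq \bigcup_{n \in \mathbb{N}} B_n$ and $\sum_{n=0}^{\infty} \mu B_n \le \mu^* A + \epsilon$. Moreover, we may suppose that
the balls in the sequence whose centres do not lie in $A$ have measures summing to at most $\epsilon$.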
proof (a) Set $\beta_r = \mu B(\mathbf{0}, 1)$. The first step is the obvious remark that if $x \in \mathbb{R}^r$, $\delta > 0$ then the half-open
cube $I = [x, x + \delta\mathbf{1}[$ is a subset of the ball $B(x, \delta\sqrt{r})$, which has measure $\gamma_r \delta^r = \gamma_r \mu I$, where $\gamma_r = \beta_r r^{r/2}$.
It follows that if $G \subseteq \mathbb{R}^r$ is any open set, then $G$ can be covered by a sequence of balls of total measure at
most $\gamma_r \mu G$. PP If $G$ is empty, we can take all the balls to be singletons. Otherwise, for each $k \in \mathbb{N}$, set
$$Q_k = \{z : z \in \mathbb{Z}^r, [2^{-k} z, 2^{-k}(z + \mathbf{1})[ \; \subseteq G\},$$
$$E_k = \bigcup_{z \in Q_k} [2^{-k} z, 2^{-k}(z + \mathbf{1})[.$$
Then $\langle E_k \rangle_{k \in \mathbb{N}}$ is a non-decreasing sequence of sets with union $G$, and $E_0$ and each of the differences $E_{k+1} \setminus E_k$
is expressible as a disjoint union of half-open cubes. Thus $G$ also is expressible as a disjoint union of a
sequence $\langle I_n \rangle_{n \in \mathbb{N}}$ of half-open cubes. Each $I_n$ is covered by a ball $B_n$ of measure $\gamma_r \mu I_n$; so that $G \subseteq \bigcup_{n \in \mathbb{N}} B_n$
and
$$\sum_{n=0}^{\infty} \mu B_n \le \gamma_r \sum_{n=0}^{\infty} \mu I_n = \gamma_r \mu G.$$ QQ

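(b) It follows at once that if $\mu^* A = 0$ then for any $\epsilon > 0$ there is a sequence $\langle B_n \rangle_{n \in \mathbb{N}}$ of balls covering $A$
of measures summing to at most $\epsilon$, because there is certainly an open set including $A$ with measure at most
$\epsilon/\gamma_r$.
(c) Now take any set $A$, and $\epsilon > 0$. Let $G \supseteq A$ be an open set with $\mu G \le \mu^* A + \frac{1}{2}\epsilon$. Let $\mathcal{I}$ be the family
of non-trivial closed balls included in $G$, of radius at most $\epsilon$ and with centres in $A$. Then every point of $A$
belongs to arbitrarily small members of $\mathcal{I}$, so there is a countable disjoint $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $\mu(A \setminus \bigcup \mathcal{I}_0) = 0$.
Let $\langle B_n' \rangle_{n \in \mathbb{N}}$ be a sequence of balls covering $A \setminus \bigcup \mathcal{I}_0$ with $\sum_{n=0}^{\infty} \mu B_n' \le \min(\frac{1}{2}\epsilon, \beta_r \epsilon^r)$; these surely all have
radius at most $\epsilon$. Let $\langle B_n \rangle_{n \in \mathbb{N}}$ be a sequence amalgamating $\mathcal{I}_0$ with $\langle B_n' \rangle_{n \in \mathbb{N}}$; then $A \subseteq \bigcup_{n \in \mathbb{N}} B_n$, every $B_n$
has radius at most $\epsilon$ and
$$\sum_{n=0}^{\infty} \mu B_n = \sum_{B \in \mathcal{I}_0} \mu B + \sum_{n=0}^{\infty} \mu B_n' \le \mu G + \tfrac{1}{2}\epsilon \le \mu^* A + \epsilon,$$
while the $B_n$ whose centres do not lie in $A$ must come from the sequence $\langle B_n' \rangle_{n \in \mathbb{N}}$, so their measures sum
to at most $\frac{1}{2}\epsilon$.
Remark In fact we can (if $A$ is not empty) arrange that the centre of every $B_n$ belongs to $A$. This is an
easy consequence of Besicovitch's Covering Lemma (see §472 in Volume 4).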

261X Basic exercises (a) Show that 261C and 261E are valid for any locally integrable real-valued
function $f$; in particular, for any $f \in \mathcal{L}^p(\mu_D)$ for any $p \ge 1$, writing $\mu_D$ for the subspace measure on $D$.

(b) Show that 261C, 261Dc, 261Dd and 261E are valid for complex-valued functions f .

> (c) Take three disks in the plane, each touching the other two, so that they enclose an open region $R$
with three cusps. In $R$ let $D$ be a disk tangent to each of the three original disks, and $R_0$, $R_1$, $R_2$ the three
components of $R \setminus D$. In each $R_j$ let $D_j$ be a disk tangent to each of the disks bounding $R_j$, and $R_{j0}$, $R_{j1}$,
$R_{j2}$ the three components of $R_j \setminus D_j$. Continue, obtaining 27 regions at the next step, 81 regions at the
next, and so on.
Show that the total area of the residual regions converges to zero as the process continues indefinitely.
(Hint: compare with the process in the proof of 261B.)

261Y Further exercises (a) Formulate an abstract definition of Vitali cover, meaning a family of sets
satisfying the conclusion of 261B in some sense, and corresponding generalizations of 261C-261E, covering
(at least) (b)-(d) below.

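(b) For $x \in \mathbb{R}^r$, $k \in \mathbb{N}$ let $C(x, k)$ be the half-open cube of the form $[2^{-k} z, 2^{-k}(z + \mathbf{1})[$, with $z \in \mathbb{Z}^r$,
containing $x$. Show that if $f$ is an integrable function on $\mathbb{R}^r$ then
$$\lim_{k \to \infty} 2^{kr} \int_{C(x, k)} f = f(x)$$
for almost every $x \in \mathbb{R}^r$.

(c) Let $f$ be a real-valued function which is integrable over $\mathbb{R}^r$. Show that
$$\lim_{\delta \downarrow 0} \frac{1}{\delta^r} \int_{[x, x + \delta\mathbf{1}[} f = f(x)$$
for almost every $x \in \mathbb{R}^r$.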

(d) Give $X = \{0, 1\}^{\mathbb{N}}$ its usual measure $\nu$ (254J). For $x \in X$, $k \in \mathbb{N}$ set $C(x, k) = \{y : y \in X, y(i) = x(i)$
for $i < k\}$. Show that if $f$ is any real-valued function which is integrable over $X$ then
$\lim_{k \to \infty} 2^k \int_{C(x, k)} f\,d\nu = f(x)$, $\lim_{k \to \infty} 2^k \int_{C(x, k)} |f(y) - f(x)|\,\nu(dy) = 0$ for almost every $x \in X$.

(e) Let $f$ be a real-valued function which is integrable over $\mathbb{R}^r$, and $x$ a point in the Lebesgue set of
$f$. Show that for every $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x) - \int f(x - y) g(\|y\|)\,dy| \le \epsilon$ whenever
$g : [0, \infty[ \to [0, \infty[$ is a non-increasing function such that $\int_{\mathbb{R}^r} g(\|y\|)\,dy = 1$ and $\int_{B(\mathbf{0}, \delta)} g(\|y\|)\,dy \ge 1 - \delta$.
(Hint: 223Yg.)

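(f) Let $\mathfrak{T}$ be the family of those measurable sets $G \subseteq \mathbb{R}^r$ such that $\lim_{\delta \downarrow 0} \frac{\mu(G \cap B(x, \delta))}{\mu B(x, \delta)} = 1$ for every
$x \in G$. Show that $\mathfrak{T}$ is a topology on $\mathbb{R}^r$, the density topology of $\mathbb{R}^r$. Show that a function $f : \mathbb{R}^r \to \mathbb{R}$
is measurable iff it is $\mathfrak{T}$-continuous at almost every point of $\mathbb{R}^r$.

(g) A set $A \subseteq \mathbb{R}^r$ is said to be porous at $x \in \mathbb{R}^r$ if $\limsup_{y \to x} \frac{\rho(y, A)}{\|y - x\|} > 0$, writing $\rho(y, A) = \inf_{z \in A} \|y - z\|$
(or $\infty$ if $A$ is empty). Show that if $A$ is porous at all its points then it is negligible.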

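(h) Let $A \subseteq \mathbb{R}^r$ be a bounded set and $\mathcal{I}$ a non-empty family of non-trivial closed balls covering $A$. Show
that for any $\epsilon > 0$ there are disjoint $B_0, \ldots, B_n \in \mathcal{I}$ such that $\mu^* A \le (3 + \epsilon)^r \sum_{k=0}^{n} \mu B_k$.

(i) Let $(X, \rho)$ be a metric space and $A \subseteq X$ any set, $x \mapsto \delta_x : A \to [0, \infty[$ any bounded function.
Show that if $\gamma > 3$ then there is an $A' \subseteq A$ such that (i) $\rho(x, y) > \delta_x + \delta_y$ for all distinct $x, y \in A'$ (ii)
$\bigcup_{x \in A} B(x, \delta_x) \subseteq \bigcup_{x \in A'} B(x, \gamma\delta_x)$, writing $B(x, \alpha)$ for the closed ball $\{y : \rho(y, x) \le \alpha\}$.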

(j) Show that any union of non-trivial closed balls in R r is Lebesgue measurable. (Hint: induce on r.
Compare 415Ye in Volume 4.)

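(k) Suppose that $A \subseteq \mathbb{R}^r$ and that $\mathcal{I}$ is a family of closed subsets of $\mathbb{R}^r$ such that
for every $x \in A$ there is an $\epsilon > 0$ such that for every $\delta > 0$ there is an $I \in \mathcal{I}$ such that $x \in I$ and
$0 < \epsilon(\operatorname{diam} I)^r \le \mu I \le \delta$.
Show that there is a countable disjoint set $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $A \setminus \bigcup \mathcal{I}_0$ is negligible.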

261 Notes and comments In the proofs of 261B-261E above, I have done my best to follow the lines of
the one-dimensional case; this section amounts to a series of generalizations of the work of §§221 and 223.
It will be clear that the idea of 261A/261B can be used on other shapes than balls. To make it work in
the form above, we need a family $\mathcal{I}$ such that there is a constant $K$ for which
$$\mu I' \le K \mu I$$
for every $I \in \mathcal{I}$, where we write
$$I' = \{x : \inf_{y \in I} \|x - y\| \le \operatorname{diam}(I)\}.$$
Evidently this will be true for many classes $\mathcal{I}$ determined by the shapes of the sets involved; for instance, if
$E \subseteq \mathbb{R}^r$ is any bounded set of strictly positive measure, the family $\mathcal{I} = \{x + \delta E : x \in \mathbb{R}^r, \delta > 0\}$ will satisfy
the condition.
In 261Ya I challenge you to find an appropriate generalization of the arguments depending on the con-
clusion of 261B.
Another way of using 261B is to say that because sets can be essentially covered by disjoint sequences
of balls, it ought to be possible to use balls, rather than half-open intervals, in the definition of Lebesgue
measure on $\mathbb{R}^r$. This is indeed so (261F). The difficulty in using balls in the basic definition comes right
at the start, in proving that if a ball is covered by finitely many balls then the sum of the volumes of
the covering balls is at least the volume of the covered ball. (There is a trick, using the compactness of
closed balls and the openness of open balls, to extend such a proof to infinite covers.) Of course you could
regard this fact as 'elementary', on the ground that Archimedes would have noticed if it weren't true, but
nevertheless it would be something of a challenge to prove it, unless you were willing to wait for a version
of Fubini's theorem, as some authors do.

I have given the results in 261C-261E for arbitrary subsets $D$ of $\mathbb{R}^r$ not because I have any applications in
mind in which non-measurable subsets are significant, but because I wish to make it possible to notice when
measurability matters. Of course it is necessary to interpret the integrals $\int_D f\,d\mu$ in the way laid down in
§214. The game is given away in part (c) of the proof of 261C, where I rely on the fact that if $f$ is integrable
over $D$ then there is an integrable $\tilde f : \mathbb{R}^r \to \mathbb{R}$ such that $\int_F \tilde f = \int_{D \cap F} f$ for every measurable $F \subseteq \mathbb{R}^r$. In
effect, for all the questions dealt with here, we can replace $f$, $D$ by $\tilde f$, $\mathbb{R}^r$.
The idea of 261C is that, for almost every $x$, $f(x)$ is approximated by its mean value on small balls
$B(x, \delta)$, ignoring the missing values on $B(x, \delta) \setminus (D \cap \operatorname{dom} f)$; 261E is a sharper version of the same idea.
The formulae of 261C-261E mostly involve the expression $\mu B(x, \delta)$. Of course this is just $\beta_r \delta^r$. But I think
that leaving it unexpanded is actually more illuminating, as well as avoiding sub- and superscripts, since it
makes it clearer what these density theorems are really about. In §472 of Volume 4 I will revisit this material,
showing that a surprisingly large proportion of the ideas can be applied to arbitrary Radon measures on
$\mathbb{R}^r$, even though Vitali's theorem (in the form stated here) is no longer valid.

262 Lipschitz and differentiable functions


In preparation for the main work of this chapter in §263, I devote a section to two important classes of
functions between Euclidean spaces. What we really need is the essentially elementary material down to
262I, together with the technical lemma 262M and its corollaries. Theorem 262Q is not relied on in this
volume, though I believe that it makes the patterns which will develop more natural and comprehensible.

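262A Lipschitz functions Suppose that $r, s \ge 1$ and $\phi : D \to \mathbb{R}^s$ is a function, where $D \subseteq \mathbb{R}^r$. We
say that $\phi$ is $\gamma$-Lipschitz, where $\gamma \in [0, \infty[$, if
$$\|\phi(x) - \phi(y)\| \le \gamma\|x - y\|$$
for all $x, y \in D$, writing $\|x\| = \sqrt{\xi_1^2 + \ldots + \xi_r^2}$ if $x = (\xi_1, \ldots, \xi_r) \in \mathbb{R}^r$, $\|z\| = \sqrt{\zeta_1^2 + \ldots + \zeta_s^2}$ if $z =
(\zeta_1, \ldots, \zeta_s) \in \mathbb{R}^s$. In this case, $\gamma$ is a Lipschitz constant for $\phi$.
A Lipschitz function is a function $\phi$ which is $\gamma$-Lipschitz for some $\gamma \ge 0$. Note that in this case $\phi$ has
a least Lipschitz constant (since if $A$ is the set of Lipschitz constants for $\phi$, and $\gamma_0 = \inf A$, then $\gamma_0$ is a
Lipschitz constant for $\phi$).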

262B We need the following easy facts.


Lemma Let $D \subseteq \mathbb{R}^r$ be a set and $\phi : D \to \mathbb{R}^s$ a function.
(a) $\phi$ is Lipschitz iff $\phi_i : D \to \mathbb{R}$ is Lipschitz for every $i$, writing $\phi(x) = (\phi_1(x), \ldots, \phi_s(x))$ for every
$x \in D = \operatorname{dom} \phi \subseteq \mathbb{R}^r$.
(b) In this case, there is a Lipschitz function $\tilde\phi : \mathbb{R}^r \to \mathbb{R}^s$ extending $\phi$.
(c) If $r = s = 1$ and $D = [a, b]$ is an interval, then $\phi$ is Lipschitz iff it is absolutely continuous and has a
bounded derivative.
proof (a) For any $x,y\in D$ and $i\le s$,
$$|\phi_i(x)-\phi_i(y)|\le\|\phi(x)-\phi(y)\|\le\sqrt{s}\,\sup_{j\le s}|\phi_j(x)-\phi_j(y)|,$$
so any Lipschitz constant for $\phi$ will be a Lipschitz constant for every $\phi_i$, and if $\gamma_j$ is a Lipschitz constant for
$\phi_j$ for each $j$, then $\sqrt{s}\,\sup_{j\le s}\gamma_j$ will be a Lipschitz constant for $\phi$.

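(b) By (a), it is enough to consider the case $s=1$, for if every $\phi_i$ has a Lipschitz extension $\tilde\phi_i$, we can set
$\tilde\phi(x)=(\tilde\phi_1(x),\ldots,\tilde\phi_s(x))$ for every $x$ to obtain a Lipschitz extension of $\phi$. Taking $s=1$, then, note that
the case $D=\emptyset$ is trivial; so suppose that $D\ne\emptyset$. Let $\gamma$ be a Lipschitz constant for $\phi$, and write
$$\tilde\phi(z)=\sup_{y\in D}\bigl(\phi(y)-\gamma\|y-z\|\bigr)$$
for every $z\in\mathbb{R}^r$. If $x\in D$, then, for any $z\in\mathbb{R}^r$ and $y\in D$,
$$\phi(y)-\gamma\|y-z\|\le\phi(x)+\gamma\|y-x\|-\gamma\|y-z\|\le\phi(x)+\gamma\|z-x\|,$$
so that $\tilde\phi(z)\le\phi(x)+\gamma\|z-x\|$; this shows, in particular, that $\tilde\phi(z)<\infty$. Also, if $z\in D$, we must have
$$\phi(z)-\gamma\|z-z\|\le\tilde\phi(z)\le\phi(z)+\gamma\|z-z\|,$$
so that $\tilde\phi$ extends $\phi$. Finally, if $w,z\in\mathbb{R}^r$ and $y\in D$,
$$\phi(y)-\gamma\|y-w\|\le\phi(y)-\gamma\|y-z\|+\gamma\|w-z\|\le\tilde\phi(z)+\gamma\|w-z\|;$$
and taking the supremum over $y\in D$,
$$\tilde\phi(w)\le\tilde\phi(z)+\gamma\|w-z\|.$$
As $w$ and $z$ are arbitrary, $\tilde\phi$ is Lipschitz.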
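(c)(i) Suppose that $\phi$ is $\gamma$-Lipschitz. If $\epsilon>0$ and $a\le a_1\le b_1\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=1}^n(b_i-a_i)\le\epsilon/(1+\gamma)$, then
$$\sum_{i=1}^n|\phi(b_i)-\phi(a_i)|\le\sum_{i=1}^n\gamma|b_i-a_i|\le\epsilon.$$
As $\epsilon$ is arbitrary, $\phi$ is absolutely continuous. If $x\in[a,b]$ and $\phi'(x)$ is defined, then
$$|\phi'(x)|=\lim_{y\to x}\frac{|\phi(y)-\phi(x)|}{|y-x|}\le\gamma,$$
so $\phi'$ is bounded.

(ii) Now suppose that $\phi$ is absolutely continuous and that $|\phi'(x)|\le\gamma$ for every $x\in\operatorname{dom}\phi'$, where
$\gamma\ge0$. Then whenever $a\le x\le y\le b$,
$$|\phi(y)-\phi(x)|=\Bigl|\int_x^y\phi'\Bigr|\le\int_x^y|\phi'|\le\gamma(y-x)$$
(using 225E for the first equality). As $x$ and $y$ are arbitrary, $\phi$ is $\gamma$-Lipschitz.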

262C Remark The argument for (b) above shows that if $\phi:D\to\mathbb{R}$ is a Lipschitz function, where
$D\subseteq\mathbb{R}^r$, then $\phi$ has an extension to $\mathbb{R}^r$ with the same Lipschitz constants. In fact it is the case that if
$\phi:D\to\mathbb{R}^s$ is a Lipschitz function, then $\phi$ has an extension to $\tilde\phi:\mathbb{R}^r\to\mathbb{R}^s$ with the same Lipschitz
constants; this is Kirzbraun's theorem (Kirzbraun 34, or Federer 69, 2.10.43).
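The extension formula of 262Bb can be tried out numerically. The following is only a minimal sketch, assuming a finite domain $D$ (so that the supremum is a maximum) and real values; the sample points, the values and the helper names `lipschitz_constant` and `extend` are hypothetical and chosen purely for illustration.

```python
import math

def norm(v):
    """Euclidean norm of a point given as a tuple of floats."""
    return math.sqrt(sum(t * t for t in v))

def dist(x, y):
    """Euclidean distance between two points of the same dimension."""
    return norm(tuple(a - b for a, b in zip(x, y)))

def lipschitz_constant(points, values):
    """Least Lipschitz constant of the finite function points[i] -> values[i]."""
    best = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = dist(points[i], points[j])
            if d > 0:
                best = max(best, abs(values[i] - values[j]) / d)
    return best

def extend(points, values, gamma):
    """Return z -> max over y in D of (phi(y) - gamma * ||y - z||), as in 262Bb."""
    def phi_tilde(z):
        return max(v - gamma * dist(p, z) for p, v in zip(points, values))
    return phi_tilde

# Hypothetical sample data: a real-valued function on four points of the plane.
D = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
phi = [0.0, 0.5, -1.0, 2.0]
gamma = lipschitz_constant(D, phi)
phi_tilde = extend(D, phi, gamma)
```

Evaluating `phi_tilde` at the points of `D` should reproduce `phi` up to rounding, and differences of `phi_tilde` between arbitrary points of the plane should be bounded by `gamma` times their distance, which is what the argument of 262Bb predicts.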

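262D Proposition If $\phi:D\to\mathbb{R}^r$ is a $\gamma$-Lipschitz function, where $D\subseteq\mathbb{R}^r$, then $\mu^*\phi[A]\le\gamma^r\mu^*A$ for
every $A\subseteq D$, where $\mu$ is Lebesgue measure on $\mathbb{R}^r$. In particular, $\phi[D\cap A]$ is negligible for every negligible
set $A\subseteq\mathbb{R}^r$.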

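proof Let $\epsilon>0$. By 261F, there is a sequence $\langle B_n\rangle_{n\in\mathbb{N}}=\langle B(x_n,\delta_n)\rangle_{n\in\mathbb{N}}$ of closed balls in $\mathbb{R}^r$, covering $A$,
such that $\sum_{n=0}^\infty\mu B_n\le\mu^*A+\epsilon$ and $\sum_{n\in\mathbb{N}\setminus K}\mu B_n\le\epsilon$, where $K=\{n:n\in\mathbb{N},\,x_n\in A\}$. Set
$$L=\{n:n\in\mathbb{N}\setminus K,\ B_n\cap D\ne\emptyset\},$$
and for $n\in L$ choose $y_n\in D\cap B_n$. Now set
$$B'_n=B(\phi(x_n),\gamma\delta_n)\text{ if }n\in K,$$
$$B'_n=B(\phi(y_n),2\gamma\delta_n)\text{ if }n\in L,$$
$$B'_n=\emptyset\text{ if }n\in\mathbb{N}\setminus(K\cup L).$$
Then $\phi[B_n\cap D]\subseteq B'_n$ for every $n$, so $\phi[D\cap A]\subseteq\bigcup_{n\in\mathbb{N}}B'_n$, and
$$\mu^*\phi[A\cap D]\le\sum_{n=0}^\infty\mu B'_n=\gamma^r\sum_{n\in K}\mu B_n+2^r\gamma^r\sum_{n\in L}\mu B_n\le\gamma^r(\mu^*A+\epsilon)+2^r\gamma^r\epsilon.$$
As $\epsilon$ is arbitrary, $\mu^*\phi[A\cap D]\le\gamma^r\mu^*A$, as claimed.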

262E Corollary Let $\phi:D\to\mathbb{R}^r$ be an injective Lipschitz function, where $D\subseteq\mathbb{R}^r$, and $f$ a measurable
function from a subset of $\mathbb{R}^r$ to $\mathbb{R}$.
(a) If $\phi^{-1}$ is defined almost everywhere in a subset $H$ of $\mathbb{R}^r$ and $f$ is defined almost everywhere in $\mathbb{R}^r$,
then $f\phi^{-1}$ is defined almost everywhere in $H$.
(b) If $E\subseteq D$ is Lebesgue measurable then $\phi[E]$ is measurable.
(c) If $D$ is measurable then $f\phi^{-1}$ is measurable.
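proof Set
$$C=\operatorname{dom}(f\phi^{-1})=\{y:y\in\phi[D],\ \phi^{-1}(y)\in\operatorname{dom}f\}=\phi[D\cap\operatorname{dom}f].$$

(a) Because $f$ is defined almost everywhere, $\phi[D\setminus\operatorname{dom}f]$ is negligible. But now
$$C=\phi[D]\setminus\phi[D\setminus\operatorname{dom}f]=\operatorname{dom}\phi^{-1}\setminus\phi[D\setminus\operatorname{dom}f],$$
so
$$H\setminus C\subseteq(H\setminus\operatorname{dom}\phi^{-1})\cup\phi[D\setminus\operatorname{dom}f]$$
is negligible.
(b) Now suppose that $E\subseteq D$ and that $E$ is measurable. Let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a sequence of closed bounded
subsets of $E$ such that $\mu(E\setminus\bigcup_{n\in\mathbb{N}}F_n)=0$ (134Fb). Because $\phi$ is Lipschitz, it is continuous, so $\phi[F_n]$ is
compact, therefore closed, therefore measurable for every $n$ (2A2F, 2A2E, 115G); also $\phi[E\setminus\bigcup_{n\in\mathbb{N}}F_n]$ is
negligible, by 262D, therefore measurable. So
$$\phi[E]=\phi[E\setminus\bigcup_{n\in\mathbb{N}}F_n]\cup\bigcup_{n\in\mathbb{N}}\phi[F_n]$$
is measurable.
(c) For any $a\in\mathbb{R}$, take a measurable set $E\subseteq\mathbb{R}^r$ such that $\{x:f(x)\ge a\}=E\cap\operatorname{dom}f$. Then
$$\{y:y\in C,\ f\phi^{-1}(y)\ge a\}=C\cap\phi[D\cap E].$$
But $\phi[D\cap E]$ is measurable, by (b), so $\{y:f\phi^{-1}(y)\ge a\}$ is relatively measurable in $C$. As $a$ is arbitrary,
$f\phi^{-1}$ is measurable.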

262F Differentiability I come now to the class of functions whose properties will take up most of the
rest of the chapter.
Definitions Suppose $r,s\ge1$ and that $\phi$ is a function from a subset $D=\operatorname{dom}\phi$ of $\mathbb{R}^r$ to $\mathbb{R}^s$.
(a) $\phi$ is differentiable at $x\in D$ if there is a real $s\times r$ matrix $T$ such that
$$\lim_{y\to x}\frac{\|\phi(y)-\phi(x)-T(y-x)\|}{\|y-x\|}=0;$$
in this case we may write $T=\phi'(x)$.

(b) I will say that $\phi$ is differentiable relative to its domain at $x$, and that $T$ is a derivative of $\phi$ at
$x$, if $x\in D$ and for every $\epsilon>0$ there is a $\delta>0$ such that $\|\phi(y)-\phi(x)-T(y-x)\|\le\epsilon\|y-x\|$ for every
$y\in B(x,\delta)\cap D$.

262G Remarks (a) The standard definition in 262Fa, involving an all-sided limit '$\lim_{y\to x}$', implicitly
requires $\phi$ to be defined on some non-trivial ball centred on $x$, so that we can calculate $\phi(y)-\phi(x)-T(y-x)$
for all $y$ sufficiently near $x$. It has the advantage that the derivative $T=\phi'(x)$ is uniquely defined (because
if $\lim_{z\to0}\frac{\|T_1z-T_2z\|}{\|z\|}=0$ then
$$\frac{\|(T_1-T_2)z\|}{\|z\|}=\lim_{\alpha\to0}\frac{\|T_1(\alpha z)-T_2(\alpha z)\|}{\|\alpha z\|}=0$$
for every non-zero $z$, so $T_1-T_2$ must be the zero matrix). For our purposes here, there is some advantage in
relaxing this slightly to the form in 262Fb, so that we do not need to pay special attention to the boundary
of $\operatorname{dom}\phi$.

(b) If you have not seen this concept of differentiability before, but have some familiarity with partial
differentiation, it is necessary to emphasize that the concept of differentiable function (at least in the strict
sense demanded by 262Fa) is strictly stronger than the concept of partially differentiable function. For
purposes of computation, the most useful method of finding true derivatives is through 262Id below. For
a simple example of a function with a full set of partial derivatives, which is not everywhere differentiable,
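consider $\phi:\mathbb{R}^2\to\mathbb{R}$ defined by
$$\phi(\xi_1,\xi_2)=\frac{\xi_1\xi_2}{\xi_1^2+\xi_2^2}\text{ if }\xi_1^2+\xi_2^2\ne0,$$
$$\phi(\xi_1,\xi_2)=0\text{ if }\xi_1=\xi_2=0.$$
Then $\phi$ is not even continuous at $0$, although both partial derivatives $\frac{\partial\phi}{\partial\xi_j}$ are defined everywhere.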

(c) In the definition above, I speak of a derivative as being a matrix. Properly speaking, the derivative
of a function defined on a subset of $\mathbb{R}^r$ and taking values in $\mathbb{R}^s$ should be thought of as a bounded linear
operator from $\mathbb{R}^r$ to $\mathbb{R}^s$; the formulation in terms of matrices is acceptable just because there is a natural
one-to-one correspondence between $s\times r$ real matrices and linear operators from $\mathbb{R}^r$ to $\mathbb{R}^s$, and all these
linear operators are bounded. I use the matrix description because it makes certain calculations more
direct; in particular, the relationship between $\phi'$ and the partial derivatives of $\phi$ (262Ic), and the notion of
the determinant $\det\phi'(x)$, used throughout §§263 and 265.
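The example in 262Gb can be checked numerically. The sketch below is an illustration only (the helper name `partial_at_origin` and the step sizes are arbitrary choices, not part of the text): the difference quotients along the two axes at the origin are all zero, so both partial derivatives exist there and equal $0$, while the values along the diagonal stay at $\frac12$, so the function cannot be continuous at $0$.

```python
def phi(x1, x2):
    """The function of 262Gb: x1*x2/(x1^2 + x2^2) away from the origin, 0 at the origin."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 * x2 / (x1 * x1 + x2 * x2)

def partial_at_origin(j, h):
    """Difference quotient (phi(h e_j) - phi(0)) / h for j = 1 or 2."""
    point = (h, 0.0) if j == 1 else (0.0, h)
    return (phi(*point) - phi(0.0, 0.0)) / h

# Step sizes shrinking to 0 (an arbitrary illustrative choice).
steps = [10.0 ** (-k) for k in range(1, 8)]
quotients = [partial_at_origin(j, h) for j in (1, 2) for h in steps]
diagonal_values = [phi(h, h) for h in steps]
```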

262H The norm of a matrix Some of the calculations below will rely on the notion of 'norm' of a
matrix. The one I will use (in fact, for our purposes here, any norm would do) is the 'operator norm',
defined by saying
$$\|T\|=\sup\{\|Tx\|:x\in\mathbb{R}^r,\ \|x\|\le1\}$$
for any $s\times r$ matrix $T$. For the basic facts concerning these norms, see 2A4F-2A4G. The following will also
be useful.

(a) If all the coefficients of $T$ are small, so is $\|T\|$; in fact, if $T=\langle\tau_{ij}\rangle_{i\le s,j\le r}$, and $\|x\|\le1$, then $|\xi_j|\le1$
for each $j$, so
$$\|Tx\|=\Bigl(\sum_{i=1}^s\bigl(\sum_{j=1}^r\tau_{ij}\xi_j\bigr)^2\Bigr)^{1/2}\le\Bigl(\sum_{i=1}^s\bigl(\sum_{j=1}^r|\tau_{ij}|\bigr)^2\Bigr)^{1/2}\le r\sqrt{s}\,\max_{i\le s,j\le r}|\tau_{ij}|,$$
and $\|T\|\le r\sqrt{s}\,\max_{i\le s,j\le r}|\tau_{ij}|$. (This is a singularly crude inequality. A better one is in 262Ya. But it tells
us, in particular, that $\|T\|$ is always finite.)

(b) If $\|T\|$ is small, so are all the coefficients of $T$; in fact, writing $e_j$ for the $j$th unit vector of $\mathbb{R}^r$, then
the $i$th coordinate of $Te_j$ is $\tau_{ij}$, so $|\tau_{ij}|\le\|Te_j\|\le\|T\|$.
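The two inequalities of 262H can be compared on a concrete matrix. This is a rough sketch under stated assumptions: the $2\times3$ matrix is a hypothetical example, and instead of computing $\|T\|$ exactly the code only produces a lower bound for it by testing a fixed family of unit vectors; the quantity `frobenius_bound` is the bound of exercise 262Ya.

```python
import math

def mat_vec(T, x):
    """Product of the matrix T (a list of rows) with the vector x."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in v))

def operator_norm_lower_bound(T, samples=2000):
    """Largest ||Tx|| over a fixed family of unit vectors x; a lower bound for ||T||."""
    r = len(T[0])
    best = 0.0
    for k in range(samples):
        # Deterministic direction; any family of unit vectors would serve here.
        x = [math.sin((k + 1) * (j + 1) * 0.7391) for j in range(r)]
        n = norm(x)
        if n > 0:
            best = max(best, norm(mat_vec(T, [xi / n for xi in x])))
    return best

# Hypothetical 2 x 3 example, so s = 2 and r = 3.
T = [[1.0, -2.0, 0.5],
     [0.0, 3.0, 1.0]]
s, r = len(T), len(T[0])
max_coeff = max(abs(t) for row in T for t in row)
crude_bound = r * math.sqrt(s) * max_coeff                          # 262Ha
frobenius_bound = math.sqrt(sum(t * t for row in T for t in row))   # 262Ya
lower = operator_norm_lower_bound(T)
column_norms = [norm([row[j] for row in T]) for j in range(r)]      # ||T e_j||, as in 262Hb
```

For this matrix the sampled lower bound sits below the bound of 262Ya, which in turn is well below the crude bound of 262Ha, and every coefficient is dominated by the norm of its column $Te_j$.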

262I Lemma Let $\phi:D\to\mathbb{R}^s$ be a function, where $D\subseteq\mathbb{R}^r$. For $i\le s$ let $\phi_i:D\to\mathbb{R}$ be its $i$th
coordinate, so that $\phi(x)=(\phi_1(x),\ldots,\phi_s(x))$ for $x\in D$.
(a) If $\phi$ is differentiable relative to its domain at $x\in D$, then $\phi$ is continuous at $x$.
(b) If $x\in D$, then $\phi$ is differentiable relative to its domain at $x$ iff each $\phi_i$ is differentiable relative to its
domain at $x$.
(c) If $\phi$ is differentiable at $x\in D$, then all the partial derivatives $\frac{\partial\phi_i}{\partial\xi_j}$ of $\phi$ are defined at $x$, and the
derivative of $\phi$ at $x$ is the matrix $\bigl\langle\frac{\partial\phi_i}{\partial\xi_j}(x)\bigr\rangle_{i\le s,j\le r}$.
(d) If all the partial derivatives $\frac{\partial\phi_i}{\partial\xi_j}$, for $i\le s$ and $j\le r$, are defined in a neighbourhood of $x\in D$ and
are continuous at $x$, then $\phi$ is differentiable at $x$.


proof (a) Let $T$ be a derivative of $\phi$ at $x$. Applying the definition 262Fb with $\epsilon=1$, we see that there is a
$\delta>0$ such that
$$\|\phi(y)-\phi(x)-T(y-x)\|\le\|y-x\|$$
whenever $y\in D$ and $\|y-x\|\le\delta$. Now
$$\|\phi(y)-\phi(x)\|\le\|T(y-x)\|+\|y-x\|\le(1+\|T\|)\|y-x\|$$
whenever $y\in D$ and $\|y-x\|\le\delta$, so $\phi$ is continuous at $x$.
(b)(i) If $\phi$ is differentiable relative to its domain at $x\in D$, let $T$ be a derivative of $\phi$ at $x$. For $i\le s$ let
$T_i$ be the $1\times r$ matrix consisting of the $i$th row of $T$. Let $\epsilon>0$. Then we have a $\delta>0$ such that
$$|\phi_i(y)-\phi_i(x)-T_i(y-x)|\le\|\phi(y)-\phi(x)-T(y-x)\|\le\epsilon\|y-x\|$$
whenever $y\in D$ and $\|y-x\|\le\delta$, so that $T_i$ is a derivative of $\phi_i$ at $x$.
(ii) If each $\phi_i$ is differentiable relative to its domain at $x$, with corresponding derivatives $T_i$, let $T$ be
the $s\times r$ matrix with rows $T_1,\ldots,T_s$. Given $\epsilon>0$, there is for each $i\le s$ a $\delta_i>0$ such that
$$|\phi_i(y)-\phi_i(x)-T_i(y-x)|\le\epsilon\|y-x\|\text{ whenever }y\in D,\ \|y-x\|\le\delta_i;$$
set $\delta=\min_{i\le s}\delta_i>0$; then if $y\in D$ and $\|y-x\|\le\delta$, we shall have
$$\|\phi(y)-\phi(x)-T(y-x)\|^2=\sum_{i=1}^s|\phi_i(y)-\phi_i(x)-T_i(y-x)|^2\le s\epsilon^2\|y-x\|^2,$$
so that
$$\|\phi(y)-\phi(x)-T(y-x)\|\le\epsilon\sqrt{s}\,\|y-x\|.$$
As $\epsilon$ is arbitrary, $T$ is a derivative of $\phi$ at $x$.
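(c) Set $T=\phi'(x)$. We have
$$\lim_{y\to x}\frac{\|\phi(y)-\phi(x)-T(y-x)\|}{\|y-x\|}=0;$$
fix $j\le r$, and consider $y=x+\eta e_j$, where $e_j=(0,\ldots,0,1,0,\ldots,0)$ is the $j$th unit vector in $\mathbb{R}^r$. Then we
must have
$$\lim_{\eta\to0}\frac{\|\phi(x+\eta e_j)-\phi(x)-\eta T(e_j)\|}{|\eta|}=0.$$
Looking at the $i$th coordinate of $\phi(x+\eta e_j)-\phi(x)-\eta T(e_j)$, we have
$$|\phi_i(x+\eta e_j)-\phi_i(x)-\tau_{ij}\eta|\le\|\phi(x+\eta e_j)-\phi(x)-\eta T(e_j)\|,$$
where $\tau_{ij}$ is the $(i,j)$th coefficient of $T$; so that
$$\lim_{\eta\to0}\frac{|\phi_i(x+\eta e_j)-\phi_i(x)-\tau_{ij}\eta|}{|\eta|}=0.$$
But this just says that the partial derivative $\frac{\partial\phi_i}{\partial\xi_j}(x)$ exists and is equal to $\tau_{ij}$, as claimed.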
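(d) Now suppose that the partial derivatives $\frac{\partial\phi_i}{\partial\xi_j}$ are defined near $x$ and continuous at $x$. Let $\epsilon>0$. Let
$\delta>0$ be such that
$$\Bigl|\frac{\partial\phi_i}{\partial\xi_j}(y)-\tau_{ij}\Bigr|\le\epsilon$$
whenever $\|y-x\|\le\delta$, writing $\tau_{ij}=\frac{\partial\phi_i}{\partial\xi_j}(x)$. Now suppose that $\|y-x\|\le\delta$. Set
$$y=(\eta_1,\ldots,\eta_r),\quad x=(\xi_1,\ldots,\xi_r),$$
$$y_j=(\eta_1,\ldots,\eta_j,\xi_{j+1},\ldots,\xi_r)\text{ for }0\le j\le r,$$
so that $y_0=x$, $y_r=y$ and the line segment between $y_{j-1}$ and $y_j$ lies wholly within $\delta$ of $x$ whenever
$1\le j\le r$, since if $z=(\zeta_1,\ldots,\zeta_r)$ lies on this line segment then $\zeta_i$ lies between $\xi_i$ and $\eta_i$ for every $i$. By the ordinary
mean value theorem for differentiable real functions, applied to the function
$$t\mapsto\phi_i(\eta_1,\ldots,\eta_{j-1},t,\xi_{j+1},\ldots,\xi_r),$$
there is for each $i\le s$, $j\le r$ a point $z_{ij}$ on the line segment between $y_{j-1}$ and $y_j$ such that
$$\phi_i(y_j)-\phi_i(y_{j-1})=(\eta_j-\xi_j)\frac{\partial\phi_i}{\partial\xi_j}(z_{ij}).$$
But
$$\Bigl|\frac{\partial\phi_i}{\partial\xi_j}(z_{ij})-\tau_{ij}\Bigr|\le\epsilon,$$
so
$$|\phi_i(y_j)-\phi_i(y_{j-1})-\tau_{ij}(\eta_j-\xi_j)|\le\epsilon|\eta_j-\xi_j|\le\epsilon\|y-x\|.$$
Summing over $j$,
$$\Bigl|\phi_i(y)-\phi_i(x)-\sum_{j=1}^r\tau_{ij}(\eta_j-\xi_j)\Bigr|\le r\epsilon\|y-x\|$$
for each $i$. Summing the squares and taking the square root,
$$\|\phi(y)-\phi(x)-T(y-x)\|\le\epsilon r\sqrt{s}\,\|y-x\|,$$
where $T=\langle\tau_{ij}\rangle_{i\le s,j\le r}$. And this is true whenever $\|y-x\|\le\delta$. As $\epsilon$ is arbitrary, $\phi'(x)=T$ is defined.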
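The conclusion of 262Id can be watched numerically. The following is a minimal sketch, assuming a hypothetical smooth map $\phi:\mathbb{R}^2\to\mathbb{R}^2$ whose partial derivatives are written out by hand; it computes the quotient $\|\phi(x+z)-\phi(x)-Tz\|/\|z\|$ of 262Fa, with $T$ the matrix of partial derivatives of 262Ic, along a sequence of increments $z$ shrinking to $0$.

```python
import math

def phi(x):
    """Hypothetical smooth map R^2 -> R^2, used only for illustration."""
    return (x[0] * x[0] * x[1], math.sin(x[0]) + x[1])

def jacobian(x):
    """Matrix of partial derivatives of phi at x, computed by hand."""
    return [[2.0 * x[0] * x[1], x[0] * x[0]],
            [math.cos(x[0]), 1.0]]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in v))

def remainder_ratio(x, z):
    """||phi(x+z) - phi(x) - T z|| / ||z||, where T is the matrix of partial derivatives at x."""
    T = jacobian(x)
    y = (x[0] + z[0], x[1] + z[1])
    fy, fx = phi(y), phi(x)
    Tz = [T[i][0] * z[0] + T[i][1] * z[1] for i in range(2)]
    return norm([fy[i] - fx[i] - Tz[i] for i in range(2)]) / norm(z)

x = (1.0, 2.0)
# Increments shrinking to 0 along a fixed direction (an arbitrary choice).
ratios = [remainder_ratio(x, (h, -0.5 * h)) for h in (1e-1, 1e-2, 1e-3, 1e-4)]
```

The ratios should decrease towards $0$ as the increment shrinks, in line with the definition in 262Fa; of course a single direction of approach proves nothing, and the sketch only illustrates the statement.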


262J Remark I am not sure if I ought to apologize for the notation $\frac{\partial}{\partial\xi_j}$. In such formulae as $(\eta_j-\xi_j)\frac{\partial\phi_i}{\partial\xi_j}(z_{ij})$
above, the two appearances of $\xi_j$ clash most violently. But I do not think that any person of good
will is likely to be misled, provided that the labels $\xi_j$ (or whatever symbols are used to represent the variables
involved) are adequately described when the domain of $\phi$ is first introduced (and always remembering that
in partial differentiation, we are not only moving one variable, a $\xi_j$ in the present context, but holding
fixed some further list of variables, not listed in the notation). I believe that the traditional notation $\frac{\partial}{\partial\xi_j}$
has survived for solid reasons, and I should like to offer a welcome to those who are more comfortable with
it than with any of the many alternatives which have been proposed, but have never taken root.

262K The Cantor function revisited It is salutary to re-examine the examples of 134H-134I in the
light of the present considerations. Let $f:[0,1]\to[0,1]$ be the Cantor function (134H) and set $g(x)=\frac12(x+f(x))$
for $x\in[0,1]$. Then $g:[0,1]\to[0,1]$ is a homeomorphism (134I); set $\phi=g^{-1}:[0,1]\to[0,1]$.
We see that if $0\le x\le y\le1$ then $g(y)-g(x)\ge\frac12(y-x)$; equivalently, $\phi(y)-\phi(x)\le2(y-x)$ whenever
$0\le x\le y\le1$, so that $\phi$ is a Lipschitz function, therefore absolutely continuous (262Bc). If $D=\{x:\phi'(x)$
is defined$\}$, then $[0,1]\setminus D$ is negligible (225Cb), so $[0,1]\setminus\phi[D]=\phi[\,[0,1]\setminus D]$ is negligible (262Da). I noted
in 134I that there is a measurable function $h:[0,1]\to\mathbb{R}$ such that the composition $h\phi$ is not measurable;
now $h(\phi\restriction D)=(h\phi)\restriction D$ cannot be measurable, even though $\phi\restriction D$ is differentiable.

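262L It will be convenient to be able to call on the following straightforward result.

Lemma Suppose that $D\subseteq\mathbb{R}^r$ and $x\in\mathbb{R}^r$ are such that $\lim_{\delta\downarrow0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)}=1$. Then $\lim_{z\to0}\frac{\rho(x+z,D)}{\|z\|}=0$,
where $\rho(x+z,D)=\inf_{y\in D}\|x+z-y\|$.
proof Let $\epsilon>0$. Let $\delta_0>0$ be such that
$$\mu^*(D\cap B(x,\delta))>\Bigl(1-\bigl(\frac{\epsilon}{1+\epsilon}\bigr)^r\Bigr)\mu B(x,\delta)$$
whenever $0<\delta\le\delta_0$. Take any $z$ such that $0<\|z\|\le\delta_0/(1+\epsilon)$. ?? Suppose, if possible, that $\rho(x+z,D)>\epsilon\|z\|$.
Then $B(x+z,\epsilon\|z\|)\subseteq B(x,(1+\epsilon)\|z\|)\setminus D$, so
$$\mu^*(D\cap B(x,(1+\epsilon)\|z\|))\le\mu B(x,(1+\epsilon)\|z\|)-\mu B(x+z,\epsilon\|z\|)=\Bigl(1-\bigl(\frac{\epsilon}{1+\epsilon}\bigr)^r\Bigr)\mu B(x,(1+\epsilon)\|z\|),$$
which is impossible, as $(1+\epsilon)\|z\|\le\delta_0$. X Thus $\rho(x+z,D)\le\epsilon\|z\|$. As $\epsilon$ is arbitrary, this proves the result.
Remark There is a word for this; see 261Yg.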

262M I come now to the first result connecting Lipschitz functions with differentiable functions. I
approach it through a substantial lemma which will be the foundation of §263.

Lemma Let $r,s\ge1$ be integers and $\phi$ a function from a subset $D$ of $\mathbb{R}^r$ to $\mathbb{R}^s$ which is differentiable relative to its domain at
each point of its domain. For each $x\in D$ let $T(x)$ be a derivative of $\phi$. Let $M_{sr}$ be the set of $s\times r$ matrices
and $\zeta:A\to\,]0,\infty[$ a strictly positive function, where $A\subseteq M_{sr}$ is a non-empty set containing $T(x)$ for every
$x\in D$. Then we can find sequences $\langle D_n\rangle_{n\in\mathbb{N}}$, $\langle T_n\rangle_{n\in\mathbb{N}}$ such that
(i) $\langle D_n\rangle_{n\in\mathbb{N}}$ is a disjoint cover of $D$ by sets which are relatively measurable in $D$, that is, are intersections
of $D$ with measurable subsets of $\mathbb{R}^r$;
(ii) $T_n\in A$ for every $n$;
(iii) $\|\phi(x)-\phi(y)-T_n(x-y)\|\le\zeta(T_n)\|x-y\|$ for every $n\in\mathbb{N}$ and $x,y\in D_n$;
(iv) $\|T(x)-T_n\|\le\zeta(T_n)$ for every $x\in D_n$.
proof (a) The first step is to note that there is a sequence $\langle S_n\rangle_{n\in\mathbb{N}}$ in $A$ such that
$$A\subseteq\bigcup_{n\in\mathbb{N}}\{T:T\in M_{sr},\ \|T-S_n\|<\zeta(S_n)\}.$$
P (Of course this is a standard result about separable metric spaces.) Write $Q$ for the set of matrices in
$M_{sr}$ with rational coefficients; then there is a natural bijection between $Q$ and $\mathbb{Q}^{sr}$, so $Q$ and $Q\times\mathbb{N}$ are
countable. Enumerate $Q\times\mathbb{N}$ as $\langle(R_n,k_n)\rangle_{n\in\mathbb{N}}$. For each $n\in\mathbb{N}$, choose $S_n\in A$ by the rule
    if there is an $S\in A$ such that $\{T:\|T-R_n\|\le2^{-k_n}\}\subseteq\{T:\|T-S\|<\zeta(S)\}$, take such an $S$ for
    $S_n$;
    otherwise, take $S_n$ to be any member of $A$.
I claim that this works. For let $S\in A$. Then $\zeta(S)>0$; take $k\in\mathbb{N}$ such that $2^{-k}<\zeta(S)$. Take $R\in Q$
such that $\|R-S\|<\min(\zeta(S)-2^{-k},2^{-k})$; this is possible because $\|R-S\|$ will be small whenever all the
coefficients of $R$ are close enough to the corresponding coefficients of $S$ (262Ha), and we can find rational
numbers to achieve this. Let $n\in\mathbb{N}$ be such that $R=R_n$ and $k=k_n$. Then
$$\{T:\|T-R_n\|\le2^{-k_n}\}\subseteq\{T:\|T-S\|<\zeta(S)\}$$
(because $\|T-S\|\le\|T-R_n\|+\|R_n-S\|$), so we must have chosen $S_n$ by the first part of the rule above,
and
$$S\in\{T:\|T-R_n\|\le2^{-k_n}\}\subseteq\{T:\|T-S_n\|<\zeta(S_n)\}.$$
As $S$ is arbitrary, this proves the result. Q
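(b) Enumerate $\mathbb{Q}^r\times\mathbb{Q}^r\times\mathbb{N}$ as $\langle(q_n,q'_n,m_n)\rangle_{n\in\mathbb{N}}$. For each $n\in\mathbb{N}$, set
$$H_n=\{x:x\in[q_n,q'_n]\cap D,\ \|\phi(y)-\phi(x)-S_{m_n}(y-x)\|\le\zeta(S_{m_n})\|y-x\|\text{ for every }y\in[q_n,q'_n]\cap D\}$$
$$=[q_n,q'_n]\cap D\cap\bigcap_{y\in[q_n,q'_n]\cap D}\{x:x\in D,\ \|\phi(y)-\phi(x)-S_{m_n}(y-x)\|\le\zeta(S_{m_n})\|y-x\|\}.$$
Because $\phi$ is continuous, $H_n=D\cap\overline{H}_n$, writing $\overline{H}_n$ for the closure of $H_n$, so $H_n$ is relatively measurable in
$D$. Note that if $x,y\in H_n$, then $y\in D\cap[q_n,q'_n]$, so that
$$\|\phi(y)-\phi(x)-S_{m_n}(y-x)\|\le\zeta(S_{m_n})\|y-x\|.$$
Set
$$H'_n=\{x:x\in H_n,\ \|T(x)-S_{m_n}\|\le\zeta(S_{m_n})\}.$$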
(c) $D=\bigcup_{n\in\mathbb{N}}H'_n$. P Let $x\in D$. Then $T(x)\in A$, so there is a $k\in\mathbb{N}$ such that $\|T(x)-S_k\|<\zeta(S_k)$.
Let $\delta>0$ be such that
$$\|\phi(y)-\phi(x)-T(x)(y-x)\|\le(\zeta(S_k)-\|T(x)-S_k\|)\|y-x\|$$
whenever $y\in D$ and $\|y-x\|\le\delta$. Then
$$\|\phi(y)-\phi(x)-S_k(y-x)\|\le(\zeta(S_k)-\|T(x)-S_k\|)\|y-x\|+\|T(x)-S_k\|\|y-x\|\le\zeta(S_k)\|y-x\|$$
whenever $y\in D\cap B(x,\delta)$. Let $q,q'\in\mathbb{Q}^r$ be such that $x\in[q,q']\subseteq B(x,\delta)$. Let $n$ be such that $q=q_n$,
$q'=q'_n$ and $k=m_n$. Then $x\in H'_n$. Q
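(d) Write
$$C_n=\Bigl\{x:x\in H_n,\ \lim_{\delta\downarrow0}\frac{\mu^*(H_n\cap B(x,\delta))}{\mu B(x,\delta)}=1\Bigr\}.$$
Then $C_n\subseteq H'_n$.
P (i) Take $x\in C_n$, and set $\tilde T=T(x)-S_{m_n}$. I have to show that $\|\tilde T\|\le\zeta(S_{m_n})$. Take $\epsilon>0$. Let $\delta_0>0$
be such that
$$\|\phi(y)-\phi(x)-T(x)(y-x)\|\le\epsilon\|y-x\|$$
whenever $y\in D$ and $\|y-x\|\le\delta_0$. Since
$$\|\phi(y)-\phi(x)-S_{m_n}(y-x)\|\le\zeta(S_{m_n})\|y-x\|$$
whenever $y\in H_n$, we have
$$\|\tilde T(y-x)\|\le(\epsilon+\zeta(S_{m_n}))\|y-x\|$$
whenever $y\in H_n$ and $\|y-x\|\le\delta_0$.
(ii) By 262L, there is a $\delta_1>0$ such that $(1+2\epsilon)\delta_1\le\delta_0$ and $\rho(x+z,H_n)\le\epsilon\|z\|$ whenever $0<\|z\|\le\delta_1$.
So if $\|z\|\le\delta_1$ there is a $y\in H_n$ such that $\|x+z-y\|\le2\epsilon\|z\|$. (If $z=0$ we can take $y=x$.) Now
$\|x-y\|\le(1+2\epsilon)\|z\|\le\delta_0$, so
$$\|\tilde Tz\|\le\|\tilde T(y-x)\|+\|\tilde T(x+z-y)\|$$
$$\le(\epsilon+\zeta(S_{m_n}))\|y-x\|+\|\tilde T\|\|x+z-y\|$$
$$\le(\epsilon+\zeta(S_{m_n}))\|z\|+(\epsilon+\zeta(S_{m_n})+\|\tilde T\|)\|x+z-y\|$$
$$\le(\epsilon+\zeta(S_{m_n})+2\epsilon^2+2\epsilon\zeta(S_{m_n})+2\epsilon\|\tilde T\|)\|z\|.$$
And this is true whenever $0<\|z\|\le\delta_1$. But multiplying this inequality by suitable positive scalars we see
that
$$\|\tilde Tz\|\le(\epsilon+\zeta(S_{m_n})+2\epsilon^2+2\epsilon\zeta(S_{m_n})+2\epsilon\|\tilde T\|)\|z\|$$
for all $z\in\mathbb{R}^r$, and
$$\|\tilde T\|\le\epsilon+\zeta(S_{m_n})+2\epsilon^2+2\epsilon\zeta(S_{m_n})+2\epsilon\|\tilde T\|.$$
As $\epsilon$ is arbitrary, $\|\tilde T\|\le\zeta(S_{m_n})$, as claimed. Q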
(e) By 261Da, $H_n\setminus C_n$ is negligible for every $n$, so $H_n\setminus H'_n$ is negligible, and
$$H'_n=D\cap(\overline{H}_n\setminus(H_n\setminus H'_n))$$
is relatively measurable in $D$. Set
$$D_n=H'_n\setminus\bigcup_{k<n}H'_k,\quad T_n=S_{m_n}$$
for each $n$; these serve.

262N Corollary Let $\phi$ be a function from a subset $D$ of $\mathbb{R}^r$ to $\mathbb{R}^s$, and suppose that $\phi$ is differentiable
relative to its domain at each point of $D$. Then $D$ can be expressed as the union of a sequence $\langle D_n\rangle_{n\in\mathbb{N}}$ of
sets such that $\phi\restriction D_n$ is Lipschitz for each $n\in\mathbb{N}$.
proof In 262M, take $\zeta(T)=1$ for every $T\in A=M_{sr}$. If $x,y\in D_n$ then
$$\|\phi(x)-\phi(y)\|\le\|\phi(x)-\phi(y)-T_n(x-y)\|+\|T_n(x-y)\|\le\|x-y\|+\|T_n\|\|x-y\|,$$
so $\phi\restriction D_n$ is $(1+\|T_n\|)$-Lipschitz.

262O Corollary Suppose that $\phi$ is an injective function from a measurable subset $D$ of $\mathbb{R}^r$ to $\mathbb{R}^r$, and
that $\phi$ is differentiable relative to its domain at every point of $D$.
(a) If $A\subseteq D$ is negligible, $\phi[A]$ is negligible.
(b) If $E\subseteq D$ is measurable, then $\phi[E]$ is measurable.
(c) If $D$ is measurable and $f$ is a measurable function defined on a subset of $\mathbb{R}^r$, then $f\phi^{-1}$ is measurable.
(d) If $H\subseteq\mathbb{R}^r$ and $\phi^{-1}$ is defined almost everywhere on $H$, and if $f$ is a function defined almost everywhere
on $\mathbb{R}^r$, then $f\phi^{-1}$ is defined almost everywhere on $H$.
proof Let $\langle D_n\rangle_{n\in\mathbb{N}}$ be a sequence of measurable sets with union $D$ such that $\phi\restriction D_n$ is Lipschitz for each $n$.
Then $\phi[A\cap D]=\bigcup_{n\in\mathbb{N}}(\phi\restriction D_n)[A\cap D_n]$ is negligible for every negligible $A\subseteq\mathbb{R}^r$, by 262D.
Now parts (b)-(d) follow from (a) (because $\phi$ is continuous), just as in 262E.

262P Corollary Let $\phi$ be a function from a subset $D$ of $\mathbb{R}^r$ to $\mathbb{R}^s$, and suppose that $\phi$ is differentiable
relative to its domain, with a derivative $T(x)$, at each point $x\in D$. Then the function $x\mapsto T(x)$ is
measurable in the sense that $\tau_{ij}:D\to\mathbb{R}$ is measurable for all $i\le s$ and $j\le r$, where $\tau_{ij}(x)$ is the $(i,j)$th
coefficient of the matrix $T(x)$ for all $i$, $j$ and $x$.
proof For each $k\in\mathbb{N}$, apply 262M with $\zeta(T)=2^{-k}$ for each $T\in A=M_{sr}$, obtaining sequences $\langle D_{kn}\rangle_{n\in\mathbb{N}}$
of relatively measurable subsets of $D$ and $\langle T_{kn}\rangle_{n\in\mathbb{N}}$ in $M_{sr}$. Let $\tau^{(kn)}_{ij}$ be the $(i,j)$th coefficient of $T_{kn}$. Then
we have functions $f_{ijk}:D\to\mathbb{R}$ defined by setting
$$f_{ijk}(x)=\tau^{(kn)}_{ij}\text{ if }x\in D_{kn}.$$
Because the $D_{kn}$ are relatively measurable, the $f_{ijk}$ are measurable functions. For $x\in D_{kn}$,
$$|\tau_{ij}(x)-f_{ijk}(x)|\le\|T(x)-T_{kn}\|\le2^{-k},$$
so $|\tau_{ij}(x)-f_{ijk}(x)|\le2^{-k}$ for every $x\in D$, and
$$\tau_{ij}=\lim_{k\to\infty}f_{ijk}$$
is measurable, as claimed.

*262Q This concludes the part of the section which is essential for the rest of the chapter. However
the main results of §263 will I think be better understood if you are aware of the fact that any Lipschitz
function is differentiable (relative to its domain) almost everywhere on its domain. I devote the next couple
of pages to a proof of this fact, which apart from its intrinsic interest is a useful exercise.
Rademacher's theorem Let $\phi$ be a Lipschitz function from a subset of $\mathbb{R}^r$ to $\mathbb{R}^s$, where $s\ge1$. Then $\phi$ is
differentiable relative to its domain almost everywhere on its domain.
proof (a) By 262Ba and 262Ib, it will be enough to deal with the case $s=1$. By 262Bb, there is a Lipschitz
function $\tilde\phi:\mathbb{R}^r\to\mathbb{R}$ extending $\phi$; now $\phi$ is differentiable relative to its domain at any point of $\operatorname{dom}\phi$
at which $\tilde\phi$ is differentiable, so it will be enough if I can show that $\tilde\phi$ is differentiable almost everywhere. To
make the notation more agreeable to the eye, I will suppose that $\phi$ itself was defined everywhere on $\mathbb{R}^r$. Let
$\gamma$ be a Lipschitz constant for $\phi$.
The proof proceeds by induction on $r$. If $r=1$, we have a Lipschitz function $\phi:\mathbb{R}\to\mathbb{R}$; now $\phi$ is
absolutely continuous in any bounded interval (262Bc), therefore differentiable almost everywhere. Thus
the induction starts. The rest of the proof is devoted to the inductive step to $r>1$.

(b) The first step is to show that all the partial derivatives $\frac{\partial\phi}{\partial\xi_j}$ are defined almost everywhere and are
Borel measurable. P Take $j\le r$. For $q\in\mathbb{Q}\setminus\{0\}$ set
$$\Delta_q(x)=\frac1q(\phi(x+qe_j)-\phi(x)),$$
writing $e_j$ for the $j$th unit vector of $\mathbb{R}^r$. Because $\phi$ is continuous, so is $\Delta_q$, so that $\Delta_q$ is a Borel measurable
function for each $q$. Next, for any $x\in\mathbb{R}^r$,
$$D^+(x)=\limsup_{\delta\to0}\frac1\delta(\phi(x+\delta e_j)-\phi(x))=\lim_{n\to\infty}\sup_{q\in\mathbb{Q},\,0<|q|\le2^{-n}}\Delta_q(x),$$
so that the set on which $D^+(x)$ is defined in $\mathbb{R}$ is Borel and $D^+$ is a Borel measurable function. Similarly,
$$D^-(x)=\liminf_{\delta\to0}\frac1\delta(\phi(x+\delta e_j)-\phi(x))$$
is a Borel measurable function with Borel domain. So
$$E=\Bigl\{x:\frac{\partial\phi}{\partial\xi_j}(x)\text{ exists in }\mathbb{R}\Bigr\}=\{x:D^+(x)=D^-(x)\in\mathbb{R}\}$$
is a Borel set, and $\frac{\partial\phi}{\partial\xi_j}$ is a Borel measurable function.
On the other hand, if we identify $\mathbb{R}^r$ with $\mathbb{R}^J\times\mathbb{R}$, taking $J$ to be $\{1,\ldots,j-1,j+1,\ldots,r\}$, then we can
think of the measure $\mu$ on $\mathbb{R}^r$ as being the product of Lebesgue measure $\mu_J$ on $\mathbb{R}^J$ with Lebesgue measure
$\mu_1$ on $\mathbb{R}$ (251M). Now for every $y\in\mathbb{R}^J$ we have a function $\phi_y:\mathbb{R}\to\mathbb{R}$ defined by writing
$$\phi_y(\sigma)=\phi(y,\sigma),$$
and $E$ becomes
$$\{(y,\sigma):\phi'_y(\sigma)\text{ is defined}\},$$
so that all the sections
$$\{\sigma:(y,\sigma)\in E\}$$
are conegligible subsets of $\mathbb{R}$, because every $\phi_y$ is Lipschitz, therefore differentiable almost everywhere, as
remarked in part (a) of the proof. Since we know that $E$ is measurable, it must be conegligible, by Fubini's
theorem (apply 252D or 252F to the complement of $E$). Thus $\frac{\partial\phi}{\partial\xi_j}$ is defined almost everywhere, as claimed. Q
Write
$$H=\Bigl\{x:x\in\mathbb{R}^r,\ \frac{\partial\phi}{\partial\xi_j}(x)\text{ exists for every }j\le r\Bigr\},$$
so that $H$ is a conegligible Borel set in $\mathbb{R}^r$.
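(c) For the rest of this proof, I fix on the natural identification of $\mathbb{R}^r$ with $\mathbb{R}^{r-1}\times\mathbb{R}$, identifying $(\xi_1,\ldots,\xi_r)$
with $((\xi_1,\ldots,\xi_{r-1}),\xi_r)$. For $x\in H$, let $T(x)$ be the $1\times r$ matrix $\bigl(\frac{\partial\phi}{\partial\xi_1}(x),\ldots,\frac{\partial\phi}{\partial\xi_r}(x)\bigr)$.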

(d) Set
$$H_1=\Bigl\{x:x\in H,\ \lim_{u\to0\text{ in }\mathbb{R}^{r-1}}\frac{|\phi(x+(u,0))-\phi(x)-T(x)(u,0)|}{\|u\|}=0\Bigr\}.$$
I claim that $H_1$ is conegligible in $\mathbb{R}^r$. P This is really the same idea as in (b). For $x\in H$, $x\in H_1$ iff
    for every $\epsilon>0$ there is a $\delta>0$ such that
    $|\phi(x+(u,0))-\phi(x)-T(x)(u,0)|\le\epsilon\|u\|$
    whenever $\|u\|\le\delta$,
that is, iff
    for every $m\in\mathbb{N}$ there is an $n\in\mathbb{N}$ such that
    $|\phi(x+(u,0))-\phi(x)-T(x)(u,0)|\le2^{-m}\|u\|$
    whenever $u\in\mathbb{Q}^{r-1}$ and $\|u\|\le2^{-n}$.
But for any particular $m\in\mathbb{N}$ and $u\in\mathbb{Q}^{r-1}$ the set
$$\{x:|\phi(x+(u,0))-\phi(x)-T(x)(u,0)|\le2^{-m}\|u\|\}$$
is measurable, indeed Borel, because all the functions $x\mapsto\phi(x+(u,0))$, $x\mapsto\phi(x)$, $x\mapsto T(x)(u,0)$ are Borel
measurable. So $H_1$ is of the form
$$\bigcap_{m\in\mathbb{N}}\bigcup_{n\in\mathbb{N}}\bigcap_{u\in\mathbb{Q}^{r-1},\,\|u\|\le2^{-n}}E_{mnu}$$
where every $E_{mnu}$ is a measurable set, and $H_1$ is therefore measurable.

Now however observe that for any $\sigma\in\mathbb{R}$, the function
$$v\mapsto\phi_\sigma(v)=\phi(v,\sigma):\mathbb{R}^{r-1}\to\mathbb{R}$$
is Lipschitz, therefore (by the inductive hypothesis) differentiable almost everywhere on $\mathbb{R}^{r-1}$; and that
$(v,\sigma)\in H_1$ iff $(v,\sigma)\in H$ and $\phi'_\sigma(v)$ is defined. Consequently $\{v:(v,\sigma)\in H_1\}$ is conegligible whenever
$\{v:(v,\sigma)\in H\}$ is, that is, for almost every $\sigma\in\mathbb{R}$; so that $H_1$, being measurable, must be conegligible. Q
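(e) Now, for $q,q'\in\mathbb{Q}$ and $n\in\mathbb{N}$, set
$$F(q,q',n)=\Bigl\{x:x\in\mathbb{R}^r,\ q\le\frac{\phi(x+(0,\eta))-\phi(x)}{\eta}\le q'\text{ whenever }0<|\eta|\le2^{-n}\Bigr\}.$$
Set
$$F_*(q,q',n)=\Bigl\{x:x\in F(q,q',n),\ \lim_{\delta\downarrow0}\frac{\mu^*(F(q,q',n)\cap B(x,\delta))}{\mu B(x,\delta)}=1\Bigr\}.$$
By 261Da, $F(q,q',n)\setminus F_*(q,q',n)$ is negligible for all $q$, $q'$, $n$, so that
$$H_2=H_1\setminus\bigcup_{q,q'\in\mathbb{Q},\,n\in\mathbb{N}}(F(q,q',n)\setminus F_*(q,q',n))$$
is conegligible.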

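(f) I claim that $\phi$ is differentiable at every point of $H_2$. P Take $x=(u,\sigma)\in H_2$. Then $\alpha=\frac{\partial\phi}{\partial\xi_r}(x)$ and
$T=T(x)$ are defined. Let $\gamma$ be a Lipschitz constant for $\phi$.
Take $\epsilon>0$; take $q,q'\in\mathbb{Q}$ such that $\alpha-\epsilon\le q<\alpha<q'\le\alpha+\epsilon$. There must be an $n\in\mathbb{N}$
such that $x\in F(q,q',n)$; consequently $x\in F_*(q,q',n)$, by the definition of $H_2$. By 262L, there is a
$\delta_0>0$ such that $\rho(x+z,F(q,q',n))\le\epsilon\|z\|$ whenever $\|z\|\le\delta_0$. Next, there is a $\delta_1>0$ such that
$|\phi(x+(v,0))-\phi(x)-T(v,0)|\le\epsilon\|v\|$ whenever $v\in\mathbb{R}^{r-1}$ and $\|v\|\le\delta_1$. Set
$$\delta=\min(\delta_0,\delta_1,2^{-n})/(1+2\epsilon)>0.$$
Suppose that $z=(v,\tau)\in\mathbb{R}^r$ and that $\|z\|\le\delta$. Because $\|z\|\le\delta_0$ there is an $x'=(u',\sigma')\in F(q,q',n)$
such that $\|x+z-x'\|\le2\epsilon\|z\|$; set $x^*=(u',\sigma)$. Now
$$\max(\|u-u'\|,|\sigma-\sigma'|)\le\|x-x'\|\le(1+2\epsilon)\|z\|\le\min(\delta_1,2^{-n}),$$
so
$$|\phi(x^*)-\phi(x)-T(x^*-x)|\le\epsilon\|u'-u\|\le\epsilon(1+2\epsilon)\|z\|.$$
But also
$$|\phi(x')-\phi(x^*)-T(x'-x^*)|=|\phi(x')-\phi(x^*)-\alpha(\sigma'-\sigma)|\le\epsilon|\sigma'-\sigma|\le\epsilon(1+2\epsilon)\|z\|,$$
because $x'\in F(q,q',n)$ and $|\sigma-\sigma'|\le2^{-n}$, so that (if $x'\ne x^*$)
$$\alpha-\epsilon\le q\le\frac{\phi(x^*)-\phi(x')}{\sigma-\sigma'}\le q'\le\alpha+\epsilon$$
and
$$\Bigl|\frac{\phi(x')-\phi(x^*)}{\sigma'-\sigma}-\alpha\Bigr|\le\epsilon.$$
Finally,
$$|\phi(x+z)-\phi(x')|\le\gamma\|x+z-x'\|\le2\gamma\epsilon\|z\|,$$
$$|Tz-T(x'-x)|\le\|T\|\|x+z-x'\|\le2\epsilon\|T\|\|z\|.$$
Putting all these together,
$$|\phi(x+z)-\phi(x)-Tz|\le|\phi(x+z)-\phi(x')|+|T(x'-x)-Tz|$$
$$\qquad+|\phi(x')-\phi(x^*)-T(x'-x^*)|+|\phi(x^*)-\phi(x)-T(x^*-x)|$$
$$\le2\gamma\epsilon\|z\|+2\epsilon\|T\|\|z\|+\epsilon(1+2\epsilon)\|z\|+\epsilon(1+2\epsilon)\|z\|$$
$$=\epsilon(2\gamma+2\|T\|+2+4\epsilon)\|z\|.$$
And this is true whenever $\|z\|\le\delta$. As $\epsilon$ is arbitrary, $\phi$ is differentiable at $x$. Q
Thus $\{x:\phi$ is differentiable at $x\}$ includes $H_2$ and is conegligible; and the induction continues.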

262X Basic exercises (a) Let $\phi$ and $\psi$ be Lipschitz functions from subsets of $\mathbb{R}^r$ to $\mathbb{R}^s$. Show that
$\phi+\psi$ is a Lipschitz function from $\operatorname{dom}\phi\cap\operatorname{dom}\psi$ to $\mathbb{R}^s$.

(b) Let $\phi$ be a Lipschitz function from a subset of $\mathbb{R}^r$ to $\mathbb{R}^s$, and $c\in\mathbb{R}$. Show that $c\phi$ is a Lipschitz
function.

(c) Suppose $\phi:D\to\mathbb{R}^s$ and $\psi:E\to\mathbb{R}^q$ are Lipschitz functions, where $D\subseteq\mathbb{R}^r$ and $E\subseteq\mathbb{R}^s$. Show that
the composition $\psi\phi:D\cap\phi^{-1}[E]\to\mathbb{R}^q$ is Lipschitz.

(d) Suppose $\phi$, $\psi$ are functions from subsets of $\mathbb{R}^r$ to $\mathbb{R}^s$, and suppose that $x\in\operatorname{dom}\phi\cap\operatorname{dom}\psi$ is such
that each function is differentiable relative to its domain at $x$, with derivatives $S$, $T$ there. Show that $\phi+\psi$
is differentiable relative to its domain at $x$, and that $S+T$ is a derivative of $\phi+\psi$ at $x$.

(e) Suppose that $\phi$ is a function from a subset of $\mathbb{R}^r$ to $\mathbb{R}^s$, and is differentiable relative to its domain at
$x\in\operatorname{dom}\phi$. Show that $c\phi$ is differentiable relative to its domain at $x$ for every $c\in\mathbb{R}$.

> (f) Suppose $\phi:D\to\mathbb{R}^s$ and $\psi:E\to\mathbb{R}^q$ are functions, where $D\subseteq\mathbb{R}^r$ and $E\subseteq\mathbb{R}^s$; suppose that $\phi$ is
differentiable relative to its domain at $x\in D\cap\phi^{-1}[E]$, with an $s\times r$ matrix $T$ a derivative there, and that
$\psi$ is differentiable relative to its domain at $\phi(x)$, with a $q\times s$ matrix $S$ a derivative there. Show that the
composition $\psi\phi$ is differentiable relative to its domain at $x$, and that the $q\times r$ matrix $ST$ is a derivative of
$\psi\phi$ at $x$.

(g) Let $\phi:\mathbb{R}^r\to\mathbb{R}^s$ be a linear operator, with associated matrix $T$. Show that $\phi$ is differentiable
everywhere, with $\phi'(x)=T$ for every $x$.

> (h) Let $G\subseteq\mathbb{R}^r$ be a convex open set, and $\phi:G\to\mathbb{R}^s$ a function such that all the partial derivatives
$\frac{\partial\phi_i}{\partial\xi_j}$ are defined everywhere in $G$. Show that $\phi$ is Lipschitz iff all the partial derivatives are bounded on $G$.

(i) Let $\phi:\mathbb{R}^r\to\mathbb{R}^s$ be a function. Show that $\phi$ is differentiable at $x\in\mathbb{R}^r$ iff for every $m\in\mathbb{N}$ there are
an $n\in\mathbb{N}$ and an $s\times r$ matrix $T$ with rational coefficients such that $\|\phi(y)-\phi(x)-T(y-x)\|\le2^{-m}\|y-x\|$
whenever $\|y-x\|\le2^{-n}$.

> (j) Suppose that $f$ is a real-valued function which is integrable over $\mathbb{R}^r$, and that $g:\mathbb{R}^r\to\mathbb{R}$ is a
bounded differentiable function such that the partial derivative $\frac{\partial g}{\partial\xi_j}$ is bounded, where $j\le r$. Let $f*g$ be
the convolution of $f$ and $g$ (255L). Show that $\frac{\partial}{\partial\xi_j}(f*g)$ is defined everywhere and equal to $f*\frac{\partial g}{\partial\xi_j}$. (Hint:
255Xg.)

> (k) Let $(X,\Sigma,\mu)$ be a measure space, $G\subseteq\mathbb{R}^r$ an open set, and $f:X\times G\to\mathbb{R}$ a function. Suppose
that
(i) for every $x\in X$, $t\mapsto f(x,t):G\to\mathbb{R}$ is differentiable;
(ii) there is an integrable function $g$ on $X$ such that $\bigl|\frac{\partial f}{\partial\tau_j}(x,t)\bigr|\le g(x)$ whenever $x\in X$, $t\in G$
and $j\le r$;
(iii) $\int|f(x,t)|\mu(dx)$ exists in $\mathbb{R}$ for every $t\in G$.
Show that $t\mapsto\int f(x,t)\mu(dx):G\to\mathbb{R}$ is differentiable. (Hint: show first that, for a suitable $M$, $|f(x,t)-f(x,t')|\le M|g(x)|\|t-t'\|$
for every $t,t'\in G$ and $x\in X$.)

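262Y Further exercises (a) Show that if $T=\langle\tau_{ij}\rangle_{i\le s,j\le r}$ is an $s\times r$ matrix then the operator norm
$\|T\|$, as defined in 262H, is at most $\sqrt{\sum_{i=1}^s\sum_{j=1}^r|\tau_{ij}|^2}$.

(b) Give an example of a measurable function $\phi:\mathbb{R}^2\to\mathbb{R}$ such that $\operatorname{dom}\frac{\partial\phi}{\partial\xi_1}$ is not measurable.

(c) Let $\phi:D\to\mathbb{R}$ be any function, where $D\subseteq\mathbb{R}^r$. Show that $H=\{x:x\in D,\ \phi$ is differentiable relative
to its domain at $x\}$ is relatively measurable in $D$, and that $\frac{\partial\phi}{\partial\xi_j}\restriction H$ is measurable for every $j\le r$.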

(d) A function $\phi:\mathbb{R}^r\to\mathbb{R}$ is smooth if all its partial derivatives $\frac{\partial\ldots\partial\phi}{\partial\xi_i\partial\xi_j\ldots\partial\xi_l}$ are defined everywhere in
$\mathbb{R}^r$ and are continuous. Show that if $f$ is integrable over $\mathbb{R}^r$ and $\phi:\mathbb{R}^r\to\mathbb{R}$ is smooth and has bounded
support then the convolution $f*\phi$ is smooth. (Hint: 262Xj, 262Xk.)

(e) For $\delta>0$ set $\tilde\phi_\delta(x)=e^{-1/(\delta^2-\|x\|^2)}$ if $\|x\|<\delta$, $0$ if $\|x\|\ge\delta$; set $\alpha_\delta=\int\tilde\phi_\delta(x)dx$, $\phi_\delta(x)=\alpha_\delta^{-1}\tilde\phi_\delta(x)$ for
every $x$. (i) Show that $\phi_\delta:\mathbb{R}^r\to\mathbb{R}$ is smooth and has bounded support. (ii) Show that if $f$ is integrable
over $\mathbb{R}^r$ then $\lim_{\delta\downarrow0}\int|f(x)-(f*\phi_\delta)(x)|dx=0$. (Hint: start with continuous functions $f$ with bounded
support, and use 242O.)

(f) Show that if $f$ is integrable over $\mathbb{R}^r$ and $\epsilon>0$ there is a smooth function $h$ with bounded support
such that $\int|f-h|\le\epsilon$. (Hint: either reduce to the case in which $f$ has bounded support and use 262Ye or
adapt the method of 242Xi.)

(g) Suppose that $f$ is a real function which is integrable over every bounded subset of $\mathbb{R}^r$. (i) Show that
$f\times\phi$ is integrable whenever $\phi:\mathbb{R}^r\to\mathbb{R}$ is a smooth function with bounded support. (ii) Show that if
$\int f\times\phi=0$ for every smooth function $\phi$ with bounded support then $f=_{\text{a.e.}}0$. (Hint: show that $\int_{B(x,\delta)}f=0$
for every $x\in\mathbb{R}^r$ and $\delta>0$, and use 261C. Alternatively show that $\int_Ef=0$ first for $E=[b,c]$, then for
open sets $E$, then for arbitrary measurable sets $E$.)

(h) Let $f$ be integrable over $\mathbb{R}^r$, and for $\delta>0$ let $\phi_\delta:\mathbb{R}^r\to\mathbb{R}$ be the function of 262Ye. Show that
$\lim_{\delta\downarrow0}(f*\phi_\delta)(x)=f(x)$ for every $x$ in the Lebesgue set of $f$. (Hint: 261Ye.)

(i) Let $L$ be the space of all Lipschitz functions from $\mathbb{R}^r$ to $\mathbb{R}^s$ and for $\phi\in L$ set
$$\|\phi\|=\|\phi(0)\|+\inf\{\gamma:\gamma\in[0,\infty[,\ \|\phi(y)-\phi(x)\|\le\gamma\|y-x\|\text{ for every }x,y\in\mathbb{R}^r\}.$$
Show that $(L,\|\ \|)$ is a Banach space.

262 Notes and comments The emphasis of this section has turned out to be on the connexions between
the concepts of Lipschitz function and differentiable function. It is the delight of classical real analysis that
such intimate relationships arise between concepts which belong to different categories. Lipschitz functions
clearly belong to the theory of metric spaces (I will return to this in 264), while differentiable functions
belong to the theory of differentiable manifolds, which is outside the scope of this volume. I have written
this section out carefully just in case there are readers who have so far missed the theory of differentiable
mappings between multi-dimensional Euclidean spaces; but it also gives me a chance to work through the
notion of function differentiable relative to its domain, which will make it possible in the next section to
ride smoothly past a variety of problems arising at boundaries. The difficulties I am concerned with arise
in the first place with such functions as the polar-coordinate transformation
(, ) 7 ( cos , sin ) : {(0, 0)} (]0, [ ], ]) R 2 .
In order to make this a bijection we have to do something rather arbitrary, and the domain of the transfor-
mation cannot be an open set. On the definitions I am using, this function is differentiable relative to its
domain at every point of its domain, and we can apply such results as 262O uninhibitedly. You will observe
that in this case the non-interior points of the domain form a negligible set {(0, 0)} (]0, [ {}), so we
can expect to be able to ignore them; and for most of the geometrically straightforward transformations
that the theory is applied to, judicious excision of negligible sets will reduce problems to the case of honestly
differentiable functions with open domains. But while open-domain theory will deal with a large propor-
tion of the most important examples, there is a danger that you would be left with real misapprehensions
concerning the scope of these methods.
The essence of differentiability is that a differentiable function is approximable, near any given point
of its domain, by an affine function. The idea of 262M is to describe a widely effective method of dissecting
$D = \operatorname{dom}\phi$ into countably many pieces on each of which $\phi$ is well-behaved. This will be applied in §§263 and
265 to investigate the measure of $\phi[D]$; but we already have several straightforward consequences (262N-262P).

263 Differentiable transformations in $\mathbb{R}^r$


This section is devoted to the proof of a single major theorem (263D) concerning differentiable transformations between subsets of $\mathbb{R}^r$. There will be a generalization of this result in §265, and those with some familiarity with the topic, or sufficient hardihood, may wish to read §264 before taking this section and §265 together. I end with a few simple corollaries and an extension of the main result which can be made in the one-dimensional case (263I).
Throughout this section, as in the rest of the chapter, $\mu$ will denote Lebesgue measure on $\mathbb{R}^r$.

263A Linear transformations I begin with the special case of linear operators, which is not only the
basis of the proof of 263D, but is also one of its most important applications, and is indeed sufficient for
many very striking results.
Theorem Let T be a real $r \times r$ matrix; regard T as a linear operator from $\mathbb{R}^r$ to itself. Let $J = |\det T|$ be the modulus of its determinant. Then
$$\mu T[E] = J\mu E$$
for every measurable set $E \subseteq \mathbb{R}^r$. If T is a bijection (that is, if $J \ne 0$), then
$$\mu F = J\mu T^{-1}[F]$$
for every measurable $F \subseteq \mathbb{R}^r$, and
$$\int_F g\,d\mu = J\int_{T^{-1}[F]} gT\,d\mu$$
for every integrable function g and measurable set F.
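The simplest instance of the formula $\mu T[E] = J\mu E$ can be checked by hand in the plane: the image of the unit square under a $2 \times 2$ matrix is a parallelogram, and its area should be $|\det T|$. The following is a minimal sketch in Python, assuming $r = 2$ and using the elementary shoelace formula for polygon area; the function names and the sample matrices are illustrative choices, not taken from the text.

```python
def image_of_unit_square(T):
    """Vertices of the image of the unit square under T, listed around the boundary."""
    corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    return [(T[0][0] * x + T[0][1] * y, T[1][0] * x + T[1][1] * y) for x, y in corners]


def shoelace_area(poly):
    """Area of a simple polygon given by its vertices in order (shoelace formula)."""
    total = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def abs_det(T):
    """J = |det T| for a 2 x 2 matrix."""
    return abs(T[0][0] * T[1][1] - T[0][1] * T[1][0])


# A non-singular example and a singular one (the image collapses onto a line).
T_regular = [[2.0, 1.0], [0.5, 3.0]]
T_singular = [[1.0, 2.0], [2.0, 4.0]]

area_regular = shoelace_area(image_of_unit_square(T_regular))
area_singular = shoelace_area(image_of_unit_square(T_singular))
print(area_regular, abs_det(T_regular))
print(area_singular, abs_det(T_singular))
```

This only exercises the theorem on one very regular set; the point of the proof is that the same factor J applies to every measurable E.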
proof (a) The first step is to show that $T[I]$ is measurable for every half-open interval $I \subseteq \mathbb{R}^r$. P Any non-empty half-open interval $I = [a, b[$ is a countable union of closed intervals $I_n = [a, b - 2^{-n}\mathbf{1}]$, and each $I_n$ is compact (2A2F), so that $T[I_n]$ is compact (2A2Eb), therefore closed (2A2Ec), therefore measurable (115G), and $T[I] = \bigcup_{n\in\mathbb{N}} T[I_n]$ is measurable. Q
(b) Set $J^* = \mu T[\,[\mathbf{0}, \mathbf{1}[\,]$, where $\mathbf{0} = (0, \ldots, 0)$ and $\mathbf{1} = (1, \ldots, 1)$; because $T[\,[\mathbf{0}, \mathbf{1}[\,]$ is bounded, $J^* < \infty$. (I will eventually show that $J^* = J$.) It is convenient to deal with the case of singular T first. Recall that T, regarded as a linear transformation from $\mathbb{R}^r$ to itself, is either bijective or onto a proper linear subspace. In the latter case, take any $e \in \mathbb{R}^r \setminus T[\mathbb{R}^r]$; then the sets
$$T[\,[\mathbf{0}, \mathbf{1}[\,] + \gamma e,$$
as $\gamma$ runs over $[0, 1]$, are disjoint and all of the same measure $J^*$, because $\mu$ is translation-invariant (134A); moreover, their union is bounded, so has finite outer measure. As there are infinitely many such $\gamma$, the common measure $J^*$ must be zero. Now observe that
$$T[\mathbb{R}^r] = \bigcup_{z\in\mathbb{Z}^r} T[\,[\mathbf{0}, \mathbf{1}[\,] + Tz,$$
and
$$\mu(T[\,[\mathbf{0}, \mathbf{1}[\,] + Tz) = J^* = 0$$
for every $z \in \mathbb{Z}^r$, while $\mathbb{Z}^r$ is countable, so $\mu T[\mathbb{R}^r] = 0$. At the same time, because T is singular, it has zero determinant, and $J = 0$. Accordingly
$$\mu T[E] = 0 = J\mu E$$
for every measurable $E \subseteq \mathbb{R}^r$, and we're done.
(c) Henceforth, therefore, let us assume that T is non-singular. Note that it and its inverse are continuous, so that T is a homeomorphism, and $T[G]$ is open iff G is open.
If $a \in \mathbb{R}^r$ and $k \in \mathbb{N}$, then
$$\mu T[\,[a, a + 2^{-k}\mathbf{1}[\,] = 2^{-kr}J^*.$$
P Set $J^*_k = \mu T[\,[\mathbf{0}, 2^{-k}\mathbf{1}[\,]$. Now $T[\,[a, a + 2^{-k}\mathbf{1}[\,] = T[\,[\mathbf{0}, 2^{-k}\mathbf{1}[\,] + Ta$; because $\mu$ is translation-invariant, its measure is also $J^*_k$. Next, $[\mathbf{0}, \mathbf{1}[$ is expressible as a disjoint union of $2^{kr}$ sets of the form $[a, a + 2^{-k}\mathbf{1}[$; consequently, $T[\,[\mathbf{0}, \mathbf{1}[\,]$ is expressible as a disjoint union of $2^{kr}$ sets of the form $T[\,[a, a + 2^{-k}\mathbf{1}[\,]$, and
$$J^* = \mu T[\,[\mathbf{0}, \mathbf{1}[\,] = 2^{kr}J^*_k,$$
that is, $J^*_k = 2^{-kr}J^*$, as claimed. Q
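(d) Consequently $\mu T[G] = J^*\mu G$ for every open set $G \subseteq \mathbb{R}^r$. P For each $k \in \mathbb{N}$, set
$$Q_k = \{z : z \in \mathbb{Z}^r,\ [2^{-k}z, 2^{-k}z + 2^{-k}\mathbf{1}[ \subseteq G\},$$
$$G_k = \bigcup_{z\in Q_k} [2^{-k}z, 2^{-k}z + 2^{-k}\mathbf{1}[.$$
Then $G_k$ is a disjoint union of $\#(Q_k)$ sets of the form $[2^{-k}z, 2^{-k}z + 2^{-k}\mathbf{1}[$, so $\mu G_k = 2^{-kr}\#(Q_k)$; also, $T[G_k]$ is a disjoint union of $\#(Q_k)$ sets of the form $T[\,[2^{-k}z, 2^{-k}z + 2^{-k}\mathbf{1}[\,]$, so has measure $2^{-kr}J^*\#(Q_k) = J^*\mu G_k$, using (c).
Observe next that $\langle G_k\rangle_{k\in\mathbb{N}}$ is a non-decreasing sequence with union G, so that
$$\mu T[G] = \lim_{k\to\infty} \mu T[G_k] = \lim_{k\to\infty} J^*\mu G_k = J^*\mu G.$$
Q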
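The approximation of an open set from inside by dyadic cubes, used in part (d), can be watched numerically. The sketch below, in Python, takes G to be the open unit disc in the plane and counts the level-k dyadic squares lying inside it. As a simplifying assumption it tests whether the closed square lies in G, rather than the half-open square of the proof; this can only discard a few boundary squares and does not change the monotonicity or the limit. The function name and the range of k are illustrative.

```python
import math


def dyadic_inner_area(k):
    """Total area of the level-k dyadic squares whose closure lies in the open unit disc."""
    n = 2 ** k  # squares have side 1/n and corners at integer multiples of 1/n
    count = 0
    for i in range(-n, n):
        for j in range(-n, n):
            # Farthest corner of [i/n, (i+1)/n] x [j/n, (j+1)/n] from the origin,
            # measured in units of 1/n so that only integer arithmetic is needed.
            fx = max(abs(i), abs(i + 1))
            fy = max(abs(j), abs(j + 1))
            if fx * fx + fy * fy < n * n:
                count += 1
    return count / (n * n)


areas = [dyadic_inner_area(k) for k in range(7)]
for k, a in enumerate(areas):
    print(k, a, math.pi - a)
```

The printed gaps `math.pi - a` shrink as k grows, in line with $\mu G = \lim_{k\to\infty}\mu G_k$.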

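(e) It follows that $\mu^*T[A] = J^*\mu^*A$ for every $A \subseteq \mathbb{R}^r$. P Given $A \subseteq \mathbb{R}^r$ and $\epsilon > 0$, there are open sets G, H such that $G \supseteq A$, $H \supseteq T[A]$, $\mu G \le \mu^*A + \epsilon$ and $\mu H \le \mu^*T[A] + \epsilon$ (134Fa). Set $G_1 = G \cap T^{-1}[H]$; then $G_1$ is open because $T^{-1}[H]$ is. Now $\mu T[G_1] = J^*\mu G_1$, so
$$\mu^*T[A] \le \mu T[G_1] = J^*\mu G_1 \le J^*\mu^*A + J^*\epsilon$$
$$\le J^*\mu G_1 + J^*\epsilon = \mu T[G_1] + J^*\epsilon \le \mu H + J^*\epsilon$$
$$\le \mu^*T[A] + \epsilon + J^*\epsilon.$$
As $\epsilon$ is arbitrary, $\mu^*T[A] = J^*\mu^*A$. Q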
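(f) Consequently $\mu T[E]$ exists and is equal to $J^*\mu E$ for every measurable $E \subseteq \mathbb{R}^r$. P Let $E \subseteq \mathbb{R}^r$ be measurable, and take any $A \subseteq \mathbb{R}^r$. Set $A' = T^{-1}[A]$. Then
$$\mu^*(A \cap T[E]) + \mu^*(A \setminus T[E]) = \mu^*(T[A' \cap E]) + \mu^*(T[A' \setminus E])$$
$$= J^*(\mu^*(A' \cap E) + \mu^*(A' \setminus E))$$
$$= J^*\mu^*A' = \mu^*T[A'] = \mu^*A.$$
As A is arbitrary, $T[E]$ is measurable, and now
$$\mu T[E] = \mu^*T[E] = J^*\mu^*E = J^*\mu E.$$
Q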

(g) We are at last ready for the calculation of $J^*$. Recall that the matrix T must be expressible as PDQ, where P and Q are orthogonal matrices and D is diagonal, with non-negative diagonal entries (2A6C). Now we must have
$$T[\,[\mathbf{0}, \mathbf{1}[\,] = P[D[Q[\,[\mathbf{0}, \mathbf{1}[\,]]],$$
so, using (f),
$$J^* = J^*_P J^*_D J^*_Q,$$
where $J^*_P = \mu P[\,[\mathbf{0}, \mathbf{1}[\,]$, etc. Now we find that $J^*_P = J^*_Q = 1$. P Let $B = B(\mathbf{0}, 1)$ be the unit ball of $\mathbb{R}^r$. Because B is closed, it is measurable; because it is bounded, $\mu B < \infty$; and because B includes the non-empty half-open interval $[\mathbf{0}, r^{-1/2}\mathbf{1}[$, $\mu B > 0$. Now $P[B] = Q[B] = B$, because P and Q are orthogonal matrices; so we have
$$\mu B = \mu P[B] = J^*_P\mu B,$$
and $J^*_P$ must be 1; similarly, $J^*_Q = 1$. Q

(h) So we have only to calculate $J^*_D$. Suppose the coefficients of D are $\delta_1, \ldots, \delta_r \ge 0$, so that $Dx = (\delta_1\xi_1, \ldots, \delta_r\xi_r) = d \times x$. We have been assuming since the beginning of (c) that T is non-singular, so no $\delta_i$ can be 0. Accordingly
$$D[\,[\mathbf{0}, \mathbf{1}[\,] = [\mathbf{0}, d[,$$
and
$$J^*_D = \mu[\mathbf{0}, d[ = \prod_{i=1}^r \delta_i = \det D.$$
Now because P and Q are orthogonal, both have determinant $\pm 1$, so $\det T = \pm\det D$ and $J^* = \pm\det T$; because $J^*$ is surely non-negative, $J^* = |\det T| = J$.
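The last step of (h) rests on the multiplicativity of the determinant across the factorization T = PDQ. As a small check, here is a sketch in Python, assuming $r = 2$ and building T from two rotations and a diagonal matrix (and then composing with a reflection, to produce an orthogonal factor of determinant $-1$). The angles, the diagonal entries and the helper names are arbitrary illustrative choices.

```python
import math


def rot(theta):
    """Rotation of the plane through angle theta (an orthogonal matrix of determinant 1)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


P, Q = rot(0.7), rot(-1.9)
D = [[3.0, 0.0], [0.0, 0.5]]            # diagonal, with delta_1 = 3 and delta_2 = 1/2
T = matmul(P, matmul(D, Q))             # T = P D Q

reflect = [[1.0, 0.0], [0.0, -1.0]]     # orthogonal, determinant -1
T_reflected = matmul(reflect, T)

print(det(T), det(T_reflected), det(D))
```

In both cases the modulus of the determinant is $\delta_1\delta_2 = 1.5$, while the sign depends on the orthogonal factors, which is why the argument passes to $|\det T|$.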
(i) Thus $\mu T[E] = J\mu E$ for every Lebesgue measurable $E \subseteq \mathbb{R}^r$. If T is non-singular, then we may use the above argument to show that $T^{-1}[F]$ is measurable for every measurable F, and
$$\mu F = \mu T[T^{-1}[F]] = J\mu T^{-1}[F] = \int J \times \chi(T^{-1}[F])\,d\mu,$$
identifying J with the constant function with value J. By 235A,
$$\int_F g\,d\mu = \int_{T^{-1}[F]} JgT\,d\mu = J\int_{T^{-1}[F]} gT\,d\mu$$
for every integrable function g and measurable set F.

263B Remark Perhaps I should have warned you that I should be calling on the results of §235. But
if they were fresh in your mind the formulae of the statement of the theorem will have recalled them, and if
not then it is perhaps better to turn back to them now rather than before reading the theorem, since they
are used only in the last sentence of the proof.
I have taken the argument above at a leisurely, not to say pedestrian, pace. The point is that while
the translation-invariance of Lebesgue measure, and its behaviour under simple magnification of a single
coordinate, are more or less built into the definition, its behaviour under general rotations is not, since a
rotation takes half-open intervals into skew cuboids. Of course the calculation of the measure of such an
object is not really anything to do with the Lebesgue theory, and it will be clear that much of the argument
would apply equally to any geometrically reasonable notion of r-dimensional volume.
We come now to the central result of the chapter. We have already done some of the detail work in 262M.
The next basic element is the following lemma.

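263C Lemma Let T be any $r \times r$ matrix; set $J = |\det T|$. Then for any $\epsilon > 0$ there is a $\zeta = \zeta(T, \epsilon) > 0$ such that
(i) $|\det S - \det T| \le \epsilon$ whenever S is an $r \times r$ matrix and $\|S - T\| \le \zeta$;
(ii) whenever $D \subseteq \mathbb{R}^r$ is a bounded set and $\phi : D \to \mathbb{R}^r$ is a function such that $\|\phi(x) - \phi(y) - T(x - y)\| \le \zeta\|x - y\|$ for all $x, y \in D$, then $|\mu^*\phi[D] - J\mu^*D| \le \epsilon\mu^*D$.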
proof (a) Of course (i) is the easy part. Because $\det S$ is a continuous function of the coefficients of S, and the coefficients of S must be close to those of T if $\|S - T\|$ is small (262Hb), there is surely a $\zeta_0 > 0$ such that $|\det S - \det T| \le \epsilon$ whenever $\|S - T\| \le \zeta_0$.
(b)(i) Write $B = B(\mathbf{0}, 1)$ for the unit ball of $\mathbb{R}^r$, and consider $T[B]$. We know that $\mu T[B] = J\mu B$ (263A). Let $G \supseteq T[B]$ be an open set such that $\mu G \le (J + \epsilon)\mu B$ (134Fa). Because B is compact (2A2F) so is $T[B]$, so there is a $\zeta_1 > 0$ such that $T[B] + \zeta_1 B \subseteq G$ (2A2Ed). This means that $\mu^*(T[B] + \zeta_1 B) \le (J + \epsilon)\mu B$.
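(ii) Now suppose that $D \subseteq \mathbb{R}^r$ is a bounded set, and that $\phi : D \to \mathbb{R}^r$ is a function such that $\|\phi(x) - \phi(y) - T(x - y)\| \le \zeta_1\|x - y\|$ for all $x, y \in D$. Then if $x \in D$ and $\delta > 0$,
$$\phi[D \cap B(x, \delta)] \subseteq \phi(x) + \delta T[B] + \delta\zeta_1 B,$$
because if $y \in D \cap B(x, \delta)$ then $T(y - x) \in \delta T[B]$ and
$$\phi(y) = \phi(x) + T(y - x) + (\phi(y) - \phi(x) - T(y - x))$$
$$\in \phi(x) + \delta T[B] + \zeta_1\|y - x\|B \subseteq \phi(x) + \delta T[B] + \delta\zeta_1 B.$$
Accordingly
$$\mu^*\phi[D \cap B(x, \delta)] \le \mu^*(\delta T[B] + \delta\zeta_1 B) = \delta^r\mu^*(T[B] + \zeta_1 B)$$
$$\le \delta^r(J + \epsilon)\mu B = (J + \epsilon)\mu B(x, \delta).$$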
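Let $\eta > 0$. Then there is a sequence $\langle B_n\rangle_{n\in\mathbb{N}}$ of balls in $\mathbb{R}^r$ such that $D \subseteq \bigcup_{n\in\mathbb{N}} B_n$, $\sum_{n=0}^\infty \mu B_n \le \mu^*D + \eta$ and the sum of the measures of those $B_n$ whose centres do not lie in D is at most $\eta$ (261F). Let K be the set of those n such that the centre of $B_n$ lies in D. Then $\mu^*\phi[D \cap B_n] \le (J + \epsilon)\mu B_n$ for every $n \in K$. Also, of course, $\phi$ is $(\|T\| + \zeta_1)$-Lipschitz, so $\mu^*\phi[D \cap B_n] \le (\|T\| + \zeta_1)^r\mu B_n$ for $n \in \mathbb{N} \setminus K$ (262D). Now
$$\mu^*\phi[D] \le \sum_{n=0}^\infty \mu^*\phi[D \cap B_n]$$
$$\le \sum_{n\in K} (J + \epsilon)\mu B_n + \sum_{n\in\mathbb{N}\setminus K} (\|T\| + \zeta_1)^r\mu B_n$$
$$\le (J + \epsilon)(\mu^*D + \eta) + \eta(\|T\| + \zeta_1)^r.$$
As $\eta$ is arbitrary,
$$\mu^*\phi[D] \le (J + \epsilon)\mu^*D.$$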

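(c) If $J = 0$, we can stop here, setting $\zeta = \min(\zeta_0, \zeta_1)$; for then we surely have $|\det S - \det T| \le \epsilon$ whenever $\|S - T\| \le \zeta$, while if $\phi : D \to \mathbb{R}^r$ is such that $\|\phi(x) - \phi(y) - T(x - y)\| \le \zeta\|x - y\|$ for all $x, y \in D$, then
$$|\mu^*\phi[D] - J\mu^*D| = \mu^*\phi[D] \le \epsilon\mu^*D.$$
If $J \ne 0$, we have more to do. Because T has non-zero determinant, it has an inverse $T^{-1}$, and $|\det T^{-1}| = J^{-1}$. As in (b-i) above, there is a $\zeta_2 > 0$ such that $\mu^*(T^{-1}[B] + \zeta_2 B) \le (J^{-1} + \epsilon')\mu B$, where $\epsilon' = \epsilon/J(J + \epsilon)$. Repeating (b), we see that if $C \subseteq \mathbb{R}^r$ is bounded and $\psi : C \to \mathbb{R}^r$ is such that $\|\psi(u) - \psi(v) - T^{-1}(u - v)\| \le \zeta_2\|u - v\|$ for all $u, v \in C$, then $\mu^*\psi[C] \le (J^{-1} + \epsilon')\mu^*C$.
Now suppose that $D \subseteq \mathbb{R}^r$ is bounded and $\phi : D \to \mathbb{R}^r$ is such that $\|\phi(x) - \phi(y) - T(x - y)\| \le \zeta_2'\|x - y\|$ for all $x, y \in D$, where $\zeta_2' = \min(\zeta_2, \|T^{-1}\|)/2\|T^{-1}\|^2 > 0$. Then
$$\|T^{-1}(\phi(x) - \phi(y)) - (x - y)\| \le \|T^{-1}\|\zeta_2'\|x - y\| \le \tfrac{1}{2}\|x - y\|$$
for all $x, y \in D$, so $\phi$ must be injective; set $C = \phi[D]$ and $\psi = \phi^{-1} : C \to D$. Note that C is bounded, because
$$\|\phi(x) - \phi(y)\| \le (\|T\| + \zeta_2')\|x - y\|$$
whenever $x, y \in D$. Also
$$\|T^{-1}(u - v) - (\psi(u) - \psi(v))\| \le \|T^{-1}\|\zeta_2'\|\psi(u) - \psi(v)\| \le \tfrac{1}{2}\|\psi(u) - \psi(v)\|$$
for all $u, v \in C$. But this means that
$$\|\psi(u) - \psi(v)\| - \|T^{-1}\|\|u - v\| \le \tfrac{1}{2}\|\psi(u) - \psi(v)\|$$
and $\|\psi(u) - \psi(v)\| \le 2\|T^{-1}\|\|u - v\|$ for all $u, v \in C$, so that
$$\|\psi(u) - \psi(v) - T^{-1}(u - v)\| \le 2\zeta_2'\|T^{-1}\|^2\|u - v\| \le \zeta_2\|u - v\|$$
for all $u, v \in C$.
By (b) just above, it follows that
$$\mu^*D = \mu^*\psi[C] \le (J^{-1} + \epsilon')\mu^*C = (J^{-1} + \epsilon')\mu^*\phi[D],$$
and
$$J\mu^*D \le (1 + J\epsilon')\mu^*\phi[D].$$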

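(d) So if we set $\zeta = \min(\zeta_0, \zeta_1, \zeta_2') > 0$, and if $D \subseteq \mathbb{R}^r$, $\phi : D \to \mathbb{R}^r$ are such that D is bounded and $\|\phi(x) - \phi(y) - T(x - y)\| \le \zeta\|x - y\|$ for all $x, y \in D$, we shall have
$$\mu^*\phi[D] \le (J + \epsilon)\mu^*D,$$
$$\mu^*\phi[D] \ge J\mu^*D - J\epsilon'\mu^*\phi[D] \ge J\mu^*D - J\epsilon'(J + \epsilon)\mu^*D = J\mu^*D - \epsilon\mu^*D,$$
so we get the required formula
$$|\mu^*\phi[D] - J\mu^*D| \le \epsilon\mu^*D.$$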
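The constants in parts (c) and (d) are chosen so that a few elementary inequalities come out exactly. For readers who prefer to see the arithmetic, the following Python sketch checks them with exact rational numbers. It assumes only the two formulae for $\epsilon'$ and $\zeta_2'$ given in the proof; the sample values of J, $\epsilon$, $\zeta_2$ and $\|T^{-1}\|$ are arbitrary and the function names are illustrative.

```python
from fractions import Fraction


def eps_prime(J, eps):
    """epsilon' = epsilon / (J (J + epsilon)), as in part (c)."""
    return eps / (J * (J + eps))


def zeta2_prime(zeta2, norm_T_inv):
    """zeta_2' = min(zeta_2, ||T^{-1}||) / (2 ||T^{-1}||^2), as in part (c)."""
    return min(zeta2, norm_T_inv) / (2 * norm_T_inv ** 2)


# Sample values (assumptions for illustration only).
J, eps = Fraction(3), Fraction(1, 10)
ep = eps_prime(J, eps)
lower_bound_loss = J * ep * (J + eps)      # the term subtracted from J mu*D in part (d)

checks = []
for zeta2, norm_T_inv in [(Fraction(1, 7), Fraction(5, 2)), (Fraction(3), Fraction(1, 2))]:
    z = zeta2_prime(zeta2, norm_T_inv)
    checks.append((
        norm_T_inv * z <= Fraction(1, 2),       # gives injectivity of phi
        2 * z * norm_T_inv ** 2 <= zeta2,       # lets (b) be applied to psi
    ))

print(lower_bound_loss, checks)
```

The first quantity is exactly $\epsilon$, which is what turns the two one-sided estimates into $|\mu^*\phi[D] - J\mu^*D| \le \epsilon\mu^*D$.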

263D We are ready for the theorem.


Theorem Let $D \subseteq \mathbb{R}^r$ be any set, and $\phi : D \to \mathbb{R}^r$ a function differentiable relative to its domain at each point of D. For each $x \in D$ let $T(x)$ be a derivative of $\phi$ relative to D at x, and set $J(x) = |\det T(x)|$. Then
(i) $J : D \to [0, \infty[$ is a measurable function,
(ii) $\mu^*\phi[D] \le \int_D J\,d\mu$,
allowing $\infty$ as the value of the integral. If D is measurable, then
(iii) $\phi[D]$ is measurable.
If D is measurable and $\phi$ is injective, then
(iv) $\mu\phi[D] = \int_D J\,d\mu$,
(v) for every real-valued function g defined on a subset of $\phi[D]$,
$$\int_{\phi[D]} g\,d\mu = \int_D J \times g\phi\,d\mu$$
if either integral is defined in $[-\infty, \infty]$, provided we interpret $J(x)g(\phi(x))$ as zero when $J(x) = 0$ and $g(\phi(x))$ is undefined.
proof (a) To see that J is measurable, use 262P; the function $T \mapsto |\det T|$ is a continuous function of the coefficients of T, and the coefficients of $T(x)$ are measurable functions of x, by 262P, so $x \mapsto |\det T(x)|$ is measurable (121K). We also know that if D is measurable, $\phi[D]$ will be measurable, by 262Ob. Thus (i) and (iii) are done.
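(b) For the moment, assume that D is bounded, and fix $\epsilon > 0$. For $r \times r$ matrices T, take $\zeta(T, \epsilon) > 0$ as in 263C. Take $\langle D_n\rangle_{n\in\mathbb{N}}$, $\langle T_n\rangle_{n\in\mathbb{N}}$ as in 262M, so that $\langle D_n\rangle_{n\in\mathbb{N}}$ is a disjoint cover of D by sets which are relatively measurable in D, and each $T_n$ is an $r \times r$ matrix such that
$$\|T(x) - T_n\| \le \zeta(T_n, \epsilon) \text{ whenever } x \in D_n,$$
$$\|\phi(x) - \phi(y) - T_n(x - y)\| \le \zeta(T_n, \epsilon)\|x - y\| \text{ for all } x, y \in D_n.$$
Then, setting $J_n = |\det T_n|$, we have
$$|J(x) - J_n| \le \epsilon \text{ for every } x \in D_n,$$
$$|\mu^*\phi[D_n] - J_n\mu^*D_n| \le \epsilon\mu^*D_n,$$
by the choice of $\zeta(T_n, \epsilon)$. So we have
$$\int_D J\,d\mu \le \sum_{n=0}^\infty J_n\mu^*D_n + \epsilon\mu^*D \le \int_D J\,d\mu + 2\epsilon\mu^*D;$$
I am using here the fact that all the $D_n$ are relatively measurable in D, so that, in particular, $\mu^*D = \sum_{n=0}^\infty \mu^*D_n$. Next,
$$\mu^*\phi[D] \le \sum_{n=0}^\infty \mu^*\phi[D_n] \le \sum_{n=0}^\infty J_n\mu^*D_n + \epsilon\mu^*D.$$
Putting these together,
$$\mu^*\phi[D] \le \int_D J\,d\mu + 2\epsilon\mu^*D.$$
If D is measurable and $\phi$ is injective, then all the $D_n$ are measurable subsets of $\mathbb{R}^r$, so all the $\phi[D_n]$ are measurable, and they are also disjoint. Accordingly
$$\int_D J\,d\mu \le \sum_{n=0}^\infty J_n\mu D_n + \epsilon\mu D \le \sum_{n=0}^\infty (\mu\phi[D_n] + \epsilon\mu D_n) + \epsilon\mu D = \mu\phi[D] + 2\epsilon\mu D.$$
Since $\epsilon$ is arbitrary, we get
$$\mu^*\phi[D] \le \int_D J\,d\mu,$$
and if D is measurable and $\phi$ is injective,
$$\int_D J\,d\mu \le \mu\phi[D];$$
thus we have (ii) and (iv), on the assumption that D is bounded.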

(c) For a general set D, set $B_k = B(\mathbf{0}, k)$; then
$$\mu^*\phi[D] = \lim_{k\to\infty} \mu^*\phi[D \cap B_k] \le \lim_{k\to\infty} \int_{D\cap B_k} J\,d\mu = \int_D J\,d\mu,$$
with equality if $\phi$ is injective and D is measurable.
(d) For part (v), I seek to show that the hypotheses of 235L are satisfied, taking $X = D$ and $Y = \phi[D]$. P Set $G = \{x : x \in D, J(x) > 0\}$.
($\alpha$) If $F \subseteq \phi[D]$ is measurable, then there are Borel sets $F_1$, $F_2$ such that $F_1 \subseteq F \subseteq F_2$ and $\mu(F_2 \setminus F_1) = 0$. Set $E_j = \phi^{-1}[F_j]$ for each j, so that $E_1 \subseteq \phi^{-1}[F] \subseteq E_2$, and both the sets $E_j$ are measurable, because $\phi$ and $\operatorname{dom}\phi$ are measurable. Now, applying (iv) to $\phi{\upharpoonright}E_j$,
$$\int_{E_j} J\,d\mu = \mu\phi[E_j] = \mu(F_j \cap \phi[D]) = \mu F$$
for both j, so $\int_{E_2\setminus E_1} J\,d\mu = 0$ and $J = 0$ a.e. on $E_2 \setminus E_1$. Accordingly $J \times \chi(\phi^{-1}[F]) =_{\text{a.e.}} J \times \chi E_1$, and $\int J \times \chi(\phi^{-1}[F])\,d\mu$ exists and is equal to $\int_{E_1} J\,d\mu = \mu F$. At the same time, $(\phi^{-1}[F] \cap G) \triangle (E_1 \cap G)$ is negligible, so $\phi^{-1}[F] \cap G$ is measurable.
($\beta$) If $F \subseteq \phi[D]$ and $G \cap \phi^{-1}[F]$ is measurable, then we know that $\mu\phi[D \setminus G] = \int_{D\setminus G} J = 0$ (by (iv)), so $F \setminus \phi[G]$ must be negligible; while $F \cap \phi[G] = \phi[G \cap \phi^{-1}[F]]$ is also measurable, by (iii). Accordingly F is measurable whenever $G \cap \phi^{-1}[F]$ is measurable.
Thus all the hypotheses of 235L are satisfied. Q Now (v) can be read off from the conclusion of 235L.
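Parts (iv) and (v) of the theorem can be tested numerically on a smooth injective example. The sketch below, in Python, assumes the particular map $\phi(x, y) = (e^x\cos y, e^x\sin y)$ on $D = [0, 1] \times [0, \pi/2]$, whose Jacobian is $J(x, y) = e^{2x}$ and whose image is the quarter-annulus $1 \le \rho \le e$ in the first quadrant. The left-hand sides are the closed forms obtained directly from that geometry; the right-hand sides are midpoint-rule approximations to $\int_D J$ and $\int_D J \times g\phi$ with $g(u, v) = u^2 + v^2$. The map, the test function and the grid size are illustrative choices, not taken from the text.

```python
import math


def phi(x, y):
    """The illustrative injective map on D = [0, 1] x [0, pi/2]."""
    return math.exp(x) * math.cos(y), math.exp(x) * math.sin(y)


def jacobian(x, y):
    """J(x, y) = |det phi'(x, y)| = e^{2x} for the map above."""
    return math.exp(2.0 * x)


def g(u, v):
    return u * u + v * v


def midpoint_integral(f, ax, bx, ay, by, n=200):
    """Midpoint-rule approximation to the integral of f over [ax, bx] x [ay, by]."""
    hx, hy = (bx - ax) / n, (by - ay) / n
    total = 0.0
    for i in range(n):
        x = ax + (i + 0.5) * hx
        for j in range(n):
            y = ay + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy


# (iv): area of the quarter-annulus against the integral of J over D.
area_image = math.pi / 4.0 * (math.e ** 2 - 1.0)
integral_J = midpoint_integral(jacobian, 0.0, 1.0, 0.0, math.pi / 2.0)

# (v): integral of g over the image (computed in polar form) against
# the integral of J * (g o phi) over D.
integral_g_image = math.pi / 2.0 * (math.e ** 4 - 1.0) / 4.0
integral_J_g_phi = midpoint_integral(
    lambda x, y: jacobian(x, y) * g(*phi(x, y)), 0.0, 1.0, 0.0, math.pi / 2.0
)

print(area_image, integral_J)
print(integral_g_image, integral_J_g_phi)
```

Of course this tests only a case in which the classical open-domain theory would already suffice; it says nothing about the irregular domains that the theorem is designed to handle.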

263E Remarks (a) This is a version of the classical result on change of variable in a many-dimensional integral. What I here call $J(x)$ is the Jacobian of $\phi$ at x; it describes the change in volumes of objects near x, following the rule already established in 263A for functions with constant derivative. The idea of the proof is also the classical one: to break the set D up into small enough pieces $D_m$ for us to be able to approximate $\phi$ by affine operators $y \mapsto \phi(x) + T_m(y - x)$ on each. The potential irregularity of the set D, which in this theorem may be any set, is compensated for by a corresponding freedom in choosing the sets $D_m$. In fact there is a further decomposition of the sets $D_m$ hidden in part (b-ii) of the proof of 263C; each $D_m$ is essentially covered by a disjoint family of balls, the measures of whose images we can estimate with an adequate accuracy. There is always a danger of a negligible exceptional set, and we need the crude inequalities of the proof of 262D to deal with it.

(b) Throughout the work of this chapter, from 261B to 263D, I have chosen balls $B(x, \delta)$ as the basic shapes to work with. I think it should be clear that in fact any 'reasonable' shapes would do just as well. In particular, the 'balls'
$$B_1(x, \delta) = \{y : \sum_{i=1}^r |\eta_i - \xi_i| \le \delta\}, \quad B_\infty(x, \delta) = \{y : |\eta_i - \xi_i| \le \delta\ \forall\, i\}$$
would serve perfectly. There are many alternatives. We could use sets of the form $C(x, k)$, for $x \in \mathbb{R}^r$ and $k \in \mathbb{N}$, defined to be the half-open cube of the form $[2^{-k}z, 2^{-k}(z + \mathbf{1})[$ with $z \in \mathbb{Z}^r$ containing x, instead; or even $C'(x, \delta) = [x, x + \delta\mathbf{1}[$. In all such cases we have versions of the density theorems (261Yb-261Yc) which support the remaining theory.

(c) I have presented 263D as a theorem about differentiable functions, because that is the normal form
in which one uses it in elementary applications. However, the proof depends essentially on the fact that a
differentiable function is a countable union of Lipschitz functions, and 263D would follow at once from the
same theorem proved for Lipschitz functions only. Now the fact is that the theorem applies to any countable
union of Lipschitz functions, because a Lipschitz function is differentiable almost everywhere. For more
advanced work (see Federer 69 or Evans & Gariepy 92, or Chapter 47 in Volume 4) it seems clear that
Lipschitz functions are the vital ones, so I spell out the result.

*263F Corollary Let $D \subseteq \mathbb{R}^r$ be any set and $\phi : D \to \mathbb{R}^r$ a Lipschitz function. Let $D_1$ be the set of points at which $\phi$ has a derivative relative to D, and for each $x \in D_1$ let $T(x)$ be such a derivative, with $J(x) = |\det T(x)|$. Then
(i) $D \setminus D_1$ is negligible;
(ii) $J : D_1 \to [0, \infty[$ is measurable;
(iii) $\mu^*\phi[D] \le \int_D J(x)dx$.
If D is measurable, then
(iv) $\phi[D]$ is measurable.
If D is measurable and $\phi$ is injective, then
(v) $\mu\phi[D] = \int_D J\,d\mu$,
(vi) for every real-valued function g defined on a subset of $\phi[D]$,
$$\int_{\phi[D]} g\,d\mu = \int_D J \times g\phi\,d\mu$$
if either integral is defined in $[-\infty, \infty]$, provided we interpret $J(x)g(\phi(x))$ as zero when $J(x) = 0$ and $g(\phi(x))$ is undefined.
proof This is now just a matter of putting 262Q and 263D together, with a little help from 262D. Use 262Q to show that $D \setminus D_1$ is negligible, 262D to show that $\phi[D \setminus D_1]$ is negligible, and apply 263D to $\phi{\upharpoonright}D_1$.

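263G Polar coordinates in the plane I offer an elementary example with a useful consequence.
Define $\phi : \mathbb{R}^2 \to \mathbb{R}^2$ by setting $\phi(\rho, \theta) = (\rho\cos\theta, \rho\sin\theta)$ for $\rho, \theta \in \mathbb{R}$. Then
$$\phi'(\rho, \theta) = \begin{pmatrix} \cos\theta & -\rho\sin\theta \\ \sin\theta & \rho\cos\theta \end{pmatrix},$$
so $J(\rho, \theta) = |\rho|$ for all $\rho$, $\theta$. Of course $\phi$ is not injective, but if we restrict it to the domain $D = \{(0, 0)\} \cup \{(\rho, \theta) : \rho > 0, -\pi < \theta \le \pi\}$ then $\phi{\upharpoonright}D$ is a bijection between D and $\mathbb{R}^2$, and
$$\int g\,d\xi_1 d\xi_2 = \int_D g(\phi(\rho, \theta))\rho\,d\rho\,d\theta$$
for every real-valued function g which is integrable over $\mathbb{R}^2$.
Suppose, in particular, that we set
$$g(x) = e^{-\|x\|^2/2} = e^{-\xi_1^2/2}e^{-\xi_2^2/2}$$
for $x = (\xi_1, \xi_2) \in \mathbb{R}^2$. Then
$$\int g(x)dx = \int e^{-\xi_1^2/2}d\xi_1 \int e^{-\xi_2^2/2}d\xi_2,$$
as in 253D. Setting $I = \int e^{-t^2/2}dt$, we have $\int g = I^2$. (To see that I is well-defined in $\mathbb{R}$, note that the integrand is continuous, therefore measurable, and that
$$\int_{-1}^1 e^{-t^2/2}dt \le 2,$$
$$\int_{-\infty}^{-1} e^{-t^2/2}dt = \int_1^\infty e^{-t^2/2}dt \le \int_1^\infty e^{-t/2}dt = \lim_{a\to\infty} \int_1^a e^{-t/2}dt = 2e^{-1/2}$$
are both finite.) Now looking at the alternative expression we have
$$I^2 = \int g(x)dx = \int_D g(\rho\cos\theta, \rho\sin\theta)\rho\,d(\rho, \theta)$$
$$= \int_D e^{-\rho^2/2}\rho\,d(\rho, \theta) = \int_0^\infty \int_{-\pi}^\pi e^{-\rho^2/2}\rho\,d\theta\,d\rho$$
(ignoring the point (0, 0), which has zero measure)
$$= \int_0^\infty 2\pi\rho e^{-\rho^2/2}d\rho = 2\pi\lim_{a\to\infty} \int_0^a \rho e^{-\rho^2/2}d\rho$$
$$= 2\pi\lim_{a\to\infty} (-e^{-a^2/2} + 1) = 2\pi.$$
Consequently
$$\int_{-\infty}^\infty e^{-t^2/2}dt = I = \sqrt{2\pi},$$
which is one of the many facts every mathematician should know, and in particular is vital for Chapter 27 below.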
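The value $\sqrt{2\pi}$ is easy to confirm numerically. The following Python sketch, offered only as a sanity check, approximates I by the midpoint rule on a truncated interval and evaluates the polar expression $2\pi(1 - e^{-a^2/2})$ from the calculation above. The truncation point, the number of subintervals and the function names are assumptions made for the illustration.

```python
import math


def gaussian_integral(a=10.0, n=20000):
    """Midpoint-rule approximation to the integral of exp(-t^2/2) over [-a, a]."""
    h = 2.0 * a / n
    return h * sum(math.exp(-((-a + (i + 0.5) * h) ** 2) / 2.0) for i in range(n))


def polar_value(a):
    """2 pi times the integral of rho exp(-rho^2/2) over [0, a], in closed form."""
    return 2.0 * math.pi * (1.0 - math.exp(-a * a / 2.0))


I = gaussian_integral()
print(I, math.sqrt(2.0 * math.pi))
print(I * I, polar_value(10.0), 2.0 * math.pi)
```

The truncation to $[-10, 10]$ is harmless here because the tails are of the order of $e^{-50}$.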

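263H Corollary If $k \in \mathbb{N}$ is odd,
$$\int_{-\infty}^\infty x^k e^{-x^2/2}dx = 0;$$
if $k = 2l \in \mathbb{N}$ is even, then
$$\int_{-\infty}^\infty x^k e^{-x^2/2}dx = \frac{(2l)!}{2^l l!}\sqrt{2\pi}.$$
proof (a) To see that all the integrals are well-defined and finite, observe that $\lim_{x\to\pm\infty} x^k e^{-x^2/4} = 0$, so that $M_k = \sup_{x\in\mathbb{R}} |x^k e^{-x^2/4}|$ is finite, and
$$\int_{-\infty}^\infty |x^k e^{-x^2/2}|dx \le M_k \int_{-\infty}^\infty e^{-x^2/4}dx < \infty.$$
(b) If k is odd, then substituting $y = -x$ we get
$$\int_{-\infty}^\infty x^k e^{-x^2/2}dx = -\int_{-\infty}^\infty y^k e^{-y^2/2}dy,$$
so that both integrals must be zero.
(c) For even k, proceed by induction. Set $I_l = \int_{-\infty}^\infty x^{2l}e^{-x^2/2}dx$. $I_0 = \sqrt{2\pi} = \frac{0!}{2^0 0!}\sqrt{2\pi}$ by 263G. For the inductive step to $l + 1 \ge 1$, integrate by parts to see that
$$\int_{-a}^a x^{2l+1} \cdot xe^{-x^2/2}dx = -a^{2l+1}e^{-a^2/2} + (-a)^{2l+1}e^{-a^2/2} + \int_{-a}^a (2l + 1)x^{2l}e^{-x^2/2}dx$$
for every $a \ge 0$. Letting $a \to \infty$,
$$I_{l+1} = (2l + 1)I_l.$$
Because
$$\frac{(2(l+1))!}{2^{l+1}(l+1)!}\sqrt{2\pi} = (2l + 1)\frac{(2l)!}{2^l l!}\sqrt{2\pi},$$
the induction proceeds.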
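The induction in (c) amounts to the identity $\frac{(2l)!}{2^l l!} = 1 \cdot 3 \cdot 5 \cdots (2l - 1)$. A short Python sketch, using exact integer arithmetic, compares the closed form with the product generated by the recursion $I_{l+1} = (2l + 1)I_l$; both are expressed as multiples of $\sqrt{2\pi}$, and the function names are illustrative.

```python
from math import factorial


def moment_coefficient(l):
    """(2l)! / (2^l l!), the coefficient of sqrt(2 pi) in the closed form for I_l."""
    return factorial(2 * l) // (2 ** l * factorial(l))


def moment_by_recursion(l):
    """Coefficient of sqrt(2 pi) obtained from I_0 by applying I_{m+1} = (2m + 1) I_m."""
    c = 1
    for m in range(l):
        c *= 2 * m + 1
    return c


table = [(l, moment_coefficient(l), moment_by_recursion(l)) for l in range(8)]
for row in table:
    print(row)
```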

263I The one-dimensional case The restriction to injective functions in 263D(v) is unavoidable in
the context of the result there. But in the substitutions of elementary calculus it is not always essential. In
the hope of clarifying the position I give a result here which covers many of the standard tricks.
Theorem Let $I \subseteq \mathbb{R}$ be an interval with more than one point, and $\phi : I \to \mathbb{R}$ a function which is absolutely continuous on any closed bounded subinterval of I. Write $u = \inf I$, $u' = \sup I$ in $[-\infty, \infty]$, and suppose that $v = \lim_{x\downarrow u} \phi(x)$ and $v' = \lim_{x\uparrow u'} \phi(x)$ are defined in $[-\infty, \infty]$. Let g be a Lebesgue measurable real-valued function defined almost everywhere on $\phi[I]$. Then
$$\int_v^{v'} g = \int_I g(\phi(x))\phi'(x)dx$$
whenever the right-hand side is defined in $\mathbb{R}$, on the understanding that we interpret $\int_v^{v'} g$ as $-\int_{v'}^v g$ when $v' < v$, and $g(\phi(x))\phi'(x)$ as 0 when $\phi'(x) = 0$ and $g(\phi(x))$ is undefined.
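proof (a) Recall that $\phi$ is differentiable almost everywhere on I (225Cb) and that $\phi[A]$ is negligible for every negligible $A \subseteq I$ (225G). (These results are stated for closed bounded intervals; but since any interval is expressible as the union of a sequence of closed bounded intervals, they remain valid in the present context.) Set $D = \operatorname{dom}\phi'$, so that $I \setminus D$ and $\phi[I \setminus D]$ are negligible. Next, setting $D_0 = \{x : x \in D, \phi'(x) = 0\}$, D and $D_0$ are Borel sets (225J) and $\phi[D_0]$ is negligible, by 263D(ii), while
$$\int_I g(\phi(x))\phi'(x)dx = \int_{D\setminus D_0} g(\phi(x))\phi'(x)dx.$$
Applying 262M with $A = \mathbb{R} \setminus \{0\}$ and $\epsilon(\alpha) = \frac{1}{2}|\alpha|$ for $\alpha \in A$, we have sequences $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle\alpha_n\rangle_{n\in\mathbb{N}}$ such that $\langle E_n\rangle_{n\in\mathbb{N}}$ is a disjoint cover of $D \setminus D_0$ by measurable sets, every $\alpha_n$ is non-zero, and $|\phi(x) - \phi(y) - \alpha_n(x - y)| \le \frac{1}{2}|\alpha_n||x - y|$ for all $x, y \in E_n$; so that, in particular, $\phi{\upharpoonright}E_n$ is injective, while $\operatorname{sgn}\phi'(x) = \operatorname{sgn}\alpha_n$ for every $x \in E_n$, writing $\operatorname{sgn}\alpha = \alpha/|\alpha|$ as usual. Set $\epsilon_n = \operatorname{sgn}\alpha_n$ for each n. Now 263D(v) tells us that
$$\sum_{n=0}^\infty \int |g| \times \chi(\phi[E_n]) = \sum_{n=0}^\infty \int_{E_n} |g(\phi(x))\phi'(x)|dx$$
is finite.
Note that 263D(v) also shows that if $B \subseteq \mathbb{R}$ is negligible, then $E_n \cap \phi^{-1}[B]$ must be negligible for every n, so that $\int_{\phi^{-1}[B]} g(\phi(x))\phi'(x)dx = 0$.
Consequently, setting
$$C_0 = \{y : y \in (\phi[I] \cap \operatorname{dom} g) \setminus (\{v, v'\} \cup \phi[I \setminus D] \cup \phi[D_0]),\ \sum_{n=0}^\infty |g(y)\chi(\phi[E_n])(y)| < \infty\},$$
$\phi[I] \setminus C_0$ is negligible, and if we set $C = \{y : y \in C_0, g(y) \ne 0\}$,
$$\int_{J\cap C} g = \int_J g$$
for every $J \subseteq \phi[I]$.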
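(b) The point of the argument is the following fact: if $y \in C$ then
$$\sum_{n=0}^\infty \epsilon_n\chi(\phi[E_n])(y) = 1 \text{ if } v < y < v',$$
$$= -1 \text{ if } v' < y < v,$$
$$= 0 \text{ if } y < \min(v, v') \text{ or } \max(v, v') < y.$$
P Because $g(y) \ne 0$ and $\sum_{n=0}^\infty |g(y)\chi(\phi[E_n])(y)|$ is finite, $\{n : y \in \phi[E_n]\}$ is finite; because $y \notin \phi[I \setminus D] \cup \phi[D_0]$, and $\phi{\upharpoonright}E_n$ is injective for every n, and $\bigcup_{n\in\mathbb{N}} E_n = D \setminus D_0$, $K = \phi^{-1}[\{y\}]$ is finite. For each $x \in K$, let $n_x$ be such that $x \in E_{n_x}$; then $\epsilon_{n_x} = \operatorname{sgn}\phi'(x)$. So $\sum_{n=0}^\infty \epsilon_n\chi(\phi[E_n])(y) = \sum_{x\in K} \operatorname{sgn}\phi'(x)$.
If $J \subseteq I \setminus K$ is an interval, $\phi(z) \ne y$ for $z \in J$; since $\phi$ is continuous, the Intermediate Value Theorem tells us that $\operatorname{sgn}(\phi(z) - y)$ is constant on J. A simple induction on $\#(K \cap ]-\infty, z[)$ shows that $\operatorname{sgn}(\phi(z) - y) = \operatorname{sgn}(v - y) + 2\sum_{x\in K, x<z} \operatorname{sgn}\phi'(x)$ for every $z \in I \setminus K$; taking the limit as $z \uparrow u'$, $\sum_{x\in K} \operatorname{sgn}\phi'(x) = \frac{1}{2}(\operatorname{sgn}(v' - y) - \operatorname{sgn}(v - y))$. (Here we may have to interpret $\operatorname{sgn}(\pm\infty)$ as $\pm 1$ in the obvious way.) This turns out to be just what we need to know. Q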
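(c) What this means is that
$$\int_v^{v'} g = \int_v^{v'} g \times \chi C = \int_C g \times \sum_{n=0}^\infty \epsilon_n\chi(\phi[E_n])$$
(allowing for the conventional reversal if $v' < v$)
$$= \sum_{n=0}^\infty \epsilon_n \int_C g \times \chi(\phi[E_n]) = \sum_{n=0}^\infty \epsilon_n \int_{E_n\cap\phi^{-1}[C]} g(\phi(x))|\phi'(x)|dx$$
(263D(v) again, applied to $\phi{\upharpoonright}(E_n \cap \phi^{-1}[C])$ for each n)
$$= \sum_{n=0}^\infty \epsilon_n \int_{E_n} g(\phi(x))|\phi'(x)|dx$$
(because $g(\phi(x))\phi'(x) = 0$ almost everywhere on $E_n \setminus \phi^{-1}[C]$)
$$= \sum_{n=0}^\infty \int_{E_n} g(\phi(x))\phi'(x)dx = \int_{D\setminus D_0} g(\phi(x))\phi'(x)dx = \int_I g(\phi(x))\phi'(x)dx,$$
as claimed.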
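The point of 263I is that the substitution formula survives for non-injective $\phi$, provided the signed derivative $\phi'$ is used and the integral $\int_v^{v'}$ is read with the conventional reversal. Here is a minimal numerical sketch in Python, assuming the particular choices $\phi(x) = x^2$ and $g(y) = y$; on $I = [-1, 2]$ the function $\phi$ folds part of its range over itself, and on $I = [-2, 1]$ we have $v' < v$. All names and the quadrature rule are illustrative.

```python
def midpoint(f, a, b, n=20000):
    """Midpoint-rule approximation to the integral of f over [a, b], with a <= b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def signed_integral(f, v, v_prime):
    """Integral from v to v', interpreted as minus the integral from v' to v if v' < v."""
    if v_prime >= v:
        return midpoint(f, v, v_prime)
    return -midpoint(f, v_prime, v)


def phi(x):
    return x * x


def dphi(x):
    return 2.0 * x


def g(y):
    return y


def both_sides(u, u_prime):
    """Left- and right-hand sides of the formula of 263I on the interval [u, u']."""
    lhs = signed_integral(g, phi(u), phi(u_prime))
    rhs = midpoint(lambda x: g(phi(x)) * dphi(x), u, u_prime)
    return lhs, rhs


folded = both_sides(-1.0, 2.0)      # v = 1, v' = 4
reversed_ = both_sides(-2.0, 1.0)   # v = 4, v' = 1, so the left side is reversed
print(folded)
print(reversed_)
```

In the first case the contributions of $[-1, 0]$ and $[0, 1]$ to the right-hand side cancel, which is the cancellation that the sum $\sum_n \epsilon_n\chi(\phi[E_n])$ records in part (b) of the proof.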

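263X Basic exercises (a) Let $(X, \Sigma, \mu)$ be any measure space, $f \in \mathcal{L}^0(\mu)$ and $p \in [1, \infty[$. Show that $f \in \mathcal{L}^p(\mu)$ iff
$$\gamma = p\int_0^\infty \alpha^{p-1}\mu\{x : x \in \operatorname{dom} f, |f(x)| > \alpha\}d\alpha$$
is finite, and in this case $\|f\|_p = \gamma^{1/p}$. (Hint: $\int |f|^p = \int_0^\infty \mu\{x : |f(x)|^p > \beta\}d\beta$, by 252O; now substitute $\beta = \alpha^p$.)

(b) Let f be an integrable function defined almost everywhere on $\mathbb{R}^r$. Show that if $\alpha < r - 1$ then $\sum_{n=1}^\infty n^\alpha|f(nx)|$ is finite for almost every $x \in \mathbb{R}^r$. (Hint: estimate $\sum_{n=0}^\infty n^\alpha \int_B |f(nx)|dx$ for any ball B centered at the origin.)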

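(c) Let $A \subseteq ]0, 1[$ be a set such that $\mu^*A = \mu^*([0, 1] \setminus A) = 1$, where $\mu$ is Lebesgue measure on $\mathbb{R}$. Set $D = A \cup \{-x : x \in ]0, 1[ \setminus A\} \subseteq [-1, 1]$, and set $\phi(x) = |x|$ for $x \in D$. Show that $\phi$ is injective, that $\phi$ is differentiable relative to its domain everywhere in D, and that $\mu^*\phi[D] < \int_D |\phi'(x)|dx$.

(d) Let $\phi : D \to \mathbb{R}^r$ be a function differentiable relative to D at each point of $D \subseteq \mathbb{R}^r$, and suppose that for each $x \in D$ there is a non-singular derivative $T(x)$ of $\phi$ at x; set $J(x) = |\det T(x)|$. Show that D is expressible as $\bigcup_{k\in\mathbb{N}} D_k$ where $D_k = D \cap \overline{D_k}$ and $\phi{\upharpoonright}D_k$ is injective for each k.

> (e) (i) Show that for any Lebesgue measurable $E \subseteq \mathbb{R}$, $t \in \mathbb{R} \setminus \{0\}$, $\int_{tE} \frac{1}{|u|}du = \int_E \frac{1}{|u|}du$. (ii) For $t \in \mathbb{R}$, $u \in \mathbb{R} \setminus \{0\}$ set $\phi(t, u) = (\frac{t}{u}, u)$. Show that $\int_{\phi[E]} \frac{1}{|tu|}d(t, u) = \int_E \frac{1}{|tu|}d(t, u)$ for any Lebesgue measurable $E \subseteq \mathbb{R}^2$.

(f) Define $\phi : \mathbb{R}^3 \to \mathbb{R}^3$ by setting
$$\phi(\rho, \theta, \alpha) = (\rho\sin\theta\cos\alpha, \rho\sin\theta\sin\alpha, \rho\cos\theta).$$
Show that $\det\phi'(\rho, \theta, \alpha) = \rho^2\sin\theta$.

(g) Show that if $k = 2l + 1$ is odd, then $\int_0^\infty x^k e^{-x^2/2}dx = 2^l l!$. (Compare 252Xh.)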
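263Y Further exercises (a) Define a measure $\nu$ on $\mathbb{R}$ by setting $\nu E = \int_E \frac{1}{|x|}dx$ for Lebesgue measurable sets $E \subseteq \mathbb{R}$. For $f, g \in \mathcal{L}^1(\nu)$ set $(f * g)(x) = \int f(\frac{x}{t})g(t)\nu(dt)$ whenever this is defined in $\mathbb{R}$. (i) Show that $f * g = g * f \in \mathcal{L}^1(\nu)$. (ii) Show that $\int h(x)(f * g)(x)\nu(dx) = \int\!\!\int h(xy)f(x)g(y)\nu(dx)\nu(dy)$ for every $h \in \mathcal{L}^\infty(\nu)$. (iii) Show that $f * (g * h) = (f * g) * h$ for every $h \in \mathcal{L}^1(\nu)$. (Hint: 263Xe.)

(b) Let $E \subseteq \mathbb{R}^2$ be a measurable set such that $\limsup_{\alpha\to\infty} \frac{1}{\alpha^2}\mu_2(E \cap B(\mathbf{0}, \alpha)) > 0$, writing $\mu_2$ for Lebesgue measure on $\mathbb{R}^2$. Show that there is some $\theta \in ]-\pi, \pi]$ such that $\mu_1 E_\theta = \infty$, where $E_\theta = \{\rho : \rho \ge 0, (\rho\cos\theta, \rho\sin\theta) \in E\}$. (Hint: show that $\frac{1}{\alpha^2}\mu_2(E \cap B(\mathbf{0}, \alpha)) \le \int_{-\pi}^\pi \min(\frac{1}{2}, \frac{1}{\alpha}\mu_1 E_\theta)d\theta$.) Generalize to higher dimensions and to functions other than $\chi E$.

(c) Let $E \subseteq \mathbb{R}^r$ be a measurable set, and $\phi : E \to \mathbb{R}^r$ a function differentiable relative to its domain, with a derivative $T(x)$, at each point x of E; set $J(x) = |\det T(x)|$. Show that for any integrable function g defined on $\phi[E]$,
$$\int g(y)\#(\phi^{-1}[\{y\}])dy = \int_E J(x)g(\phi(x))dx.$$
(Hint: 263I.)

(d) Find a proof of 263I based on the ideas of §225. (Hint: 225Xg.)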

263 Notes and comments Yet again, approaching 263D, I find myself having to choose between giving
an accessible, relatively weak result and making the extra effort to set out a theorem which is somewhere
near the natural boundary of what is achievable within the concepts being developed in this volume; and,
as usual, I go for the more powerful form. There are three basic sources of difficulty: (i) the fact that we
are dealing with more than one dimension; (ii) the fact that we are dealing with irregular domains; (iii) the
fact that we are dealing with arbitrary integrable functions. I do not think I need to apologise for (iii) in
a book on measure theory. Concerning (ii), it is quite true that the principal applications of these results
are to cases in which the transformation is differentiable everywhere, with continuous derivative, and the
set D has negligible boundary; and in these cases there are substantial simplifications available, mostly
because the sets $D_m$ of the proof of 263D can be taken to be cubes. Nevertheless, I think any form of the
result which makes such assumptions is deeply unsatisfactory at this level, being an awkward compromise
between ideas natural to the Riemann integral and those natural to the Lebesgue integral. Concerning (i),
it might even have been right to lay out the whole argument for the case r = 1 before proceeding to the
general case, as I did in §§114-115, because the one-dimensional case is already important and interesting;
and if you find the work above difficult (which it is) and your immediate interests are in one-dimensional
integration by substitution, then I think you might find it worth your time to reproduce the r = 1 argument
yourself, up to a proof of 263I. In fact the biggest difference is in 263A, which becomes nearly trivial; the
work of 262M and 263C becomes more readable, because all the matrices turn into scalars and we can drop
the word 'determinant', but I do not think we can dispense with any of the ideas, at least if we wish to
obtain 263D as stated. (But see 263Yd.)
I found myself insisting, in the last paragraph, that a distinction can be made between 'ideas natural
to the Riemann integral and those natural to the Lebesgue integral'. We are approaching deep questions
here, like 'what are books on measure theory for?', which I do not think can be answered without some
(possibly unconscious) reference to the question 'what is mathematics for?'. I do of course want to present
here some of the wonderful general theorems which arise in the Lebesgue theory. But more important than
any specific theorem is a general idea of what can be proved by these methods. It is the essence of modern
measure theory that continuity does not matter, or, if you prefer, that measurable functions are in some
sense so nearly continuous that we do not have to add hypotheses of continuity in our theorems. Now this
is in a sense a great liberation, and the Lebesgue integral is now the standard one. But you must not regard
the Riemann integral as outdated. The intuitions on which it is founded (for instance, that the surface of
a solid body has zero volume) remain of great value in their proper context, which certainly includes the
study of differentiable functions with continuous derivatives. What I am saying here is that I believe we
can use these intuitions best if we maintain a division, a flexible and permeable one, of course, between the
ideas of the two theories; and that when transferring a theorem from one side of the boundary to the other
we should do so whole-heartedly, seeking to express the full power of the methods we are using.
I have already said that the essential difference between the one-dimensional and multi-dimensional
cases lies in 263A, where the Jacobian J = | det T | enters the argument. Shorn of the technical devices
necessary to deal with arbitrary Lebesgue measurable sets, this amounts to a calculation of the volume of
the parallelepiped T [I] where I is the interval [0, 1[. I have dealt with this by a little bit of algebra, saying
that the result is essentially obvious if T is diagonal, whereas if T is an isometry it follows from the fact
that the unit ball is left invariant; and the algebra comes in to express an arbitrary matrix as a product
of diagonal and orthogonal matrices (2A6C). It is also plain from 261F that Lebesgue measure must be
rotation-invariant as well as translation-invariant; that is to say, it is invariant under all isometries. Another
way of looking at this will appear in the next section.
I feel myself that the centre of the argument for 263D is in the lemma 263C. This is where we turn
the exact result for linear operators into an approximate result for almost-linear functions; and the whole
point of differentiability is that a differentiable function is well approximated, in a neighbourhood of any
point of its domain, by a linear operator. The lemma involves two rather different ideas. To show that $\mu^*\phi[D] \le (J + \epsilon)\mu^*D$, we look first at balls and then use Vitali's theorem to see that D is economically covered by balls, so that an upper bound for $\mu^*\phi[D]$ in terms of a sum $\sum_{B\in\mathcal{I}_0} \mu^*\phi[D \cap B]$ is adequate. To obtain a lower bound, we need to reverse the argument by looking at $\psi = \phi^{-1}$, which involves checking first that $\phi$ is invertible, and then that $\psi$ is appropriately linked to $T^{-1}$. I have written out exact formulae for $\epsilon'$, $\zeta_2'$ and so on, but this is only in case you do not trust your intuition; the fact that $\|\phi^{-1}(u) - \phi^{-1}(v) - T^{-1}(u - v)\|$ is small compared with $\|u - v\|$ is pretty clearly a consequence of the hypothesis that $\|\phi(x) - \phi(y) - T(x - y)\|$ is small compared with $\|x - y\|$.
The argument of 263D itself is now a matter of breaking the set D up into appropriate pieces on each of which $\phi$ is sufficiently nearly linear for 263C to apply, so that
$$\mu^*\phi[D] \le \sum_{m=0}^\infty \mu^*\phi[D_m] \le \sum_{m=0}^\infty (J_m + \epsilon)\mu^*D_m.$$
With a little care (taken in 263C, with its condition (i)), we can also ensure that the Jacobian J is well approximated by $J_m$ almost everywhere on $D_m$, so that $\sum_{m=0}^\infty J_m\mu^*D_m \simeq \int_D J(x)dx$.
These ideas, joined with the results of §262, bring us to the point
$$\int_E J\,d\mu = \mu\phi[E]$$
when $\phi$ is injective and $E \subseteq D$ is measurable. We need a final trick, involving Borel sets, to translate this into
$$\int_{\phi^{-1}[F]} J\,d\mu = \mu F$$
whenever $F \subseteq \phi[D]$ is measurable, which is what is needed for the application of 235L.
I hope that you long ago saw, and were delighted by, the device in 263G. Once again, this is not really
Lebesgue integration; but I include it just to show that the machinery of this chapter can be turned to deal
with the classical results, and that indeed we have a tiny profit from our labour, in that no apology need
be made for the boundary of the set D into which the polar coordinate system maps the plane. I have
already given the actual result as an exercise in 252Xh. That involved (if you chase through the references)
a one-dimensional substitution (performed in 225Xj), Fubini's theorem and an application of the formulae
of §235; that is to say, very much the same elements as those used above, though in a different order. I could
present this with no mention of differentiation in higher dimensions because the first change of variable was
in one dimension, and the second (involving the function $x\mapsto\|x\|$, in 252Xh(i)) was of a particularly simple
type, so that a different method could be used to find the function $J$.
The abstract ideas to which this treatise is devoted do not, indeed, lead us to many particular examples
on which to practise the ideas of this section. The ones which do arise tend to be very straightforward, as
in 263G, 263Xa-263Xb and 263Xe. I mention the last because it provides a formula needed to discuss a
new type of convolution (263Ya). In effect, this depends on the multiplicative group $\mathbb{R}\setminus\{0\}$ in place of the
additive group $\mathbb{R}$ treated in §255. The formula $x^{-1}$ in the definition of $\nu$ is of course the derivative of $\ln x$,
and $\ln$ is an isomorphism between $(\,]0,\infty[\,,\cdot,\nu)$ and ($\mathbb{R}$, $+$, Lebesgue measure).

264 Hausdorff measures


The next topic I wish to approach is the question of surface measure; a useful example to bear in mind
throughout this section and the next is the notion of area for regions on the sphere, but any other smoothly
curved two-dimensional surface in three-dimensional space will serve equally well. It is I think more than
plausible that our intuitive concepts of area for such surfaces should correspond to appropriate measures.
But the formalisation of this intuition is non-trivial, especially if we seek the generality that simple geometric
ideas lead us to; I mean, not contenting ourselves with arguments that depend on the special nature of the
sphere, for instance, to describe spherical surface area. I divide the problem into two parts. In this section
I will describe a construction which enables us to define the $r$-dimensional measure of an $r$-dimensional
surface, among other things, in $s$-dimensional space. In the next section I will set out the basic theorems
making it possible to calculate these measures effectively in the leading cases.

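264A Definitions Let $s\ge1$ be an integer, and $r>0$. (I am primarily concerned with integral $r$, but
will not insist on this until it becomes necessary, since there are some very interesting ideas which involve
non-integral 'dimension' $r$.) For any $A\subseteq\mathbb{R}^s$, $\delta>0$ set
$$\theta_{r\delta}A = \inf\Bigl\{\sum_{n=0}^{\infty}(\operatorname{diam}A_n)^r : \langle A_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of subsets of } \mathbb{R}^s \text{ covering } A,\ \operatorname{diam}A_n\le\delta \text{ for every } n\in\mathbb{N}\Bigr\}.$$
It is convenient in this context to say that $\operatorname{diam}\emptyset=0$. Now set
$$\theta_rA = \sup_{\delta>0}\theta_{r\delta}A;$$
$\theta_r$ is $r$-dimensional Hausdorff outer measure on $\mathbb{R}^s$.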
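
As a purely numerical illustration of the covering sums in 264A (not part of the formal development), the following Python sketch takes $A$ to be the unit segment $[0,1]\times\{0\}$ in $\mathbb{R}^2$ and covers it by $n$ equal sub-segments of diameter $1/n$. The function names are hypothetical, and the sums computed are only upper bounds for the infimum with $\delta=1/n$, since other covers might do better.

```python
import math

def covering_sum(num_pieces, r):
    """Sum of (diam A_n)^r when the unit segment is cut into equal sub-segments."""
    diameter = 1.0 / num_pieces
    return num_pieces * diameter ** r

def upper_bounds(r, levels=6):
    """Covering sums for piece diameters 1, 1/10, ..., 10^-(levels-1)."""
    return [covering_sum(10 ** k, r) for k in range(levels)]

if __name__ == "__main__":
    for r in (0.5, 1.0, 2.0):
        print(r, upper_bounds(r))
```

For $r=1$ the sums stay equal to 1 at every scale, while for $r=2$ they tend to 0; for $r<1$ this particular family of covers gives sums which grow as the pieces shrink.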

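264B Of course we must immediately check the following:

Lemma $\theta_r$, as defined in 264A, is always an outer measure.
proof You should be used to these arguments by now, but there is an extra step in this one, so I spell out
the details.
(a) Interpreting the diameter of the empty set as 0, we have $\theta_{r\delta}\emptyset=0$ for every $\delta>0$, so $\theta_r\emptyset=0$.
(b) If $A\subseteq B\subseteq\mathbb{R}^s$, then every sequence covering $B$ also covers $A$, so $\theta_{r\delta}A\le\theta_{r\delta}B$ for every $\delta$ and
$\theta_rA\le\theta_rB$.
(c) Let $\langle A_n\rangle_{n\in\mathbb{N}}$ be a sequence of subsets of $\mathbb{R}^s$ with union $A$, and take any $a<\theta_rA$. Then there is a
$\delta>0$ such that $a\le\theta_{r\delta}A$. Now $\theta_{r\delta}A\le\sum_{n=0}^{\infty}\theta_{r\delta}(A_n)$. $\mathbf{P}$ Let $\epsilon>0$, and for each $n\in\mathbb{N}$ choose a sequence
$\langle A_{nm}\rangle_{m\in\mathbb{N}}$ of sets, covering $A_n$, with $\operatorname{diam}A_{nm}\le\delta$ for every $m$ and $\sum_{m=0}^{\infty}(\operatorname{diam}A_{nm})^r\le\theta_{r\delta}A_n+2^{-n}\epsilon$. Then
$\langle A_{nm}\rangle_{m,n\in\mathbb{N}}$ is a cover of $A$ by countably many sets of diameter at most $\delta$, so
$$\theta_{r\delta}A \le \sum_{n=0}^{\infty}\sum_{m=0}^{\infty}(\operatorname{diam}A_{nm})^r \le \sum_{n=0}^{\infty}\bigl(\theta_{r\delta}A_n+2^{-n}\epsilon\bigr) = 2\epsilon+\sum_{n=0}^{\infty}\theta_{r\delta}A_n.$$
As $\epsilon$ is arbitrary, we have the result. $\mathbf{Q}$
Accordingly
$$a \le \theta_{r\delta}A \le \sum_{n=0}^{\infty}\theta_{r\delta}A_n \le \sum_{n=0}^{\infty}\theta_rA_n.$$
As $a$ is arbitrary,
$$\theta_rA \le \sum_{n=0}^{\infty}\theta_rA_n;$$
as $\langle A_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\theta_r$ is an outer measure.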

264C Definition If $s\ge1$ is an integer, and $r>0$, then Hausdorff $r$-dimensional measure on $\mathbb{R}^s$
is the measure $\mu_{Hr}$ on $\mathbb{R}^s$ defined by Carathéodory's method from the outer measure $\theta_r$ of 264A-264B.

264D Remarks (a) It is important to note that the sets used in the definition of the $\theta_{r\delta}$ need not be
balls; even in $\mathbb{R}^2$ not every set $A$ can be covered by a ball of the same diameter as $A$.

(b) In the definitions above I require $r>0$. It is sometimes appropriate to take $\mu_{H0}$ to be counting
measure. This is nearly the result of applying the formulae above with $r=0$, but there can be difficulties
if we interpret them over-literally.

(c) All Hausdorff measures must be complete, because they are defined by Carathéodory's method (212A).
For $r>0$, they are all atomless (264Yg). In terms of the other criteria of §211, however, they are very
ill-behaved; for instance, if $r$, $s$ are integers and $1\le r<s$, then $\mu_{Hr}$ on $\mathbb{R}^s$ is not semi-finite. (I will give a
proof of this in §439 in Volume 4.) Nevertheless, they do have some striking properties which make them
reasonably tractable.

(d) In 264A, note that $\theta_{r\delta}A\le\theta_{r\delta'}A$ when $0<\delta'\le\delta$; consequently, for instance, $\theta_rA=\lim_{n\to\infty}\theta_{r,2^{-n}}A$.
I have allowed arbitrary sets $A_n$ in the covers, but it makes no difference if we restrict our attention to
covers consisting of open sets or of closed sets (264Xc).

264E Theorem Let $s\ge1$ be an integer, and $r\ge0$; let $\mu_{Hr}$ be Hausdorff $r$-dimensional measure on
$\mathbb{R}^s$, and $\Sigma_{Hr}$ its domain. Then every Borel subset of $\mathbb{R}^s$ belongs to $\Sigma_{Hr}$.
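proof This is trivial if $r=0$; so suppose henceforth that $r>0$.
(a) The first step is to note that if $A$, $B$ are subsets of $\mathbb{R}^s$ and $\eta>0$ is such that $\|x-y\|\ge\eta$ for all
$x\in A$, $y\in B$, then $\theta_r(A\cup B)=\theta_rA+\theta_rB$, where $\theta_r$ is $r$-dimensional Hausdorff outer measure on $\mathbb{R}^s$. $\mathbf{P}$ Of
course $\theta_r(A\cup B)\le\theta_rA+\theta_rB$, because $\theta_r$ is an outer measure. For the reverse inequality, we may suppose
that $\theta_r(A\cup B)<\infty$, so that $\theta_rA$ and $\theta_rB$ are both finite. Let $\epsilon>0$ and let $\delta_1$, $\delta_2>0$ be such that
$$\theta_rA+\theta_rB \le \theta_{r\delta_1}A+\theta_{r\delta_2}B+\epsilon.$$
Set $\delta=\min(\delta_1,\delta_2,\frac12\eta)>0$ and let $\langle A_n\rangle_{n\in\mathbb{N}}$ be a sequence of sets of diameter at most $\delta$, covering $A\cup B$,
and such that $\sum_{n=0}^{\infty}(\operatorname{diam}A_n)^r\le\theta_{r\delta}(A\cup B)+\epsilon$. Set
$$K=\{n : A_n\cap A\ne\emptyset\},\quad L=\{n : A_n\cap B\ne\emptyset\}.$$
Because
$$\|x-y\| \ge \eta > \delta \ge \operatorname{diam}A_n$$
whenever $x\in A$, $y\in B$ and $n\in\mathbb{N}$, $K\cap L=\emptyset$; and of course $A\subseteq\bigcup_{n\in K}A_n$, $B\subseteq\bigcup_{n\in L}A_n$. Consequently
$$\theta_rA+\theta_rB \le \epsilon+\theta_{r\delta_1}A+\theta_{r\delta_2}B \le \epsilon+\sum_{n\in K}(\operatorname{diam}A_n)^r+\sum_{n\in L}(\operatorname{diam}A_n)^r$$
$$\le \epsilon+\sum_{n=0}^{\infty}(\operatorname{diam}A_n)^r \le 2\epsilon+\theta_{r\delta}(A\cup B) \le 2\epsilon+\theta_r(A\cup B).$$
As $\epsilon$ is arbitrary, $\theta_r(A\cup B)\ge\theta_rA+\theta_rB$, as required. $\mathbf{Q}$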
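(b) It follows that $\theta_rA=\theta_r(A\cap G)+\theta_r(A\setminus G)$ whenever $A\subseteq\mathbb{R}^s$ and $G$ is open. $\mathbf{P}$ As usual, it is
enough to consider the case $\theta_rA<\infty$ and to show that in this case $\theta_r(A\cap G)+\theta_r(A\setminus G)\le\theta_rA$. Set
$$A_n=\{x : x\in A,\ \|x-y\|\ge2^{-n} \text{ for every } y\in A\setminus G\},$$
$$B_0=A_0,\quad B_n=A_n\setminus A_{n-1} \text{ for } n\ge1.$$
Observe that $A_n\subseteq A_{n+1}$ for every $n$ and $\bigcup_{n\in\mathbb{N}}A_n=\bigcup_{n\in\mathbb{N}}B_n=A\cap G$. The point is that if $m$, $n\in\mathbb{N}$ and
$n\ge m+2$, and if $x\in B_m$ and $y\in B_n$, then there is a $z\in A\setminus G$ such that $\|y-z\|<2^{-n+1}\le2^{-m-1}$, while
$\|x-z\|$ must be at least $2^{-m}$, so $\|x-y\|\ge\|x-z\|-\|y-z\|\ge2^{-m-1}$. It follows that for any $k\ge0$
$$\sum_{m=0}^{k}\theta_rB_{2m}=\theta_r\Bigl(\bigcup_{m\le k}B_{2m}\Bigr)\le\theta_r(A\cap G)<\infty,$$
$$\sum_{m=0}^{k}\theta_rB_{2m+1}=\theta_r\Bigl(\bigcup_{m\le k}B_{2m+1}\Bigr)\le\theta_r(A\cap G)<\infty,$$
(inducing on $k$, using (a) above for the inductive step). Consequently $\sum_{n=0}^{\infty}\theta_rB_n<\infty$.
But now, given $\epsilon>0$, there is an $m$ such that $\sum_{n=m}^{\infty}\theta_rB_n\le\epsilon$, so that
$$\theta_r(A\cap G)+\theta_r(A\setminus G) \le \theta_rA_m+\sum_{n=m}^{\infty}\theta_rB_n+\theta_r(A\setminus G) \le \epsilon+\theta_rA_m+\theta_r(A\setminus G) = \epsilon+\theta_r(A_m\cup(A\setminus G))$$
(by (a) again, since $\|x-y\|\ge2^{-m}$ for $x\in A_m$, $y\in A\setminus G$)
$$\le \epsilon+\theta_rA.$$
As $\epsilon$ is arbitrary, $\theta_r(A\cap G)+\theta_r(A\setminus G)\le\theta_rA$, as required. $\mathbf{Q}$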
(c) Part (b) shows exactly that open sets belong to $\Sigma_{Hr}$. It follows at once that the Borel $\sigma$-algebra of
$\mathbb{R}^s$ is included in $\Sigma_{Hr}$, as claimed.

264F Proposition Let $s\ge1$ be an integer, and $r>0$; let $\theta_r$ be $r$-dimensional Hausdorff outer measure
on $\mathbb{R}^s$, and write $\mu_{Hr}$ for $r$-dimensional Hausdorff measure on $\mathbb{R}^s$, $\Sigma_{Hr}$ for its domain. Then
(a) for every $A\subseteq\mathbb{R}^s$ there is a Borel set $E\supseteq A$ such that $\mu_{Hr}E=\theta_rA$;
(b) $\theta_r=\mu^*_{Hr}$, the outer measure defined from $\mu_{Hr}$;
(c) if $E\in\Sigma_{Hr}$ is expressible as a countable union of sets of finite measure, there are Borel sets $E'$, $E''$
such that $E'\subseteq E\subseteq E''$ and $\mu_{Hr}(E''\setminus E')=0$.
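proof (a) If $\theta_rA=\infty$ this is trivial: take $E=\mathbb{R}^s$. Otherwise, for each $n\in\mathbb{N}$ choose a sequence $\langle A_{nm}\rangle_{m\in\mathbb{N}}$
of sets of diameter at most $2^{-n}$, covering $A$, and such that $\sum_{m=0}^{\infty}(\operatorname{diam}A_{nm})^r\le\theta_{r,2^{-n}}A+2^{-n}$. Set
$F_{nm}=\overline{A_{nm}}$, $E=\bigcap_{n\in\mathbb{N}}\bigcup_{m\in\mathbb{N}}F_{nm}$; then $E$ is a Borel set in $\mathbb{R}^s$. Of course
$$A\subseteq\bigcap_{n\in\mathbb{N}}\bigcup_{m\in\mathbb{N}}A_{nm}\subseteq\bigcap_{n\in\mathbb{N}}\bigcup_{m\in\mathbb{N}}F_{nm}=E.$$
For any $n\in\mathbb{N}$,
$$\operatorname{diam}F_{nm}=\operatorname{diam}A_{nm}\le2^{-n} \text{ for every } m\in\mathbb{N},$$
$$\sum_{m=0}^{\infty}(\operatorname{diam}F_{nm})^r=\sum_{m=0}^{\infty}(\operatorname{diam}A_{nm})^r\le\theta_{r,2^{-n}}A+2^{-n},$$
so
$$\theta_{r,2^{-n}}E\le\theta_{r,2^{-n}}A+2^{-n}.$$
Letting $n\to\infty$,
$$\theta_rE=\lim_{n\to\infty}\theta_{r,2^{-n}}E\le\lim_{n\to\infty}\bigl(\theta_{r,2^{-n}}A+2^{-n}\bigr)=\theta_rA;$$
of course it follows that $\theta_rA=\theta_rE$, because $A\subseteq E$. Now by 264E we know that $E\in\Sigma_{Hr}$, so we can write
$\mu_{Hr}E$ in place of $\theta_rE$.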

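(b) This follows at once, because (writing $\Sigma_{Hr}$ for the domain of $\mu_{Hr}$) we have
$$\mu^*_{Hr}A=\inf\{\mu_{Hr}E : E\in\Sigma_{Hr},\ A\subseteq E\}=\inf\{\theta_rE : E\in\Sigma_{Hr},\ A\subseteq E\}\ge\theta_rA$$
for every $A\subseteq\mathbb{R}^s$. On the other hand, if $A\subseteq\mathbb{R}^s$, we have a Borel set $E\supseteq A$ such that $\theta_rA=\theta_rE$, so that
$$\mu^*_{Hr}A\le\mu_{Hr}E=\theta_rA.$$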

(c)(i) Suppose first that $\mu_{Hr}E<\infty$. By (a), there are Borel sets $E''\supseteq E$, $H\supseteq E''\setminus E$ such that
$$\mu_{Hr}E''=\theta_rE,$$
$$\mu_{Hr}H=\theta_r(E''\setminus E)=\mu_{Hr}(E''\setminus E)=\mu_{Hr}E''-\mu_{Hr}E=\mu_{Hr}E''-\theta_rE=0.$$
So setting $E'=E''\setminus H$, we obtain a Borel set included in $E$, and
$$\mu_{Hr}(E''\setminus E')\le\mu_{Hr}H=0.$$
(ii) For the general case, express $E$ as $\bigcup_{n\in\mathbb{N}}E_n$ where $\mu_{Hr}E_n<\infty$ for each $n$; take Borel sets $E'_n$, $E''_n$
such that $E'_n\subseteq E_n\subseteq E''_n$ and $\mu_{Hr}(E''_n\setminus E'_n)=0$ for each $n$; and set $E'=\bigcup_{n\in\mathbb{N}}E'_n$, $E''=\bigcup_{n\in\mathbb{N}}E''_n$.

264G Lipschitz functions The definition of Hausdorff measure is exactly adapted to the following
result, corresponding to 262D.

Proposition Let $m$, $s\ge1$ be integers, and $\phi:D\to\mathbb{R}^s$ a $\gamma$-Lipschitz function, where $D$ is a subset of $\mathbb{R}^m$.
Then for any $A\subseteq D$ and $r\ge0$,
$$\mu^*_{Hr}(\phi[A])\le\gamma^r\mu^*_{Hr}A,$$
writing $\mu^*_{Hr}$ for $r$-dimensional Hausdorff outer measure on either $\mathbb{R}^m$ or $\mathbb{R}^s$.

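proof (a) The case $r=0$ is trivial, since then $\gamma^r=1$ and $\mu^*_{Hr}A=\mu_{H0}A=\#(A)$ if $A$ is finite, $\infty$ otherwise,
and $\#(\phi[A])\le\#(A)$.

(b) If $r>0$, then take any $\delta>0$. Set $\eta=\delta/(1+\gamma)$ and consider $\theta_{r\eta}:\mathcal{P}\mathbb{R}^m\to[0,\infty]$, defined as in
264A. We know from 264Fb that
$$\mu^*_{Hr}A=\theta_rA\ge\theta_{r\eta}A,$$
so there is a sequence $\langle A_n\rangle_{n\in\mathbb{N}}$ of sets, all of diameter at most $\eta$, covering $A$, with $\sum_{n=0}^{\infty}(\operatorname{diam}A_n)^r\le\mu^*_{Hr}A+\delta$.
Now $\phi[A]\subseteq\bigcup_{n\in\mathbb{N}}\phi[A_n\cap D]$ and
$$\operatorname{diam}\phi[A_n\cap D]\le\gamma\operatorname{diam}A_n\le\gamma\eta\le\delta$$
for every $n$. Consequently
$$\theta_{r\delta}(\phi[A])\le\sum_{n=0}^{\infty}(\operatorname{diam}\phi[A_n\cap D])^r\le\sum_{n=0}^{\infty}\gamma^r(\operatorname{diam}A_n)^r\le\gamma^r(\mu^*_{Hr}A+\delta),$$
and
$$\mu^*_{Hr}(\phi[A])=\lim_{\delta\downarrow0}\theta_{r\delta}(\phi[A])\le\gamma^r\mu^*_{Hr}A,$$
as claimed.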

264H The next step is to relate $r$-dimensional Hausdorff measure on $\mathbb{R}^r$ to Lebesgue measure on $\mathbb{R}^r$.
The basic fact we need is the following, which is even more important for the idea in its proof than for the
result.

Theorem Let $r\ge1$ be an integer, and $A$ a bounded subset of $\mathbb{R}^r$; write $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$
and $d=\operatorname{diam}A$. Then
$$\mu^*_r(A)\le\mu_rB(0,\tfrac{d}{2})=2^{-r}\beta_rd^r,$$
where $B(0,\frac{d}{2})$ is the ball with centre 0 and diameter $d$, so that $B(0,1)$ is the unit ball in $\mathbb{R}^r$, and has
measure
$$\beta_r=\begin{cases}\dfrac{\pi^k}{k!} & \text{if } r=2k \text{ is even},\\[2ex] \dfrac{2^{2k+1}k!\,\pi^k}{(2k+1)!} & \text{if } r=2k+1 \text{ is odd}.\end{cases}$$
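
The even/odd formulas for the measure of the unit ball can be checked numerically. The sketch below is an illustration only, with hypothetical function names; it compares the closed forms stated in 264H with the standard Gamma-function expression $\pi^{r/2}/\Gamma(1+\frac{r}{2})$, and evaluates the bound $2^{-r}\beta_rd^r$ for a given diameter $d$.

```python
import math

def beta_closed_form(r):
    """Measure of the unit ball in R^r from the even/odd formulas of 264H."""
    k, odd = divmod(r, 2)
    if odd:
        return 2 ** (2 * k + 1) * math.factorial(k) * math.pi ** k / math.factorial(2 * k + 1)
    return math.pi ** k / math.factorial(k)

def beta_gamma(r):
    """The same quantity via pi^(r/2) / Gamma(1 + r/2)."""
    return math.pi ** (r / 2) / math.gamma(1 + r / 2)

def isodiametric_bound(d, r):
    """2^-r * beta_r * d^r, the Lebesgue measure of a ball of diameter d in R^r."""
    return 2.0 ** (-r) * beta_closed_form(r) * d ** r

if __name__ == "__main__":
    for r in range(1, 7):
        print(r, beta_closed_form(r), beta_gamma(r), isodiametric_bound(1.0, r))
```

For $r=1$ the normalizing factor $2^{-r}\beta_r$ is 1, which matches the elementary case treated in part (b) of the proof.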

proof (a) For the calculation of $\beta_r$, see 252Q or 252Xh.

(b) The case $r=1$ is elementary, for in this case $A$ is included in an interval of length $\operatorname{diam}A$, so that
$\mu^*_1A\le\operatorname{diam}A$. So henceforth let us suppose that $r\ge2$.
(c) For $1\le i\le r$ let $S_i:\mathbb{R}^r\to\mathbb{R}^r$ be reflection in the $i$th coordinate, so that $S_ix=(\xi_1,\ldots,\xi_{i-1},-\xi_i,\xi_{i+1},
\ldots,\xi_r)$ for every $x=(\xi_1,\ldots,\xi_r)\in\mathbb{R}^r$. Let us say that a set $C\subseteq\mathbb{R}^r$ is symmetric in coordinates in
$J$, where $J\subseteq\{1,\ldots,r\}$, if $S_i[C]=C$ for $i\in J$. Now the centre of the argument is the following fact: if
$C\subseteq\mathbb{R}^r$ is a bounded set which is symmetric in coordinates in $J$, where $J$ is a proper subset of $\{1,\ldots,r\}$, and
$j\in\{1,\ldots,r\}\setminus J$, then there is a set $D$, symmetric in coordinates in $J\cup\{j\}$, such that $\operatorname{diam}D\le\operatorname{diam}C$
and $\mu^*_rC\le\mu^*_rD$.
$\mathbf{P}$ (i) Because Lebesgue measure is invariant under permutation of coordinates, it is enough to deal with
the case $j=r$. Start by writing $F=\overline{C}$, so that $\operatorname{diam}F=\operatorname{diam}C$ and $\mu_rF\ge\mu^*_rC$. Note that because $S_i$ is
a homeomorphism for every $i$,
$$S_i[F]=S_i[\overline{C}]=\overline{S_i[C]}=\overline{C}=F$$
for $i\in J$, and $F$ is symmetric in coordinates in $J$.
For $y=(\eta_1,\ldots,\eta_{r-1})\in\mathbb{R}^{r-1}$, set
$$F_y=\{\xi : (\eta_1,\ldots,\eta_{r-1},\xi)\in F\},\quad f(y)=\mu_1F_y,$$
where $\mu_1$ is Lebesgue measure on $\mathbb{R}$. Set
$$D=\{(y,\xi) : y\in\mathbb{R}^{r-1},\ |\xi|<\tfrac12f(y)\}\subseteq\mathbb{R}^r.$$

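(ii) If $H\subseteq\mathbb{R}^r$ is measurable and $H\supseteq D$, then, writing $\mu_{r-1}$ for Lebesgue measure on $\mathbb{R}^{r-1}$, we have
$$\mu_rH=\int\mu_1\{\xi : (y,\xi)\in H\}\,\mu_{r-1}(dy)$$
(using 251M and 252D)
$$\ge\int\mu_1\{\xi : (y,\xi)\in D\}\,\mu_{r-1}(dy)=\int f(y)\,\mu_{r-1}(dy)=\int\mu_1\{\xi : (y,\xi)\in F\}\,\mu_{r-1}(dy)=\mu_rF\ge\mu^*_rC.$$
As $H$ is arbitrary, $\mu^*_rD\ge\mu^*_rC$.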
(iii) The next step is to check that $\operatorname{diam}D\le\operatorname{diam}C$. If $x$, $x'\in D$, express them as $(y,\xi_r)$ and $(y',\xi'_r)$.
Because $F$ is a bounded closed set in $\mathbb{R}^r$, $F_y$ and $F_{y'}$ are bounded closed subsets of $\mathbb{R}$. Also both $f(y)$ and
$f(y')$ must be greater than 0, so that $F_y$, $F_{y'}$ are both non-empty. Consequently
$$\alpha=\inf F_y,\quad\beta=\sup F_y,\quad\alpha'=\inf F_{y'},\quad\beta'=\sup F_{y'}$$
are all defined in $\mathbb{R}$, and $\alpha$, $\beta\in F_y$, while $\alpha'$ and $\beta'$ belong to $F_{y'}$. We have
$$|\xi_r-\xi'_r|\le|\xi_r|+|\xi'_r|<\tfrac12f(y)+\tfrac12f(y')=\tfrac12(\mu_1F_y+\mu_1F_{y'})\le\tfrac12(\beta-\alpha+\beta'-\alpha')\le\max(\beta'-\alpha,\beta-\alpha').$$
So taking $(\xi,\xi')$ to be one of $(\alpha,\beta')$ or $(\beta,\alpha')$, we can find $\xi\in F_y$, $\xi'\in F_{y'}$ such that $|\xi-\xi'|\ge|\xi_r-\xi'_r|$.
Now $z=(y,\xi)$, $z'=(y',\xi')$ both belong to $F$, so
$$\|x-x'\|^2=\|y-y'\|^2+|\xi_r-\xi'_r|^2\le\|y-y'\|^2+|\xi-\xi'|^2=\|z-z'\|^2\le(\operatorname{diam}F)^2,$$
and $\|x-x'\|\le\operatorname{diam}F$. As $x$, $x'$ are arbitrary, $\operatorname{diam}D\le\operatorname{diam}F=\operatorname{diam}C$, as claimed.
(iv) Evidently $S_r[D]=D$. Moreover, if $i\in J$, then (interpreting $S_i$ as an operator on $\mathbb{R}^{r-1}$)
$$F_{S_i(y)}=F_y \text{ for every } y\in\mathbb{R}^{r-1},$$
so $f(S_i(y))=f(y)$ and, for $\xi\in\mathbb{R}$, $y\in\mathbb{R}^{r-1}$,
$$(y,\xi)\in D\iff|\xi|<\tfrac12f(y)\iff|\xi|<\tfrac12f(S_i(y))\iff(S_i(y),\xi)\in D,$$
so that $S_i[D]=D$. Thus $D$ is symmetric in coordinates in $J\cup\{r\}$. $\mathbf{Q}$
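(d) The rest is easy. Starting from any bounded $A\subseteq\mathbb{R}^r$, set $A_0=A$ and construct inductively $A_1,\ldots,A_r$
such that
$$d=\operatorname{diam}A=\operatorname{diam}A_0\ge\operatorname{diam}A_1\ge\ldots\ge\operatorname{diam}A_r,$$
$$\mu^*_rA=\mu^*_rA_0\le\ldots\le\mu^*_rA_r,$$
$$A_j \text{ is symmetric in coordinates in } \{1,\ldots,j\} \text{ for every } j\le r.$$
At the end, we have $A_r$ symmetric in coordinates in $\{1,\ldots,r\}$. But this means that if $x\in A_r$ then
$$-x=S_1S_2\ldots S_rx\in A_r,$$
so that
$$\|x\|=\tfrac12\|x-(-x)\|\le\tfrac12\operatorname{diam}A_r\le\tfrac{d}{2}.$$
Thus $A_r\subseteq B(0,\frac{d}{2})$, and
$$\mu^*_rA\le\mu^*_rA_r\le\mu_rB(0,\tfrac{d}{2}),$$
as claimed.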

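264I Theorem Let $r\ge1$ be an integer; let $\mu$ be Lebesgue measure on $\mathbb{R}^r$, and let $\mu_{Hr}$ be $r$-dimensional
Hausdorff measure on $\mathbb{R}^r$. Then $\mu$ and $\mu_{Hr}$ have the same measurable sets and
$$\mu E=2^{-r}\beta_r\mu_{Hr}E$$
for every measurable set $E\subseteq\mathbb{R}^r$, where $\beta_r=\mu B(0,1)$, so that the normalizing factor is
$$2^{-r}\beta_r=\begin{cases}\dfrac{\pi^k}{2^{2k}k!} & \text{if } r=2k \text{ is even},\\[2ex] \dfrac{k!\,\pi^k}{(2k+1)!} & \text{if } r=2k+1 \text{ is odd}.\end{cases}$$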

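proof (a) Of course if $B=B(x,\alpha)$ is any ball of radius $\alpha$,
$$2^{-r}\beta_r(\operatorname{diam}B)^r=\beta_r\alpha^r=\mu B.$$

(b) The point is that $\mu^*=2^{-r}\beta_r\mu^*_{Hr}$. $\mathbf{P}$ Let $A\subseteq\mathbb{R}^r$.
(i) Let $\delta$, $\epsilon>0$. By 261F, there is a sequence $\langle B_n\rangle_{n\in\mathbb{N}}$ of balls, all of diameter at most $\delta$, such that
$A\subseteq\bigcup_{n\in\mathbb{N}}B_n$ and $\sum_{n=0}^{\infty}\mu B_n\le\mu^*A+\epsilon$. Now, defining $\theta_{r\delta}$ as in 264A,
$$2^{-r}\beta_r\theta_{r\delta}(A)\le2^{-r}\beta_r\sum_{n=0}^{\infty}(\operatorname{diam}B_n)^r=\sum_{n=0}^{\infty}\mu B_n\le\mu^*A+\epsilon.$$
Letting $\delta\downarrow0$,
$$2^{-r}\beta_r\mu^*_{Hr}A\le\mu^*A+\epsilon.$$
As $\epsilon$ is arbitrary, $2^{-r}\beta_r\mu^*_{Hr}A\le\mu^*A$.
(ii) Let $\epsilon>0$. Then there is a sequence $\langle A_n\rangle_{n\in\mathbb{N}}$ of sets of diameter at most 1 such that $A\subseteq\bigcup_{n\in\mathbb{N}}A_n$
and $\sum_{n=0}^{\infty}(\operatorname{diam}A_n)^r\le\theta_{r1}A+\epsilon$, so that
$$\mu^*A\le\sum_{n=0}^{\infty}\mu^*A_n\le\sum_{n=0}^{\infty}2^{-r}\beta_r(\operatorname{diam}A_n)^r\le2^{-r}\beta_r(\theta_{r1}A+\epsilon)\le2^{-r}\beta_r(\mu^*_{Hr}A+\epsilon)$$
by 264H. As $\epsilon$ is arbitrary, $\mu^*A\le2^{-r}\beta_r\mu^*_{Hr}A$. $\mathbf{Q}$
(c) Because $\mu$, $\mu_{Hr}$ are the measures defined from their respective outer measures by Carathéodory's
method, it follows at once that $\mu=2^{-r}\beta_r\mu_{Hr}$ in the strict sense required.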

*264J The Cantor set I remarked in 264A that fractional dimensions r were of interest. I have no
space for these here, and they are off the main lines of this volume, but I will give one result for its intrinsic
interest.
Proposition Let C be the Cantor set in [0, 1]. Set r = ln 2/ ln 3. Then the r-dimensional Hausdorff measure
of C is 1.
proof (a) Recall that $C=\bigcap_{n\in\mathbb{N}}C_n$, where each $C_n$ consists of $2^n$ closed intervals of length $3^{-n}$, and $C_{n+1}$
is obtained from $C_n$ by deleting the middle (open) third of each interval of $C_n$. (See 134G.) Because $C$ is
closed, $\mu_{Hr}C$ is defined (264E). Note that $3^r=2$.
(b) If $\delta>0$, take $n$ such that $3^{-n}\le\delta$; then $C$ can be covered by $2^n$ intervals of diameter $3^{-n}$, so
$$\theta_{r\delta}C\le2^n(3^{-n})^r=1.$$
Consequently
$$\mu_{Hr}C=\mu^*_{Hr}C=\lim_{\delta\downarrow0}\theta_{r\delta}C\le1.$$

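(c) We need the following elementary fact: if $\alpha$, $\beta$, $\gamma\ge0$ and $\max(\alpha,\gamma)\le\beta$, then $\alpha^r+\gamma^r\le(\alpha+\beta+\gamma)^r$.
$\mathbf{P}$ Because $0<r\le1$,
$$\xi\mapsto(\xi+\eta)^r-\xi^r=r\int_0^{\eta}(\xi+\zeta)^{r-1}d\zeta$$
is non-increasing for every $\eta\ge0$. Consequently
$$(\alpha+\beta+\gamma)^r-\alpha^r-\gamma^r\ge(\beta+\beta+\gamma)^r-\beta^r-\gamma^r\ge(\beta+\beta+\beta)^r-\beta^r-\beta^r=\beta^r(3^r-2)=0,$$
as required. $\mathbf{Q}$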
(d) Now suppose that $I\subseteq\mathbb{R}$ is any interval, and $m\in\mathbb{N}$; write $j_m(I)$ for the number of the intervals
composing $C_m$ which are included in $I$. Then $2^{-m}j_m(I)\le(\operatorname{diam}I)^r$. $\mathbf{P}$ If $I$ does not meet $C_m$, this is
trivial. Otherwise, induce on
$$l=\min\{i : I \text{ meets only one of the intervals composing } C_{m-i}\}.$$
If $l=0$, so that $I$ meets only one of the intervals composing $C_m$, then $j_m(I)\le1$, and if $j_m(I)=1$ then
$\operatorname{diam}I\ge3^{-m}$ so $(\operatorname{diam}I)^r\ge2^{-m}$; thus the induction starts. For the inductive step to $l\ge1$, let $J$ be the
interval of $C_{m-l}$ which meets $I$, and $J'$, $J''$ the two intervals of $C_{m-l+1}$ included in $J$, so that $I$ meets both
$J'$ and $J''$, and
$$j_m(I)=j_m(I\cap J)=j_m(I\cap J')+j_m(I\cap J'').$$
By the inductive hypothesis,
$$(\operatorname{diam}(I\cap J'))^r+(\operatorname{diam}(I\cap J''))^r\ge2^{-m}j_m(I\cap J')+2^{-m}j_m(I\cap J'')=2^{-m}j_m(I).$$
On the other hand, by (c),
$$(\operatorname{diam}(I\cap J'))^r+(\operatorname{diam}(I\cap J''))^r\le(\operatorname{diam}(I\cap J')+3^{-m+l-1}+\operatorname{diam}(I\cap J''))^r=(\operatorname{diam}(I\cap J))^r\le(\operatorname{diam}I)^r$$
because $J'$, $J''$ both have diameter at most $3^{-(m-l+1)}$, the length of the interval between them. Thus the
induction continues. $\mathbf{Q}$
(e) Now suppose that $\epsilon>0$. Then there is a sequence $\langle A_n\rangle_{n\in\mathbb{N}}$ of sets, covering $C$, such that
$$\sum_{n=0}^{\infty}(\operatorname{diam}A_n)^r<\mu_{Hr}C+\epsilon.$$
Take $\eta_n>0$ such that $\sum_{n=0}^{\infty}(\operatorname{diam}A_n+\eta_n)^r\le\mu_{Hr}C+\epsilon$, and for each $n$ take an open interval $I_n\supseteq A_n$
of length at most $\operatorname{diam}A_n+\eta_n$ and with neither endpoint belonging to $C$; this is possible because $C$ does
not include any non-trivial interval. Now $C\subseteq\bigcup_{n\in\mathbb{N}}I_n$; because $C$ is compact, there is a $k\in\mathbb{N}$ such
that $C\subseteq\bigcup_{n\le k}I_n$. Next, there is an $m\in\mathbb{N}$ such that no endpoint of any $I_n$, for $n\le k$, belongs to $C_m$.
Consequently each of the intervals composing $C_m$ must be included in some $I_n$, and (in the terminology of
(d) above) $\sum_{n=0}^{k}j_m(I_n)\ge2^m$. Accordingly
$$1\le\sum_{n=0}^{k}2^{-m}j_m(I_n)\le\sum_{n=0}^{k}(\operatorname{diam}I_n)^r\le\sum_{n=0}^{\infty}(\operatorname{diam}A_n+\eta_n)^r\le\mu_{Hr}C+\epsilon.$$
As $\epsilon$ is arbitrary, $\mu_{Hr}C\ge1$, as required.
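
Two numerical facts used in this proof are easy to check by machine: the covering sum $2^n(3^{-n})^r$ from part (b), and the elementary inequality of part (c). The following Python sketch is an illustration only; the function names and the rounding tolerance are assumptions, not part of the argument.

```python
import math

R = math.log(2) / math.log(3)  # the exponent r of 264J, so that 3**R is 2

def level_cover_sum(n, r):
    """Sum of diam^r over the 2^n intervals of length 3^-n composing C_n."""
    return 2 ** n * (3.0 ** -n) ** r

def gap_inequality_holds(alpha, beta, gamma, r=R):
    """Check alpha^r + gamma^r <= (alpha+beta+gamma)^r, with slack for rounding."""
    return alpha ** r + gamma ** r <= (alpha + beta + gamma) ** r + 1e-12

if __name__ == "__main__":
    for n in (0, 5, 10, 20):
        print(n, level_cover_sum(n, R), level_cover_sum(n, 0.5), level_cover_sum(n, 1.0))
```

At the exponent $r=\ln2/\ln3$ the level-$n$ sums stay at 1 for every $n$; for a smaller exponent they grow with $n$ and for a larger one they shrink, which is the behaviour one expects at the critical dimension.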

*264K General metric spaces While this chapter deals exclusively with Euclidean spaces, readers
familiar with the general theory of metric spaces may find the nature of the theory clearer if they use the
language of metric spaces in the basic definitions and results. I therefore repeat the definition here, and
spell out the corresponding results in the exercises 264Yb-264Yl.
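Let $(X,\rho)$ be a metric space, and $r>0$. For any $A\subseteq X$, $\delta>0$ set
$$\theta_{r\delta}A=\inf\Bigl\{\sum_{n=0}^{\infty}(\operatorname{diam}A_n)^r : \langle A_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of subsets of } X \text{ covering } A,\ \operatorname{diam}A_n\le\delta \text{ for every } n\in\mathbb{N}\Bigr\},$$
interpreting the diameter of the empty set as 0, and $\inf\emptyset$ as $\infty$, so that $\theta_{r\delta}A=\infty$ if $A$ cannot be covered
by a sequence of sets of diameter at most $\delta$. Say that $\theta_rA=\sup_{\delta>0}\theta_{r\delta}A$ is the $r$-dimensional Hausdorff
outer measure of $A$, and take the measure $\mu_{Hr}$ defined by Carathéodory's method from this outer measure
to be $r$-dimensional Hausdorff measure on $X$.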

264X Basic exercises > (a) Show that all the functions $\theta_{r\delta}$ of 264A are outer measures. Show that in
that context, $\theta_{r\delta}(A)=0$ iff $\theta_r(A)=0$, for any $\delta>0$ and any $A\subseteq\mathbb{R}^s$.

(b) Let $s\ge1$ be an integer, and $\theta$ an outer measure on $\mathbb{R}^s$ such that $\theta(A\cup B)=\theta A+\theta B$ whenever $A$, $B$
are non-empty subsets of $\mathbb{R}^s$ and $\inf_{x\in A,y\in B}\|x-y\|>0$. Show that every Borel subset of $\mathbb{R}^s$ is measurable
for the measure defined from $\theta$ by Carathéodory's method.

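> (c) Let $s\ge1$ be an integer and $r>0$; define $\theta_{r\delta}$ as in 264A. Show that for any $A\subseteq\mathbb{R}^s$, $\delta>0$,
$$\theta_{r\delta}A=\inf\Bigl\{\sum_{n=0}^{\infty}(\operatorname{diam}F_n)^r : \langle F_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of closed subsets of } \mathbb{R}^s \text{ covering } A,\ \operatorname{diam}F_n\le\delta \text{ for every } n\in\mathbb{N}\Bigr\}$$
$$=\inf\Bigl\{\sum_{n=0}^{\infty}(\operatorname{diam}G_n)^r : \langle G_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of open subsets of } \mathbb{R}^s \text{ covering } A,\ \operatorname{diam}G_n\le\delta \text{ for every } n\in\mathbb{N}\Bigr\}.$$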

> (d) Let $s\ge1$ be an integer and $r\ge0$; let $\mu_{Hr}$ be $r$-dimensional Hausdorff measure on $\mathbb{R}^s$. Show that
for every $A\subseteq\mathbb{R}^s$ there is a $G_\delta$ set (that is, a set expressible as the intersection of a sequence of open sets)
$H\supseteq A$ such that $\mu_{Hr}H=\mu^*_{Hr}A$. (Hint: use 264Xc.)

> (e) Let $s\ge1$ be an integer, and $0\le r<r'$. Show that if $A\subseteq\mathbb{R}^s$ and the $r$-dimensional Hausdorff
outer measure $\mu^*_{Hr}A$ of $A$ is finite, then $\mu^*_{Hr'}A$ must be zero.

264Y Further exercises (a) Let $\theta_{11}$ be the outer measure on $\mathbb{R}^2$ defined in 264A, with $r=\delta=1$,
and $\mu_{11}$ the measure derived from $\theta_{11}$ by Carathéodory's method, $\Sigma_{11}$ its domain. Show that any set in $\Sigma_{11}$
is either negligible or conegligible.

(b) Let $(X,\rho)$ be a metric space and $r\ge0$. Show that if $A\subseteq X$ and $\mu^*_{Hr}A<\infty$, then $A$ is separable.

(c) Let $(X,\rho)$ be a metric space, and $\theta$ an outer measure on $X$ such that $\theta(A\cup B)=\theta A+\theta B$ whenever
$A$, $B$ are non-empty subsets of $X$ and $\inf_{x\in A,y\in B}\rho(x,y)>0$. (Such an outer measure is called a metric
outer measure.) Show that every Borel subset of $X$ is measurable for the measure defined from $\theta$ by
Carathéodory's method.

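(d) Let $(X,\rho)$ be a metric space and $r>0$; define $\theta_{r\delta}$ as in 264K. Show that for any $A\subseteq X$,
$$\mu^*_{Hr}A=\sup_{\delta>0}\inf\Bigl\{\sum_{n=0}^{\infty}(\operatorname{diam}F_n)^r : \langle F_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of closed subsets of } X \text{ covering } A,\ \operatorname{diam}F_n\le\delta \text{ for every } n\in\mathbb{N}\Bigr\}$$
$$=\sup_{\delta>0}\inf\Bigl\{\sum_{n=0}^{\infty}(\operatorname{diam}G_n)^r : \langle G_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of open subsets of } X \text{ covering } A,\ \operatorname{diam}G_n\le\delta \text{ for every } n\in\mathbb{N}\Bigr\}.$$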

(e) Let $(X,\rho)$ be a metric space and $r\ge0$; let $\mu_{Hr}$ be $r$-dimensional Hausdorff measure on $X$. Show that
for every $A\subseteq X$ there is a $G_\delta$ set $H\supseteq A$ such that $\mu_{Hr}H=\mu^*_{Hr}A$ is the $r$-dimensional Hausdorff outer
measure of $A$.

(f) Let $(X,\rho)$ be a metric space and $r\ge0$; let $Y$ be any subset of $X$, and give $Y$ its induced metric
$\rho_Y$. Show that the $r$-dimensional Hausdorff outer measure $\mu^{(Y)*}_{Hr}$ on $Y$ is just the restriction to $\mathcal{P}Y$ of the
outer measure $\mu^*_{Hr}$ on $X$. Show that if either $\mu^*_{Hr}Y<\infty$ or $\mu_{Hr}Y$ is defined then $r$-dimensional Hausdorff
measure $\mu^{(Y)}_{Hr}$ on $Y$ is just the subspace measure on $Y$ induced by the measure $\mu_{Hr}$ on $X$.

(g) Let $(X,\rho)$ be a metric space and $r>0$. Show that $r$-dimensional Hausdorff measure on $X$ is atomless.
(Hint: Let $E\in\operatorname{dom}\mu_{Hr}$. (i) If $E$ is not separable, there is an open set $G$ such that $E\cap G$ and $E\setminus G$ are
both non-separable, therefore both non-negligible. (ii) If there is an $x\in E$ such that $\mu_{Hr}(E\cap B(x,\delta))>0$
for every $\delta>0$, then one of these sets has non-negligible complement in $E$. (iii) Otherwise, $\mu_{Hr}E=0$.)

(h) Let $(X,\rho)$ be a metric space and $r\ge0$; let $\mu_{Hr}$ be $r$-dimensional Hausdorff measure on $X$. Show
that if $\mu_{Hr}E<\infty$ then $\mu_{Hr}E=\sup\{\mu_{Hr}F : F\subseteq E \text{ is closed and totally bounded}\}$. (Hint: given $\epsilon>0$,
use 264Yd to find a closed totally bounded set $F$ such that $\mu_{Hr}(F\setminus E)=0$ and $\mu_{Hr}(E\setminus F)\le\epsilon$, and now
apply 264Ye to $F\setminus E$.)

(i) Let $(X,\rho)$ be a complete metric space and $r\ge0$; let $\mu_{Hr}$ be $r$-dimensional Hausdorff measure on $X$.
Show that if $\mu_{Hr}E<\infty$ then $\mu_{Hr}E=\sup\{\mu_{Hr}F : F\subseteq E \text{ is compact}\}$.

(j) Let $(X,\rho)$ and $(Y,\sigma)$ be metric spaces. If $D\subseteq X$ and $\phi:D\to Y$ is a function, then $\phi$ is $\gamma$-Lipschitz for
$\rho$, $\sigma$ if $\sigma(\phi(x),\phi(x'))\le\gamma\rho(x,x')$ for every $x$, $x'\in D$. (i) Show that in this case, if $r\ge0$, $\mu^*_{Hr}(\phi[A])\le\gamma^r\mu^*_{Hr}A$
for every $A\subseteq D$, writing $\mu^*_{Hr}$ for $r$-dimensional Hausdorff outer measure on either $X$ or $Y$. (ii) Show that
if $X$ is complete and $\mu_{Hr}E$ is defined and finite, then $\mu_{Hr}(\phi[E])$ is defined. (Hint: 264Yi.)

(k) Let $(X,\rho)$ be a metric space, and for $r\ge0$ let $\mu_{Hr}$ be Hausdorff $r$-dimensional measure on $X$. Show
that there is a unique $\Delta=\Delta(X)\in[0,\infty]$ such that $\mu_{Hr}X=\infty$ if $r\in[0,\Delta[$, 0 if $r\in\,]\Delta,\infty[$.
(l) Let $(X,\rho)$ be a metric space and $\phi:I\to X$ a continuous function, where $I\subseteq\mathbb{R}$ is an interval. Write
$\mu_{H1}$ for one-dimensional Hausdorff measure on $X$. Show that
$$\mu_{H1}(\phi[I])\le\sup\Bigl\{\sum_{i=1}^{n}\rho(\phi(t_i),\phi(t_{i-1})) : t_0,\ldots,t_n\in I,\ t_0\le\ldots\le t_n\Bigr\},$$
the length of the curve $\phi$, with equality if $\phi$ is injective.

(m) Set $r=\ln2/\ln3$, as in 264J, and write $\mu_{Hr}$ for $r$-dimensional Hausdorff measure on the Cantor set
$C$. Let $\lambda$ be the usual measure on $\{0,1\}^{\mathbb{N}}$ (254J). Define $\phi:\{0,1\}^{\mathbb{N}}\to C$ by setting $\phi(x)=\frac23\sum_{n=0}^{\infty}3^{-n}x(n)$
for $x\in\{0,1\}^{\mathbb{N}}$. Show that $\phi$ is an isomorphism between $(\{0,1\}^{\mathbb{N}},\lambda)$ and $(C,\mu_{Hr})$.

(n) Set $r=\ln2/\ln3$ and write $\mu_{Hr}$ for $r$-dimensional Hausdorff measure on the Cantor set $C$. Let $f:
[0,1]\to[0,1]$ be the Cantor function (134H) and let $\mu$ be Lebesgue measure on $\mathbb{R}$. Show that $\mu f[E]=\mu_{Hr}E$
for every $E\in\operatorname{dom}\mu_{Hr}$, $\mu_{Hr}(C\cap f^{-1}[F])=\mu F$ for every Lebesgue measurable set $F\subseteq[0,1]$.

(o) Let $(X,\rho)$ be a metric space and $h:[0,\infty[\,\to[0,\infty[$ a non-decreasing function. For $A\subseteq X$ set
$$\theta_hA=\sup_{\delta>0}\inf\Bigl\{\sum_{n=0}^{\infty}h(\operatorname{diam}A_n) : \langle A_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of subsets of } X \text{ covering } A,\ \operatorname{diam}A_n\le\delta \text{ for every } n\in\mathbb{N}\Bigr\},$$
interpreting $\operatorname{diam}\emptyset$ as 0, $\inf\emptyset$ as $\infty$ as usual. Show that $\theta_h$ is an outer measure on $X$. State and prove
theorems corresponding to 264E and 264F. Look through 264X and 264Y for further results which might
be generalizable, perhaps on the assumption that $h$ is continuous on the right.

(p) Let $(X,\rho)$ be a metric space. Let us say that if $a<b$ in $\mathbb{R}$ and $f:[a,b]\to X$ is a function, then $f$
is absolutely continuous if for every $\epsilon>0$ there is a $\delta>0$ such that $\sum_{i=0}^{n}\rho(f(a_i),f(b_i))\le\epsilon$ whenever
$a\le a_0\le b_0\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=0}^{n}(b_i-a_i)\le\delta$. Show that $f:[a,b]\to X$ is absolutely continuous
iff it is continuous and of bounded variation (in the sense of 224Ye) and $\mu_{H1}f[A]=0$ whenever $A\subseteq[a,b]$ is
Lebesgue negligible, where $\mu_{H1}$ is 1-dimensional Hausdorff measure on $X$. (Compare 225M.) Show that in
this case $\mu_{H1}f[\,[a,b]\,]<\infty$.

264 Notes and comments In this section we have come to the next step in geometric measure theory.
I am taking this very slowly, because there are real difficulties in the subject, and for the purposes of this
volume we do not need to master very much of it. The idea here is to find a definition of $r$-dimensional
Lebesgue measure which will be 'geometric' in the strict sense, that is, dependent only on the metric
structure of $\mathbb{R}^r$, and therefore applicable to sets which have a metric structure but no linear structure. As
has happened before, the definition of Hausdorff measure from an outer measure gives no problems (the
only new idea in 264A-264C is that of using a supremum $\theta_r=\sup_{\delta>0}\theta_{r\delta}$ of outer measures), and the difficult
part is proving that our new measure has any useful properties. Concerning the properties of Hausdorff
measure, there are two essential objectives; first, to check that these measures, in general, share a reasonable
proportion of the properties of Lebesgue measure; and second, to justify the term '$r$-dimensional measure'
by relating Hausdorff $r$-dimensional measure on $\mathbb{R}^r$ to Lebesgue measure on $\mathbb{R}^r$.
As for the properties of general Hausdorff measures, we have to go rather carefully. I do not give counter-
examples here because they involve concepts which belong to Volumes 4 and 5 rather than this volume,
but I must warn you to expect the worst. However, we do at least have open sets measurable, so that all
Borel sets are measurable (264E). The outer measure of a set A can be defined in terms of the Borel sets
including A (264Fa), though not in general in terms of the open sets including A; but the measure of a
measurable set E is not necessarily the supremum of the measures of the Borel sets included in E, unless
E is of finite measure (264Fc). We do find that the outer measure $\theta_r$ defined in 264A is the outer measure
defined from $\mu_{Hr}$ (264Fb), so that the phrase '$r$-dimensional Hausdorff outer measure' is unambiguous. A
crucial property of Lebesgue measure is the fact that the measure of a measurable set $E$ is the supremum of
the measures of the compact subsets of $E$; this is not generally shared by Hausdorff measures, but is valid for
sets $E$ of finite measure in complete spaces (264Yi). Concerning subspaces, there are no problems with the
outer measures, and for sets of finite measure the subspace measures are also consistent (264Yf). Because
Hausdorff measure is defined in metric terms, it behaves regularly for Lipschitz maps (264G); one of the
most natural classes of functions to consider when studying metric spaces is that of 1-Lipschitz functions,
so that (in the language of 264G) $\mu^*_{Hr}\phi[A]\le\mu^*_{Hr}A$ for every $A$.
The second essential feature of Hausdorff measure, its relation with Lebesgue measure in the appropriate
dimension, is Theorem 264I. Because both Hausdorff measure and Lebesgue measure are translation-invariant,
this can be proved by relatively elementary means, except for the evaluation of the normalizing
constant; all we need to know is that $\mu[0,1[^{\,r}=1$ and $\mu_{Hr}[0,1[^{\,r}$ are both finite and non-zero, and this is
straightforward. (The arguments of part (a) of the proof of 261F are relevant.) For the purposes of this
chapter, we do not I think have to know the value of the constant; but I cannot leave it unsettled, and
therefore give Theorem 264H, the isodiametric inequality, to show that it is just the Lebesgue measure
of an $r$-dimensional ball of diameter 1, as one would hope and expect. The critical step in the argument
of 264H is in part (c) of the proof. This is called Steiner symmetrization; the idea is that given a set A,
we transform A through a series of steps, at each stage lowering, or at least not increasing, its diameter,
and raising, or at least not decreasing, its outer measure, progressively making A more symmetric, until
at the end we have a set which is sufficiently constrained to be amenable. The particular symmetrization
operation used in this proof is important enough; but the idea of progressive regularization of an object is
one of the most powerful methods in measure theory, and you should give all your attention to mastering
any example you encounter. In my experience, the idea is principally useful when seeking an inequality
involving disparate quantities; in the present example, the diameter and volume of a set.
Of course it is awkward having two measures on $\mathbb{R}^r$, differing by a constant multiple, and for the purposes
of the next section it would actually have been a little more convenient to follow Federer 69 in using
'normalized Hausdorff measure' $2^{-r}\beta_r\mu_{Hr}$. (For non-integral $r$, we could take $\beta_r=\pi^{r/2}/\Gamma(1+\frac{r}{2})$, as
suggested in 252Xh.) However, I believe this to be a minority position, and the striking example of Hausdorff
measure on the Cantor set (264J, 264Ym-264Yn) looks much better in the non-normalized version.
Hausdorff (ln 2/ ln 3)-dimensional measure on the Cantor set is of course but one, perhaps the easiest,
of a large class of examples. Because the Hausdorff $r$-dimensional outer measure of a set $A$, regarded as
a function of $r$, behaves dramatically (falling from $\infty$ to 0) at a certain critical value $\Delta(A)$ (see 264Xe,
264Yk), it gives us a metric space invariant of $A$; $\Delta(A)$ is the Hausdorff dimension of $A$. Evidently the
Hausdorff dimension of $C$ is $\ln2/\ln3$, while that of $r$-dimensional Euclidean space is $r$.

265 Surface measures


In this section I offer a new version of the arguments of §263, this time not with the intention of justifying
integration-by-substitution, but instead to give a practically effective method of computing the Hausdorff
r-dimensional measure of a smooth r-dimensional surface in an s-dimensional space. The basic case to
bear in mind is r = 2, s = 3, though any other combination which you can easily visualize will also be
a valuable aid to intuition. I give a fundamental theorem (265E) providing a formula from which we can
hope to calculate the r-dimensional measure of a surface in s-dimensional space which is parametrized by a
differentiable function, and work through some of the calculations in the case of the r-sphere (265F-265H).

265A Normalized Hausdorff measure As I remarked at the end of the last section, Hausdorff measure,
as defined in 264A-264C, is not quite the most appropriate measure for our work here; so in this section
I will use normalized Hausdorff measure, meaning $\nu_r=2^{-r}\beta_r\mu_{Hr}$, where $\mu_{Hr}$ is $r$-dimensional Hausdorff
measure (interpreted in whichever space is under consideration) and $\beta_r=\mu_rB(0,1)$ is the Lebesgue
measure of any ball of radius 1 in $\mathbb{R}^r$. It will be convenient to take $\beta_0=1$. As shown in 264H-264I, this
normalization makes $\nu_r$ on $\mathbb{R}^r$ agree with Lebesgue measure $\mu_r$. Observe that of course $\nu^*_r=2^{-r}\beta_r\mu^*_{Hr}$
(264Fb).

265B Linear subspaces Just as in §263, the first step is to deal with linear operators.
Theorem Suppose that $r$, $s$ are integers with $1\le r\le s$, and that $T$ is a real $s\times r$ matrix; regard $T$ as a
linear operator from $\mathbb{R}^r$ to $\mathbb{R}^s$. Set $J = \sqrt{\det T'T}$, where $T'$ is the transpose of $T$. Write $\nu_r$ for normalized
$r$-dimensional Hausdorff measure on $\mathbb{R}^s$, $\mathrm{T}_r$ for its domain, and $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$. Then
$$\nu_r T[E] = J\mu_r E$$
for every measurable set $E\subseteq\mathbb{R}^r$. If $T$ is injective (that is, if $J\ne 0$), then
$$\nu_r F = J\mu_r T^{-1}[F]$$
whenever $F\in\mathrm{T}_r$ and $F\subseteq T[\mathbb{R}^r]$.
proof The formula for $J$ assumes that $\det T'T$ is non-negative, which is a fact not in evidence; but the
argument below will establish it adequately soon.
(a) Let $V$ be the linear subspace of $\mathbb{R}^s$ consisting of vectors $y = (\eta_1,\ldots,\eta_s)$ such that $\eta_i = 0$ whenever
$r < i\le s$. Let $R$ be the $r\times s$ matrix $\langle\rho_{ij}\rangle_{i\le r,j\le s}$, where $\rho_{ij} = 1$ if $i = j\le r$, $0$ otherwise; then the $s\times r$
matrix $R'$ may be regarded as a bijection from $\mathbb{R}^r$ to $V$. Let $W$ be an $r$-dimensional linear subspace of $\mathbb{R}^s$
including $T[\mathbb{R}^r]$, and let $P$ be an orthogonal $s\times s$ matrix such that $P[W] = V$. Then $S = RPT$ is an $r\times r$
matrix. We have $R'Ry = y$ for $y\in V$, so $R'RPT = PT$ and
$$S'S = T'P'R'RPT = T'P'PT = T'T;$$
accordingly
$$\det T'T = \det S'S = (\det S)^2\ge 0$$
and $J = |\det S|$. At the same time,
$$P'R'S = P'R'RPT = P'PT = T.$$
Observe that $J = 0$ iff $S$ is not injective, that is, $T$ is not injective.
(b) If we consider the $s\times r$ matrix $P'R'$ as a map from $\mathbb{R}^r$ to $\mathbb{R}^s$, we see that $\phi = P'R'$ is an isometry
between $\mathbb{R}^r$ and $W$, with inverse $\phi^{-1} = RP\restriction W$. It follows that $\phi$ is an isomorphism between the measure
spaces $(\mathbb{R}^r,\mu_{Hr}^{(r)})$ and $(W,\mu_{HrW}^{(s)})$, where $\mu_{Hr}^{(r)}$ is $r$-dimensional Hausdorff measure on $\mathbb{R}^r$ and $\mu_{HrW}^{(s)}$ is the
subspace measure on $W$ induced by $r$-dimensional Hausdorff measure $\mu_{Hr}^{(s)}$ on $\mathbb{R}^s$.
P (i) If $A\subseteq\mathbb{R}^r$, $A'\subseteq W$,
$$\mu_{Hr}^{(s)*}(\phi[A])\le\mu_{Hr}^{(r)*}(A),\qquad \mu_{Hr}^{(r)*}(\phi^{-1}[A'])\le\mu_{Hr}^{(s)*}(A'),$$
using 264G twice. Thus $\mu_{Hr}^{(s)*}(\phi[A]) = \mu_{Hr}^{(r)*}(A)$ for every $A\subseteq\mathbb{R}^r$.
(ii) Now because $W$ is closed, therefore in the domain of $\mu_{Hr}^{(s)}$ (264E), the subspace measure $\mu_{HrW}^{(s)}$ is
just the measure induced by $\mu_{Hr}^{(s)*}\restriction\mathcal{P}W$ by Carathéodory's method (214H(b-ii)). Because $\phi$ is an isomorphism
between $(\mathbb{R}^r,\mu_{Hr}^{(r)*})$ and $(W,\mu_{Hr}^{(s)*}\restriction\mathcal{P}W)$, it is an isomorphism between $(\mathbb{R}^r,\mu_{Hr}^{(r)})$ and $(W,\mu_{HrW}^{(s)})$. Q
(c) It follows that $\phi$ is also an isomorphism between the normalized versions $(\mathbb{R}^r,\mu_r)$ and $(W,\nu_{rW})$,
writing $\nu_{rW}$ for the subspace measure on $W$ induced by $\nu_r$.
Now if $E\subseteq\mathbb{R}^r$ is Lebesgue measurable, we have $\mu_r S[E] = J\mu_r E$, by 263A; so that
$$\nu_r T[E] = \nu_r(P'R'[S[E]]) = \nu_r(\phi[S[E]]) = \mu_r S[E] = J\mu_r E.$$
If $T$ is injective, then $S = \phi^{-1}T$ must also be injective, so that $J\ne 0$ and
$$\nu_r F = \mu_r(\phi^{-1}[F]) = J\mu_r(S^{-1}[\phi^{-1}[F]]) = J\mu_r T^{-1}[F]$$
whenever $F\in\mathrm{T}_r$ and $F\subseteq W = T[\mathbb{R}^r]$.
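
In the basic case $r = 2$, $s = 3$ the factor $J = \sqrt{\det T'T}$ is the area of the parallelogram spanned by the two columns of $T$, that is, the length of their cross product. The following sketch checks this on one example matrix; the matrix and the helper names `gram_jacobian` and `cross_norm` are illustrative assumptions, not part of the text.

```python
import math

def gram_jacobian(T):
    """J = sqrt(det(T'T)) for an s x 2 matrix T given as a list of rows."""
    a = sum(row[0] * row[0] for row in T)   # (T'T)[0][0]
    b = sum(row[0] * row[1] for row in T)   # (T'T)[0][1] = (T'T)[1][0]
    d = sum(row[1] * row[1] for row in T)   # (T'T)[1][1]
    return math.sqrt(max(a * d - b * b, 0.0))

def cross_norm(T):
    """Length of the cross product of the two columns of a 3 x 2 matrix."""
    u = [row[0] for row in T]
    v = [row[1] for row in T]
    c = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    return math.sqrt(sum(x * x for x in c))

T = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]             # injective: J > 0
T_degenerate = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # proportional columns: J = 0

print(gram_jacobian(T), cross_norm(T))
print(gram_jacobian(T_degenerate))
```

The degenerate matrix shows the other half of the theorem's dichotomy: $J = 0$ exactly when $T$ fails to be injective.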

265C Corollary Under the conditions of 265B,
$$\nu_r^* T[A] = J\mu_r^* A$$
for every $A\subseteq\mathbb{R}^r$.
proof (a) If $E$ is Lebesgue measurable and $A\subseteq E$, then $T[A]\subseteq T[E]$, so
$$\nu_r^* T[A]\le\nu_r T[E] = J\mu_r E;$$
as $E$ is arbitrary, $\nu_r^* T[A]\le J\mu_r^* A$.
(b) If $J = 0$ we can stop. If $J\ne 0$ then $T$ is injective, so if $F\in\mathrm{T}_r$ and $T[A]\subseteq F$ we shall have
$$J\mu_r^* A\le J\mu_r T^{-1}[F\cap W] = \nu_r(F\cap W)\le\nu_r F;$$
as $F$ is arbitrary, $J\mu_r^* A\le\nu_r^* T[A]$.

265D I now proceed to the lemma corresponding to 263C.



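Lemma Suppose that $1\le r\le s$ and that $T$ is an $s\times r$ matrix; set $J = \sqrt{\det T'T}$, and suppose that $J\ne 0$.
Then for any $\epsilon > 0$ there is a $\zeta = \zeta(T,\epsilon) > 0$ such that
(i) $|\sqrt{\det S'S} - J|\le\epsilon$ whenever $S$ is an $s\times r$ matrix and $\|S - T\|\le\zeta$;
(ii) whenever $D\subseteq\mathbb{R}^r$ is a bounded set and $\phi : D\to\mathbb{R}^s$ is a function such that $\|\phi(x) - \phi(y) - T(x - y)\|\le\zeta\|x - y\|$
for all $x$, $y\in D$, then $|\nu_r^*\phi[D] - J\mu_r^* D|\le\epsilon\mu_r^* D$.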
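proof (a) Because $\det S'S$ is a continuous function of the coefficients of $S$, 262Hb tells us that there must
be a $\zeta_0 > 0$ such that $|J - \sqrt{\det S'S}|\le\epsilon$ whenever $\|S - T\|\le\zeta_0$.
(b) Because $J\ne 0$, $T$ is injective, and there is an $r\times s$ matrix $T^*$ such that $T^*T$ is the identity $r\times r$
matrix. Take $\zeta > 0$ such that $\zeta\le\zeta_0$, $\zeta\|T^*\| < 1$, $J(1 + \zeta\|T^*\|)^r\le J + \epsilon$ and $1 - J^{-1}\epsilon\le(1 - \zeta\|T^*\|)^r$.
Let $\phi : D\to\mathbb{R}^s$ be such that $\|\phi(x) - \phi(y) - T(x - y)\|\le\zeta\|x - y\|$ whenever $x$, $y\in D$. Set $\psi = \phi T^*$, so
that $\phi = \psi T$. Then for $u$, $v\in T[D]$
$$\|\psi(u) - \psi(v)\|\le(1 + \zeta\|T^*\|)\|u - v\|,\qquad \|u - v\|\le(1 - \zeta\|T^*\|)^{-1}\|\psi(u) - \psi(v)\|.$$
P Take $x$, $y\in D$ such that $u = Tx$, $v = Ty$; of course $x = T^*u$, $y = T^*v$. Then
$$\begin{aligned}
\|\psi(u) - \psi(v)\| &= \|\phi(T^*u) - \phi(T^*v)\| = \|\phi(x) - \phi(y)\| \\
&\le \|T(x - y)\| + \zeta\|x - y\| \\
&= \|u - v\| + \zeta\|T^*u - T^*v\| \le \|u - v\|(1 + \zeta\|T^*\|).
\end{aligned}$$
Next,
$$\begin{aligned}
\|u - v\| = \|Tx - Ty\| &\le \|\phi(x) - \phi(y)\| + \zeta\|x - y\| \\
&= \|\psi(u) - \psi(v)\| + \zeta\|T^*u - T^*v\| \\
&\le \|\psi(u) - \psi(v)\| + \zeta\|T^*\|\|u - v\|,
\end{aligned}$$
so that $(1 - \zeta\|T^*\|)\|u - v\|\le\|\psi(u) - \psi(v)\|$ and $\|u - v\|\le(1 - \zeta\|T^*\|)^{-1}\|\psi(u) - \psi(v)\|$. Q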
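(c) Now from 264G and 265C we see that
$$\nu_r^*\phi[D] = \nu_r^*\psi[T[D]]\le(1 + \zeta\|T^*\|)^r\nu_r^* T[D] = (1 + \zeta\|T^*\|)^r J\mu_r^* D\le(J + \epsilon)\mu_r^* D,$$
and (provided $\epsilon\le J$)
$$\begin{aligned}
(J - \epsilon)\mu_r^* D = (1 - J^{-1}\epsilon)\nu_r^* T[D] &\le (1 - J^{-1}\epsilon)(1 - \zeta\|T^*\|)^{-r}\nu_r^*\psi[T[D]] \\
&\le \nu_r^*\psi[T[D]] = \nu_r^*\phi[D].
\end{aligned}$$
(Of course, if $\epsilon\ge J$, then surely $(J - \epsilon)\mu_r^* D\le\nu_r^*\phi[D]$.) Thus
$$(J - \epsilon)\mu_r^* D\le\nu_r^*\phi[D]\le(J + \epsilon)\mu_r^* D$$
as required, and we have an appropriate $\zeta$.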

265E Theorem Suppose that $1\le r\le s$; write $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$, $\nu_r$ for normalized
Hausdorff measure on $\mathbb{R}^s$, and $\mathrm{T}_r$ for the domain of $\nu_r$. Let $D\subseteq\mathbb{R}^r$ be any set, and $\phi : D\to\mathbb{R}^s$ a function
differentiable relative to its domain at each point of $D$. For each $x\in D$ let $T(x)$ be a derivative of $\phi$ at $x$
relative to $D$, and set $J(x) = \sqrt{\det T(x)'T(x)}$. Set $D' = \{x : x\in D,\ J(x) > 0\}$. Then
(i) $J : D\to[0,\infty[$ is a measurable function;
(ii) $\nu_r^*\phi[D]\le\int_D J(x)\,\mu_r(dx)$, allowing $\infty$ as the value of the integral;
(iii) $\nu_r^*\phi[D\setminus D'] = 0$.
If $D$ is Lebesgue measurable, then
(iv) $\phi[D]\in\mathrm{T}_r$.
If $D$ is measurable and $\phi$ is injective, then
(v) $\nu_r\phi[D] = \int_D J\,d\mu_r$;
(vi) for any set $E\subseteq\phi[D]$, $E\in\mathrm{T}_r$ iff $\phi^{-1}[E]\cap D'$ is Lebesgue measurable, and in this case
$$\nu_r E = \int_{\phi^{-1}[E]} J(x)\,\mu_r(dx) = \int_D J\times\chi(\phi^{-1}[E])\,d\mu_r;$$
(vii) for every real-valued function $g$ defined on a subset of $\phi[D]$,
$$\int_{\phi[D]} g\,d\nu_r = \int_D J\times g\phi\,d\mu_r,$$
if either integral is defined in $[-\infty,\infty]$, provided we interpret $J(x)g(\phi(x))$ as zero when $J(x) = 0$ and $g(\phi(x))$
is undefined.
proof I seek to follow the line laid out in the proof of 263D.
(a) Just as in 263D, we know that $J : D\to\mathbb{R}$ is measurable, since $J(x)$ is a continuous function of
the coefficients of $T(x)$, all of which are measurable, by 262P. If $D$ is Lebesgue measurable, then there is a
sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of compact subsets of $D$ such that $D\setminus\bigcup_{n\in\mathbb{N}}F_n$ is $\mu_r$-negligible. Now $\phi[F_n]$ is compact,
therefore belongs to $\mathrm{T}_r$, for each $n\in\mathbb{N}$. As for $\phi[D\setminus\bigcup_{n\in\mathbb{N}}F_n]$, this must be $\nu_r$-negligible by 264G, because
$\phi$ is a countable union of Lipschitz functions (262N). So
$$\phi[D] = \bigcup_{n\in\mathbb{N}}\phi[F_n]\cup\phi[D\setminus\bigcup_{n\in\mathbb{N}}F_n]\in\mathrm{T}_r.$$
This deals with (i) and (iv).
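(b) For the moment, assume that $D$ is bounded and that $J(x) > 0$ for every $x\in D$, and fix $\epsilon > 0$. Let
$M^*_{sr}$ be the set of $s\times r$ matrices $T$ such that $\det T'T\ne 0$, that is, the corresponding map $T : \mathbb{R}^r\to\mathbb{R}^s$ is
injective. For $T\in M^*_{sr}$ take $\zeta(T,\epsilon) > 0$ as in 265D.
Take $\langle D_n\rangle_{n\in\mathbb{N}}$, $\langle T_n\rangle_{n\in\mathbb{N}}$ as in 262M, with $A = M^*_{sr}$, so that $\langle D_n\rangle_{n\in\mathbb{N}}$ is a disjoint cover of $D$ by sets which
are relatively measurable in $D$, and each $T_n$ is an $s\times r$ matrix such that
$$\|T(x) - T_n\|\le\zeta(T_n,\epsilon)\text{ whenever }x\in D_n,$$
$$\|\phi(x) - \phi(y) - T_n(x - y)\|\le\zeta(T_n,\epsilon)\|x - y\|\text{ for all }x,\ y\in D_n.$$
Then, setting $J_n = \sqrt{\det T_n'T_n}$, we have
$$|J(x) - J_n|\le\epsilon\text{ for every }x\in D_n,$$
$$|\nu_r^*\phi[D_n] - J_n\mu_r^* D_n|\le\epsilon\mu_r^* D_n,$$
by the choice of $\zeta(T_n,\epsilon)$. So
$$\nu_r^*\phi[D]\le\sum_{n=0}^\infty\nu_r^*\phi[D_n]$$
(because $\phi[D] = \bigcup_{n\in\mathbb{N}}\phi[D_n]$)
$$\le\sum_{n=0}^\infty J_n\mu_r^* D_n + \epsilon\sum_{n=0}^\infty\mu_r^* D_n\le\epsilon\mu_r^* D + \sum_{n=0}^\infty J_n\mu_r^* D_n$$
(because the $D_n$ are disjoint and relatively measurable in $D$)
$$= \epsilon\mu_r^* D + \int_D\sum_{n=0}^\infty J_n\chi D_n\,d\mu_r\le\epsilon\mu_r^* D + \int_D J(x) + \epsilon\,\mu_r(dx) = 2\epsilon\mu_r^* D + \int_D J\,d\mu_r.$$
If $D$ is measurable and $\phi$ is injective, then all the $D_n$ are Lebesgue measurable subsets of $\mathbb{R}^r$, so all the
$\phi[D_n]$ are measured by $\nu_r$, and they are also disjoint. Accordingly
$$\int_D J\,d\mu_r\le\sum_{n=0}^\infty J_n\mu_r D_n + \epsilon\mu_r D\le\sum_{n=0}^\infty(\nu_r\phi[D_n] + \epsilon\mu_r D_n) + \epsilon\mu_r D = \nu_r\phi[D] + 2\epsilon\mu_r D.$$
Since $\epsilon$ is arbitrary, we get
$$\nu_r^*\phi[D]\le\int_D J\,d\mu_r,$$
and if $D$ is measurable and $\phi$ is injective,
$$\int_D J\,d\mu_r\le\nu_r\phi[D];$$
thus we have (ii) and (v), on the assumption that $D$ is bounded and $J > 0$ everywhere on $D$.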
(c) Just as in 263D, we can now relax the assumption that $D$ is bounded by considering $B_k = B(\mathbf{0},k)\subseteq\mathbb{R}^r$;
provided $J > 0$ everywhere on $D$, we get
$$\nu_r^*\phi[D] = \lim_{k\to\infty}\nu_r^*\phi[D\cap B_k]\le\lim_{k\to\infty}\int_{D\cap B_k}J\,d\mu_r = \int_D J\,d\mu_r,$$
with equality if $D$ is measurable and $\phi$ is injective.
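(d) Now we find that $\nu_r^*\phi[D\setminus D'] = 0$.
P (i) Let $\eta\in{]0,1]}$. Define $\psi_\eta : D\to\mathbb{R}^{s+r}$ by setting $\psi_\eta(x) = (\phi(x),\eta x)$, identifying $\mathbb{R}^{s+r}$ with $\mathbb{R}^s\times\mathbb{R}^r$.
$\psi_\eta$ is differentiable relative to its domain at each point of $D$, with derivative $\tilde T_\eta(x)$, being the $(s + r)\times r$
matrix in which the top $s$ rows consist of the $s\times r$ matrix $T(x)$, and the bottom $r$ rows are $\eta I_r$, writing $I_r$
for the $r\times r$ identity matrix. (Use 262Ib.) Now of course $\tilde T_\eta(x)$, regarded as a map from $\mathbb{R}^r$ to $\mathbb{R}^{s+r}$, is
injective, so
$$\tilde J_\eta(x) = \sqrt{\det\tilde T_\eta(x)'\tilde T_\eta(x)} = \sqrt{\det(T(x)'T(x) + \eta^2 I)} > 0.$$
We have $\lim_{\eta\downarrow 0}\tilde J_\eta(x) = J(x) = 0$ for $x\in D\setminus D'$.
(ii) Express $T(x)$ as $\langle\tau_{ij}(x)\rangle_{i\le s,j\le r}$ for each $x\in D$. Set
$$C_m = \{x : x\in D,\ \|x\|\le m,\ |\tau_{ij}(x)|\le m\text{ for all }i\le s,\ j\le r\}$$
for each $m\ge 1$. For $x\in C_m$, all the coefficients of $\tilde T_\eta(x)$ have moduli at most $m$; consequently (giving
the crudest and most immediately available inequalities) all the coefficients of $\tilde T_\eta(x)'\tilde T_\eta(x)$ have moduli at
most $(r + s)m^2$ and $\tilde J_\eta(x)\le\sqrt{r!(s + r)^r}\,m^r$. Consequently we can use Lebesgue's Dominated Convergence
Theorem to see that
$$\lim_{\eta\downarrow 0}\int_{C_m\setminus D'}\tilde J_\eta\,d\mu_r = 0.$$
(iii) Let $\tilde\nu_r$ be normalized Hausdorff $r$-dimensional measure on $\mathbb{R}^{s+r}$. Applying (b) of this proof to
$\psi_\eta\restriction C_m\setminus D'$, we see that
$$\tilde\nu_r^*\psi_\eta[C_m\setminus D']\le\int_{C_m\setminus D'}\tilde J_\eta\,d\mu_r.$$
Now we have a natural map $P : \mathbb{R}^{s+r}\to\mathbb{R}^s$ given by setting $P(\xi_1,\ldots,\xi_{s+r}) = (\xi_1,\ldots,\xi_s)$, and $P$ is
1-Lipschitz, so by 264G we have (allowing for the normalizing constants $2^{-r}\beta_r$)
$$\nu_r^* P[A]\le\tilde\nu_r^* A$$
for every $A\subseteq\mathbb{R}^{s+r}$. In particular,
$$\nu_r^*\phi[C_m\setminus D'] = \nu_r^* P[\psi_\eta[C_m\setminus D']]\le\tilde\nu_r^*\psi_\eta[C_m\setminus D']\le\int_{C_m\setminus D'}\tilde J_\eta\,d\mu_r\to 0$$
as $\eta\downarrow 0$. But this means that $\nu_r^*\phi[C_m\setminus D'] = 0$. As $D = \bigcup_{m\ge 1}C_m$, $\nu_r^*\phi[D\setminus D'] = 0$, as claimed. Q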
(d) This proves (iii) of the theorem. But of course this is enough to give (ii) and (v), because we must
have
$$\nu_r^*\phi[D] = \nu_r^*\phi[D']\le\int_{D'}J\,d\mu_r = \int_D J\,d\mu_r,$$
with equality if $D$ is measurable and $\phi$ is injective.
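(e) So let us turn to part (vi). Assume that $D$ is measurable and that $\phi$ is injective.
(i) Suppose that $E\subseteq\phi[D]$ belongs to $\mathrm{T}_r$. Let
$$H_k = \{x : x\in D,\ \|x\|\le k,\ J(x)\le k\}$$
for each $k$; then each $H_k$ is Lebesgue measurable, so (applying (iii) to $\phi\restriction H_k$) $\phi[H_k]\in\mathrm{T}_r$, and
$$\nu_r\phi[H_k]\le k\mu_r H_k < \infty.$$
Thus $\phi[D]$ can be covered by a sequence of sets of finite measure for $\nu_r$, which of course are of finite
measure for $r$-dimensional Hausdorff measure on $\mathbb{R}^s$. By 264Fc, there are Borel sets $E_1$, $E_2\subseteq\mathbb{R}^s$ such that
$E_1\subseteq E\subseteq E_2$ and $\nu_r(E_2\setminus E_1) = 0$. Now $F_1 = \phi^{-1}[E_1]$, $F_2 = \phi^{-1}[E_2]$ are Lebesgue measurable subsets of
$D$, and
$$\int_{F_2\setminus F_1}J\,d\mu_r = \nu_r\phi[F_2\setminus F_1] = \nu_r(\phi[D]\cap E_2\setminus E_1) = 0.$$
Accordingly $\mu_r(D'\cap(F_2\setminus F_1)) = 0$. But as
$$D'\cap F_1\subseteq D'\cap\phi^{-1}[E]\subseteq D'\cap F_2,$$
it follows that $D'\cap\phi^{-1}[E]$ is measurable, and that
$$\begin{aligned}
\int_{\phi^{-1}[E]}J\,d\mu_r &= \int_{D'\cap\phi^{-1}[E]}J\,d\mu_r = \int_{D'\cap F_1}J\,d\mu_r \\
&= \int_{D\cap F_1}J\,d\mu_r = \nu_r\phi[D\cap F_1] = \nu_r E_1 = \nu_r E.
\end{aligned}$$
Moreover, $J\times\chi(\phi^{-1}[E]) = J\times\chi(D'\cap\phi^{-1}[E])$ is measurable, so we can write $\int J\times\chi(\phi^{-1}[E])$ in place of
$\int_{\phi^{-1}[E]}J$.
(ii) If $E\subseteq\phi[D]$ and $D'\cap\phi^{-1}[E]$ is measurable, then of course
$$E = \phi[D'\cap\phi^{-1}[E]]\cup\phi[(D\setminus D')\cap\phi^{-1}[E]]\in\mathrm{T}_r,$$
because $\phi[G]\in\mathrm{T}_r$ for every measurable $G\subseteq D$ and $\phi[D\setminus D']$ is $\nu_r$-negligible.
(f) Finally, (vii) follows at once from (vi), applying 235L to $\mu_r$ and the subspace measure induced by $\nu_r$
on $\phi[D]$.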

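265F The surface of a sphere To show how these ideas can be applied to one of the basic cases, I
give the details of a method of describing spherical surface measure in $s$-dimensional space. Take $r\ge 1$ and
$s = r + 1$. Write $S_r$ for $\{z : z\in\mathbb{R}^{r+1},\ \|z\| = 1\}$, the $r$-sphere. Then we have a parametrization $\phi_r$ of $S_r$
given by setting
$$\phi_r\begin{pmatrix}\xi_1\\ \xi_2\\ \vdots\\ \xi_r\end{pmatrix} = \begin{pmatrix}
\sin\xi_1\sin\xi_2\sin\xi_3\ldots\sin\xi_r\\
\cos\xi_1\sin\xi_2\sin\xi_3\ldots\sin\xi_r\\
\cos\xi_2\sin\xi_3\ldots\sin\xi_r\\
\vdots\\
\cos\xi_{r-2}\sin\xi_{r-1}\sin\xi_r\\
\cos\xi_{r-1}\sin\xi_r\\
\cos\xi_r\end{pmatrix}.$$
I choose this formulation because I wish to use an inductive argument based on the fact that
$$\phi_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}\sin\xi\,\phi_r(x)\\ \cos\xi\end{pmatrix}$$
for $x\in\mathbb{R}^r$, $\xi\in\mathbb{R}$. Every $\phi_r$ is differentiable, by 262Id. If we set
$$D_r = \{x : \xi_1\in{]-\pi,\pi]},\ \xi_2,\ldots,\xi_r\in[0,\pi],\ \text{if }\xi_j\in\{0,\pi\}\text{ then }\xi_i = 0\text{ for }i < j\},$$
then it is easy to check that $D_r$ is a Borel subset of $\mathbb{R}^r$ and that $\phi_r\restriction D_r$ is a bijection between $D_r$ and $S_r$.
Now let $T_r(x)$ be the $(r + 1)\times r$ matrix $\phi_r'(x)$. Then
$$T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}\sin\xi\,T_r(x) & \cos\xi\,\phi_r(x)\\ 0 & -\sin\xi\end{pmatrix}.$$
So
$$\Bigl(T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix}\Bigr)'T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}\sin^2\xi\,T_r(x)'T_r(x) & \sin\xi\cos\xi\,T_r(x)'\phi_r(x)\\ \cos\xi\sin\xi\,\phi_r(x)'T_r(x) & \cos^2\xi\,\phi_r(x)'\phi_r(x) + \sin^2\xi\end{pmatrix}.$$
But of course $\phi_r(x)'\phi_r(x) = \|\phi_r(x)\|^2 = 1$ for every $x$, and (differentiating with respect to each coordinate
of $x$, if you wish) $T_r(x)'\phi_r(x) = 0$, $\phi_r(x)'T_r(x) = 0$. So we get
$$\Bigl(T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix}\Bigr)'T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}\sin^2\xi\,T_r(x)'T_r(x) & 0\\ 0 & 1\end{pmatrix},$$
and writing $J_r(x) = \sqrt{\det T_r(x)'T_r(x)}$,
$$J_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = |\sin^r\xi|\,J_r(x).$$
At this point we induce on $r$ to see that
$$J_r(x) = |\sin^{r-1}\xi_r\sin^{r-2}\xi_{r-1}\ldots\sin\xi_2|$$
(since of course the induction starts with the case $r = 1$,
$$\phi_1(x) = \begin{pmatrix}\sin x\\ \cos x\end{pmatrix},\quad T_1(x) = \begin{pmatrix}\cos x\\ -\sin x\end{pmatrix},\quad T_1(x)'T_1(x) = 1,\quad J_1(x) = 1).$$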
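To find the surface measure of $S_r$, we need to calculate
$$\begin{aligned}
\int_{D_r}J_r\,d\mu_r &= \int_0^\pi\ldots\int_0^\pi\int_{-\pi}^{\pi}\sin^{r-1}\xi_r\ldots\sin\xi_2\,d\xi_1\,d\xi_2\ldots d\xi_r \\
&= 2\pi\prod_{k=2}^{r}\int_0^\pi\sin^{k-1}t\,dt = 2\pi\prod_{k=1}^{r-1}\int_{-\pi/2}^{\pi/2}\cos^k t\,dt
\end{aligned}$$
(substituting $\frac{\pi}{2} - t$ for $t$). But in the language of 252Q, this is just
$$2\pi\prod_{k=1}^{r-1}I_k = 2\pi\beta_{r-1},$$
where $\beta_{r-1}$ is the volume of the unit ball of $\mathbb{R}^{r-1}$ (interpreting $\beta_0$ as $1$, if you like).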
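
As a sanity check on the value $2\pi\beta_{r-1}$, the following sketch evaluates the integrals $I_k = \int_{-\pi/2}^{\pi/2}\cos^k t\,dt$ by a midpoint rule, forms $\beta_r = \prod_{k\le r}I_k$ as quoted from 252Q, and compares the result with the standard closed form $2\pi^{(r+1)/2}/\Gamma(\frac{r+1}{2})$ for the area of the unit $r$-sphere. The closed form is brought in from outside the text as a cross-check, and the function names and the step count `n` are arbitrary choices.

```python
import math

def I_k(k, n=20000):
    """Midpoint-rule approximation of the integral of cos^k t over [-pi/2, pi/2]."""
    h = math.pi / n
    return h * sum(math.cos(-math.pi / 2 + (j + 0.5) * h) ** k for j in range(n))

def beta(r):
    """beta_r = I_1 * ... * I_r, with beta_0 = 1."""
    out = 1.0
    for k in range(1, r + 1):
        out *= I_k(k)
    return out

def sphere_area(r):
    """2 * pi * beta_{r-1}: the value computed above for the r-sphere in R^(r+1)."""
    return 2 * math.pi * beta(r - 1)

def sphere_area_gamma(r):
    """Closed form 2 * pi^((r+1)/2) / Gamma((r+1)/2), used only as a cross-check."""
    return 2 * math.pi ** ((r + 1) / 2) / math.gamma((r + 1) / 2)

for r in range(1, 5):
    print(r, sphere_area(r), sphere_area_gamma(r))
```

For $r = 1$ and $r = 2$ this reproduces the familiar values $2\pi$ (circumference of the unit circle) and $4\pi$ (area of the unit sphere in $\mathbb{R}^3$).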

265G The surface area of a sphere can also be calculated through the following result.
Theorem Let $\mu_{r+1}$ be Lebesgue measure on $\mathbb{R}^{r+1}$, and $\nu_r$ normalized $r$-dimensional Hausdorff measure on
$\mathbb{R}^{r+1}$. If $f$ is a locally $\mu_{r+1}$-integrable real-valued function, $y\in\mathbb{R}^{r+1}$ and $\delta > 0$,
$$\int_{B(y,\delta)}f\,d\mu_{r+1} = \int_0^\delta\int_{\partial B(y,t)}f\,d\nu_r\,dt,$$
where I write $\partial B(y,s)$ for the sphere $\{x : \|x - y\| = s\}$ and the integral $\int\ldots dt$ is to be taken with respect
to Lebesgue measure on $\mathbb{R}$.
proof Take any differentiable function $\phi : \mathbb{R}^r\to S_r$ with a Borel set $F\subseteq\mathbb{R}^r$ such that $\phi\restriction F$ is a bijection
between $F$ and $S_r$; such a pair $(\phi,F)$ is described in 265F. Define $\psi : \mathbb{R}^r\times\mathbb{R}\to\mathbb{R}^{r+1}$ by setting $\psi(z,t) = y + t\phi(z)$;
then $\psi$ is differentiable and $\psi\restriction F\times{]0,\delta]}$ is a bijection between $F\times{]0,\delta]}$ and $B(y,\delta)\setminus\{y\}$. For $t\in{]0,\delta]}$,
$z\in\mathbb{R}^r$ set $\psi_t(z) = \psi(z,t)$; then $\psi_t\restriction F$ is a bijection between $F$ and the sphere $\{x : \|x - y\| = t\} = \partial B(y,t)$.
The derivative of $\phi$ at $z$ is an $(r + 1)\times r$ matrix $T_1(z)$ say, and the derivative $T_t(z)$ of $\psi_t$ at $z$ is just
$tT_1(z)$; also the derivative of $\psi$ at $(z,t)$ is the $(r + 1)\times(r + 1)$ matrix $T(z,t) = (\,tT_1(z)\ \ \phi(z)\,)$, where
$\phi(z)$ is interpreted as a column vector. If we set
$$J_t(z) = \sqrt{\det T_t(z)'T_t(z)},\qquad J(z,t) = |\det T(z,t)|,$$
then
$$\begin{aligned}
J(z,t)^2 = \det T(z,t)'T(z,t) &= \det\Bigl(\begin{pmatrix}tT_1(z)'\\ \phi(z)'\end{pmatrix}(\,tT_1(z)\ \ \phi(z)\,)\Bigr) \\
&= \det\begin{pmatrix}t^2T_1(z)'T_1(z) & 0\\ 0 & 1\end{pmatrix} = J_t(z)^2,
\end{aligned}$$
because when we come to calculate the $(i,r + 1)$-coefficient of $T(z,t)'T(z,t)$, for $1\le i\le r$, it is
$$\sum_{j=1}^{r+1}t\frac{\partial\phi_j}{\partial\zeta_i}(z)\phi_j(z) = \frac{t}{2}\frac{\partial}{\partial\zeta_i}\Bigl(\sum_{j=1}^{r+1}\phi_j(z)^2\Bigr) = 0,$$
where $\phi_j$ is the $j$th coordinate of $\phi$; while the $(r + 1,r + 1)$-coefficient of $T(z,t)'T(z,t)$ is just $\sum_{j=1}^{r+1}\phi_j(z)^2 = 1$.
So in fact $J(z,t) = J_t(z)$ for all $z\in\mathbb{R}^r$, $t > 0$.
Now, given $f\in\mathcal{L}^1(\mu_{r+1})$, we can calculate
$$\int_{B(y,\delta)}f\,d\mu_{r+1} = \int_{B(y,\delta)\setminus\{y\}}f\,d\mu_{r+1} = \int_{F\times{]0,\delta]}}f(\psi(z,t))J(z,t)\,\mu_{r+1}(d(z,t))$$
(by 263D)
$$= \int_0^\delta\int_F f(\psi_t(z))J_t(z)\,\mu_r(dz)\,dt$$
(where $\mu_r$ is Lebesgue measure on $\mathbb{R}^r$, by Fubini's theorem, 252B)
$$= \int_0^\delta\int_{\partial B(y,t)}f\,d\nu_r\,dt$$
by 265E(vii).

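265H Corollary If $\nu_r$ is normalized $r$-dimensional Hausdorff measure on $\mathbb{R}^{r+1}$, then $\nu_r S_r = (r + 1)\beta_{r+1}$.
proof In 265G, take $y = \mathbf{0}$, $\delta = 1$, and $f = \chi B(\mathbf{0},1)$; then
$$\beta_{r+1} = \int f\,d\mu_{r+1} = \int_0^1\nu_r(\partial B(\mathbf{0},t))\,dt = \int_0^1 t^r\nu_r S_r\,dt = \frac{1}{r+1}\nu_r S_r,$$
applying 264G to the maps $x\mapsto tx$, $x\mapsto\frac{1}{t}x$ from $\mathbb{R}^{r+1}$ to itself to see that $\nu_r(\partial B(\mathbf{0},t)) = t^r\nu_r S_r$ for $t > 0$.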
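
The repeated-integral formula of 265G can be tried out numerically in the plane ($r = 1$), where $\nu_1$ on a circle is arc length (compare 265Xd). The sketch below takes the hypothetical test function $f(x) = \xi_1^2$ on the unit disc, for which both sides should equal $\pi/4$; the grid sizes are arbitrary, and the midpoint sums are only approximations.

```python
import math

def f(x1, x2):
    return x1 * x1

def ball_integral(n=400):
    """Midpoint sum of f over the unit disc B(0,1) in R^2 (Lebesgue measure)."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x1 = -1.0 + (i + 0.5) * h
        for j in range(n):
            x2 = -1.0 + (j + 0.5) * h
            if x1 * x1 + x2 * x2 <= 1.0:
                total += f(x1, x2) * h * h
    return total

def shell_integral(n_t=200, n_theta=400):
    """Integrate f over each circle of radius t by arc length, then over t in ]0,1]."""
    ht = 1.0 / n_t
    hth = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_t):
        t = (i + 0.5) * ht
        # t * hth is the arc-length element on the circle of radius t
        circle = sum(f(t * math.cos((j + 0.5) * hth), t * math.sin((j + 0.5) * hth))
                     for j in range(n_theta)) * t * hth
        total += circle * ht
    return total

print(ball_integral(), shell_integral(), math.pi / 4)
```

The factor `t` in the arc-length element plays the role of $J_t(z) = t^r J_1(z)$ from the proof, here with $r = 1$.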

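265X Basic exercises (a) Let $r\ge 1$, and let $S_r(\alpha) = \{z : z\in\mathbb{R}^{r+1},\ \|z\| = \alpha\}$ be the $r$-sphere of
radius $\alpha$. Show that $\nu_r S_r(\alpha) = 2\pi\beta_{r-1}\alpha^r = (r + 1)\beta_{r+1}\alpha^r$ for every $\alpha\ge 0$.

> (b) Let $r\ge 1$, and for $a\in[-1,1]$ set $C_a = \{z : z\in\mathbb{R}^{r+1},\ \|z\| = 1,\ \zeta_1\ge a\}$, writing $z = (\zeta_1,\ldots,\zeta_{r+1})$
as usual. Show that
$$\nu_r C_a = r\beta_r\int_0^{\arccos a}\sin^{r-1}t\,dt.$$

> (c) Again write $C_a = \{z : z\in S_r,\ \zeta_1\ge a\}$, where $S_r\subseteq\mathbb{R}^{r+1}$ is the unit sphere. Show that, for any
$a\in{]0,1]}$, $\nu_r C_a\le\dfrac{\nu_r S_r}{2(r+1)a^2}$. (Hint: calculate $\sum_{i=1}^{r+1}\int_{S_r}\|\xi_i\|^2\,\nu_r(dx)$.)

> (d) Let $\phi : {]0,1[}\to\mathbb{R}^r$ be an injective differentiable function. Show that the 'length' or one-dimensional
Hausdorff measure of $\phi[\,{]0,1[}\,]$ is just $\int_0^1\|\phi'(t)\|\,dt$.
(e) (i) Show that if $I$ is the identity $r\times r$ matrix and $z\in\mathbb{R}^r$, then $\det(I + zz') = 1 + \|z\|^2$. (Hint: induce
on $r$.) (ii) Write $U_{r-1}$ for the open unit ball in $\mathbb{R}^{r-1}$, where $r\ge 2$. Define $\phi : U_{r-1}\times\mathbb{R}\to S_r$ by setting
$$\phi\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}x\\ \theta(x)\cos\xi\\ \theta(x)\sin\xi\end{pmatrix},$$
where $\theta(x) = \sqrt{1 - \|x\|^2}$. Show that
$$\phi'\begin{pmatrix}x\\ \xi\end{pmatrix}'\phi'\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}I + \frac{1}{\theta(x)^2}xx' & 0\\ 0 & \theta(x)^2\end{pmatrix},$$
so that $J\begin{pmatrix}x\\ \xi\end{pmatrix} = 1$ for all $x\in U_{r-1}$, $\xi\in\mathbb{R}$. (iii) Hence show that the normalized $r$-dimensional Hausdorff
measure of $\{y : y\in S_r,\ \sum_{i=1}^{r-1}\eta_i^2 < 1\}$ is just $2\pi\beta_{r-1}$, where $\beta_{r-1}$ is the Lebesgue measure of $U_{r-1}$. (iv)
By considering $\psi(z) = (z,0,0)$ for $z\in S_{r-2}$, or otherwise, show that the normalized $r$-dimensional Hausdorff
measure of $S_r$ is $2\pi\beta_{r-1}$. (v) Setting $C_a = \{z : z\in\mathbb{R}^{r+1},\ \|z\| = 1,\ \zeta_1\ge a\}$, as in 265Xb and 265Xc, show
that $\nu_r C_a = 2\pi\mu_{r-1}\{x : x\in\mathbb{R}^{r-1},\ \|x\|\le 1,\ \xi_1\ge a\}$ for every $a\in[-1,1]$.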

265Y Further exercises (a) Take $a < b$ in $\mathbb{R}$. (i) Show that $\phi : [a,b]\to\mathbb{R}^r$ is absolutely continuous
in the sense of 264Yp iff all its coordinates $\phi_i : [a,b]\to\mathbb{R}$, for $i\le r$, are absolutely continuous in the sense
of §225. (ii) Let $\phi : [a,b]\to\mathbb{R}^r$ be a continuous function, and set $F = \{x : x\in{]a,b[},\ \phi\text{ is differentiable at }x\}$.
Show that $\phi$ is absolutely continuous iff $\int_F\|\phi'(x)\|\,dx$ is finite and $\nu_1(\phi[[a,b]\setminus F]) = 0$, where $\nu_1$ is
normalized Hausdorff one-dimensional measure on $\mathbb{R}^r$. (Hint: 225K.) (iii) Show that if $\phi : [a,b]\to\mathbb{R}^r$ is
absolutely continuous then $\nu_1^*(\phi[D])\le\int_D\|\phi'(x)\|\,dx$ for every $D\subseteq[a,b]$, with equality if $D$ is measurable
and $\phi\restriction D$ is injective.

265 Notes and comments The proof of 265B seems to call on most of the second half of the alphabet. The
idea is supposed to be straightforward enough. Because $T[\mathbb{R}^r]$ has dimension at most $r$, it can be rotated
by an orthogonal transformation $P$ into a subspace of the canonical $r$-dimensional subspace $V$, which is a
natural copy of $\mathbb{R}^r$; the matrix $R$ represents the copying process from $V$ to $\mathbb{R}^r$, and $\phi$ or $P'R'$ is a copy
of $\mathbb{R}^r$ onto a subspace including $T[\mathbb{R}^r]$. All this copying back and forth is designed to turn $T$ into a linear
operator $S : \mathbb{R}^r\to\mathbb{R}^r$ to which we can apply 263A, and part (b) of the proof is the check that we are
copying the measures as well as the linear structures.
In 265D-265E I have tried to follow 263C-263D as closely as possible. In fact only one new idea is needed.
When $s = r$, we have a special argument available to show that $\mu_r^*\phi[D]\le J\mu_r^* D + \epsilon\mu_r^* D$ (in the language
of 263C) which applies whether or not $J = 0$. When $s > r$, this approach fails, because we can no longer
approximate $\nu_r T[B]$ by $\nu_r G$ where $G\supseteq T[B]$ is open. (See part (b-i) of the proof of 263C.) I therefore turn
to a different argument, valid only when $J > 0$, and accordingly have to find a separate method to show
that $\{\phi(x) : x\in D,\ J(x) = 0\}$ is $\nu_r$-negligible. Since we are working without restrictions on the dimensions
$r$, $s$ except that $r\le s$, we can use the trick of approximating $\phi : D\to\mathbb{R}^s$ by $\psi_\eta : D\to\mathbb{R}^{s+r}$, as in part (d)
of the proof of 265E.
I give three methods by which the area of the $r$-sphere can be calculated; a bare-hands approach (265F),
the surrounding-cylinder method (265Xe) and an important repeated-integral theorem (265G). The first two
provide formulae for the area of a cap (265Xb, 265Xe(v)). The surrounding-cylinder method is attractive
because the Jacobian comes out to be $1$, that is, we have an inverse-measure-preserving function. I note that
despite having developed a technique which allows irregular domains, I am still forced by the singularity
in the function $\theta$ of 265Xe to take the sphere in two bites. Theorem 265G is a special case of the Coarea
Theorem (Evans & Gariepy 92, §3.4; Federer 69, 3.2.12).
For the next step in the geometric theory of measures on Euclidean space, see Chapter 47 in Volume 4.

Chapter 27
Probability theory
Lebesgue created his theory of integration in response to a number of problems in real analysis, and
all his life seems to have thought of it as a tool for use in geometry and calculus (Lebesgue 72, vols. 1
and 2). Remarkably, it turned out, when suitably adapted, to provide a solid foundation for probability
theory. The development of this approach is generally associated with the name of Kolmogorov. It has so
come to dominate modern abstract probability theory that many authors ignore all other methods. I do not
propose to commit myself to any view on whether $\sigma$-additive measures are the only way to give a rigorous
foundation to probability theory, or whether they are adequate to deal with all probabilistic ideas; there are
some serious philosophical questions here, since probability theory, at least in its applied aspects, seeks to
help us to understand the material world outside mathematics. But from my position as a measure theorist,
it is incontrovertible that probability theory is among the central applications of the concepts and theorems
of measure theory, and is one of the most vital sources of new ideas; and that every measure theorist must
be alert to the intuitions which probabilistic methods can provide.
I have written the preceding paragraph in terms suggesting that probability theory is somehow
distinguishable from the rest of measure theory; this is another point on which I should prefer not to put forward
any opinion as definitive. But undoubtedly there is a distinction, rather deeper than the elementary point
that probability deals (almost) exclusively with spaces of measure $1$. M.Loève argues persuasively (Loève
77, §10.2) that the essence of probability theory is the artificial nature of the probability spaces themselves.
In measure theory, when we wish to integrate a function, we usually feel that we have a proper function
with a domain and values. In probability theory, when we take the expectation of a random variable, the
variable is an 'observable' or 'the result of an experiment'; we are generally uncertain, or ignorant, or
indifferent concerning the factors underlying the variable. Let me give an example from the theorems below. In
the proof of the Central Limit Theorem (274F), I find that I need an auxiliary list $Z_0,\ldots,Z_n$ of random
variables, independent of each other and of the original sequence $X_0,\ldots,X_n$. I create such a sequence
by taking a product space $\Omega\times\Omega'$, and writing $X_i'(\omega,\omega') = X_i(\omega)$, while the $Z_i$ are functions of $\omega'$. Now
the difference between the $X_i$ and the $X_i'$ is of a type which a well-trained analyst would ordinarily take
seriously. We do not think that the function $x\mapsto x^2 : [0,1]\to[0,1]$ is the same thing as the function
$(x_1,x_2)\mapsto x_1^2 : [0,1]^2\to[0,1]$. But a probabilist is likely to feel that it is positively pedantic to start writing
$X_i'$ instead of $X_i$. He did not believe in the space $\Omega$ in the first place, and if it turns out to be inadequate
for his intuition he enlarges it without a qualm. Loève calls probability spaces 'fictions', 'inventions of the
imagination' in Larousse's words; they are necessary in the models Kolmogorov has taught us to use, but
we have a vast amount of freedom in choosing them, and in their essence they are nothing so definite as a
set with points.
A probability space, therefore, is somehow a more shadowy entity in probability theory than it is in
measure theory. The important objects in probability theory are random variables and distributions, partic-
ularly joint distributions. In this volume I shall deal exclusively with random variables which can be thought
of as taking values in some power of R; but this is not the central point. What is vital is that somehow
the codomain, the potential set of values, of a random variable, is much better defined than its domain.
Consequently our attention is focused not on any features of the artificial space which it is convenient to
use as the underlying probability space (I write 'underlying', though it is the most superficial and easily
changed aspect of the model) but on the distribution on the codomain induced by the random variable.
Thus the Central Limit Theorem, which speaks only of distributions, is actually more important in applied
probability than the Strong Law of Large Numbers, which claims to tell us what a long-term average will
almost certainly be.
W.Feller (Feller 66) goes even farther than Loève, and as far as possible works entirely with
distributions, setting up machinery which enables him to go for long stretches without mentioning probability
spaces at all. I make no attempt to emulate him. But the approach is instructive and faithful to the essence
of the subject.
Probability theory includes more mathematics than can easily be encompassed in a lifetime, and I have
selected for this introductory chapter the two limit theorems I have already mentioned, the Strong Law of
Large Numbers and the Central Limit Theorem, together with some material on martingales (§§275-276).
They illustrate not only the special character of probability theory (so that you will be able to form your
own judgement on the remarks above) but also some of its chief contributions to 'pure' measure theory,
the concepts of 'independence' and 'conditional expectation'.

271 Distributions
I start this chapter with a discussion of probability distributions, the probability measures on R n defined
by families (X1 , . . . , Xn ) of random variables. I give the basic results describing the circumstances under
which two distributions are equal (271G), integration with respect to a distribution (271E), and probability
density functions (271I-271K).

271A Notation I have just spent some paragraphs on an attempt to describe the essential difference
between probability theory and measure theory. But there is a quicker test by which you may discover
whether your author is a measure theorist or a probabilist: open any page, and look for the phrases
'measurable function' and 'random variable', and the formulae '$\int f\,d\mu$' and '$\mathbb{E}(X)$'. The first member of each
pair will enable you to diagnose 'measure' and the second 'probability', with little danger of error. So far
in this treatise I have firmly used measure theorists' terminology, with a few individual quirks. But in a
chapter on probability theory I find that measure-theoretic notation, while perfectly adequate in a formal
sense, does such violence to the familiar formulations as to render them unnatural. Moreover, you must
surely at some point (if you have not already done so) become familiar with probabilists' language. So in
this chapter I will make a substantial step in that direction. Happily, I think that this can be done without
setting up any direct conflicts, so that I shall be able, in later volumes, to call upon this work in whichever
notation then seems appropriate, without needing to re-formulate it.
(a) So let $(\Omega,\Sigma,\mu)$ be a probability space. I take the opportunity given by a new phrase to make a
technical move. A real-valued random variable on $\Omega$ will be a member of $\mathcal{L}^0(\mu)$, as defined in 241A;
that is, a real-valued function $X$ defined on a conegligible subset of $\Omega$ such that $X$ is measurable with respect
to the completion of $\mu$, or, if you prefer, such that $X\restriction E$ is $\Sigma$-measurable for some conegligible set $E\subseteq\Omega$.
(b) If $X$ is a real-valued random variable on a probability space $(\Omega,\Sigma,\mu)$, write $\mathbb{E}(X) = \int X\,d\mu$ if this
exists in $[-\infty,\infty]$ in the sense of Chapter 12 and §133. I will sometimes write '$X$ has a finite expectation'
in place of '$X$ is integrable'. Thus 133A says that '$\mathbb{E}(X + Y) = \mathbb{E}(X) + \mathbb{E}(Y)$ whenever $\mathbb{E}(X)$ and $\mathbb{E}(Y)$
and their sum are defined in $[-\infty,\infty]$', and 122P becomes 'a real-valued random variable $X$ has a finite
expectation iff $\mathbb{E}(|X|) < \infty$'.
In this case I will call $\mathbb{E}(X)$ the mean or expectation of $X$.

(c) If $X$ is a real-valued random variable with finite expectation, I will write
$$\mathrm{Var}(X) = \mathbb{E}(X - \mathbb{E}(X))^2 = \mathbb{E}(X^2 - 2\mathbb{E}(X)X + \mathbb{E}(X)^2) = \mathbb{E}(X^2) - (\mathbb{E}(X))^2,$$
the variance of $X$. (Note that this formula shows that $\mathbb{E}(X)^2\le\mathbb{E}(X^2)$; compare 244Xe(i).) $\mathrm{Var}(X)$ is
finite iff $\mathbb{E}(X^2) < \infty$, that is, iff $X\in\mathcal{L}^2(\mu)$ (244A). In particular, $X + Y$ and $cX$ have finite variance
whenever $X$ and $Y$ do and $c\in\mathbb{R}$.
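
The two expressions for the variance can be compared on a small example. The sketch below assumes a hypothetical finite probability space (a fair six-sided die) and uses exact rational arithmetic, so the identity $\mathbb{E}(X - \mathbb{E}(X))^2 = \mathbb{E}(X^2) - (\mathbb{E}(X))^2$ holds exactly rather than approximately; the names `omega`, `mu`, `E`, `var_centered` and `var_moments` are illustrative.

```python
from fractions import Fraction

# Hypothetical finite probability space: a fair die, Omega = {1, ..., 6}.
omega = range(1, 7)
mu = {w: Fraction(1, 6) for w in omega}

def E(X):
    """Expectation of X: the integral of X with respect to mu."""
    return sum(X(w) * mu[w] for w in omega)

def var_centered(X):
    """E((X - E(X))^2)."""
    m = E(X)
    return E(lambda w: (X(w) - m) ** 2)

def var_moments(X):
    """E(X^2) - (E(X))^2."""
    return E(lambda w: X(w) ** 2) - E(X) ** 2

X = lambda w: w   # the face shown by the die

print(E(X), var_centered(X), var_moments(X))
```

Since the variance is non-negative, the second form also exhibits the inequality $\mathbb{E}(X)^2\le\mathbb{E}(X^2)$ noted above.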

(d) I shall allow myself to use such formulae as
$$\Pr(X > a),\qquad\Pr(X - \epsilon\le Y\le X + \delta),$$
where $X$ and $Y$ are random variables on the same probability space $(\Omega,\Sigma,\mu)$, to mean respectively
$$\hat\mu\{\omega : \omega\in\operatorname{dom}X,\ X(\omega) > a\},$$
$$\hat\mu\{\omega : \omega\in\operatorname{dom}X\cap\operatorname{dom}Y,\ X(\omega) - \epsilon\le Y(\omega)\le X(\omega) + \delta\},$$
writing $\hat\mu$ for the completion of $\mu$ as usual. There are two points to note here. First, $\Pr$ depends on $\hat\mu$, not
on $\mu$; in effect, the notation automatically directs us to complete the probability space $(\Omega,\Sigma,\mu)$. I could, of
course, equally well write
$$\Pr(X^2 + Y^2 > 1) = \mu^*\{\omega : \omega\in\operatorname{dom}X\cap\operatorname{dom}Y,\ X(\omega)^2 + Y(\omega)^2 > 1\},$$
taking $\mu^*$ to be the outer measure on $\Omega$ associated with $\mu$ (132A). Secondly, I will use this notation only for
predicates corresponding to Borel measurable sets; that is to say, I shall write
$$\Pr(\psi(X_1,\ldots,X_n)) = \hat\mu\{\omega : \omega\in\bigcap_{i\le n}\operatorname{dom}X_i,\ \psi(X_1(\omega),\ldots,X_n(\omega))\}$$
only when the set
$$\{(\alpha_1,\ldots,\alpha_n) : \psi(\alpha_1,\ldots,\alpha_n)\}$$
is a Borel set in $\mathbb{R}^n$. Part of the reason for this restriction will appear in the next few paragraphs;
$\Pr(\psi(X_1,\ldots,X_n))$ must be something calculable from knowledge of the joint distribution of $X_1,\ldots,X_n$,
as defined in 271C. In fact we can safely extend the idea to 'universally measurable' predicates $\psi$, to be
discussed in Volume 4. But it could happen that $\hat\mu$ gave a measure to a set of the form $\{\omega : X(\omega)\in A\}$
for some exceedingly irregular set $A$, and in such a case it would be prudent to regard this as an accidental
pathology of the probability space, and to treat it in a rather different way.
(I see that I have rather glibly assumed that the formula above defines $\Pr(\psi(X_1,\ldots,X_n))$ for every Borel
predicate $\psi$. This is a consequence of 271Bb below.)

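271B Theorem Let $(\Omega,\Sigma,\mu)$ be a probability space, and $X_1,\ldots,X_n$ real-valued random variables on
$\Omega$. Set $\boldsymbol{X}(\omega) = (X_1(\omega),\ldots,X_n(\omega))$ for $\omega\in\bigcap_{i\le n}\operatorname{dom}X_i$.
(a) There is a unique Radon measure $\nu$ on $\mathbb{R}^n$ such that
$$\nu\,{]-\infty,a]} = \Pr(X_i\le\alpha_i\text{ for every }i\le n)$$
whenever $a = (\alpha_1,\ldots,\alpha_n)\in\mathbb{R}^n$, writing $]-\infty,a]$ for $\prod_{i\le n}{]-\infty,\alpha_i]}$;
(b) $\nu\mathbb{R}^n = 1$ and $\nu E = \hat\mu\boldsymbol{X}^{-1}[E]$ whenever $\nu E$ is defined, where $\hat\mu$ is the completion of $\mu$; in particular,
$\nu E = \Pr((X_1,\ldots,X_n)\in E)$ for every Borel set $E\subseteq\mathbb{R}^n$.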
T
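proof Let $\hat\Sigma$ be the domain of $\hat\mu$, and set $D = \bigcap_{i\le n}\operatorname{dom}X_i = \operatorname{dom}\boldsymbol{X}$; then $D$ is conegligible, so belongs
to $\hat\Sigma$. Let $\hat\mu_D = \hat\mu\restriction\mathcal{P}D$ be the subspace measure on $D$ (131B, 214B), and $\nu_0$ the image measure $\hat\mu_D\boldsymbol{X}^{-1}$
(112E); let $\mathrm{T}$ be the domain of $\nu_0$.
Write $\mathcal{B}$ for the algebra of Borel sets in $\mathbb{R}^n$. Then $\mathcal{B}\subseteq\mathrm{T}$. P For $i\le n$, $\alpha\in\mathbb{R}$ set $F_{i\alpha} = \{x : x\in\mathbb{R}^n,\ \xi_i\le\alpha\}$,
$H_{i\alpha} = \{\omega : \omega\in\operatorname{dom}X_i,\ X_i(\omega)\le\alpha\}$. $X_i$ is $\hat\Sigma$-measurable and its domain is in $\hat\Sigma$, so $H_{i\alpha}\in\hat\Sigma$, and
$\boldsymbol{X}^{-1}[F_{i\alpha}] = D\cap H_{i\alpha}$ is $\hat\mu_D$-measurable. Thus $F_{i\alpha}\in\mathrm{T}$. As $\mathrm{T}$ is a $\sigma$-algebra of subsets of $\mathbb{R}^n$, $\mathcal{B}\subseteq\mathrm{T}$ (121J). Q
Accordingly $\nu_0\restriction\mathcal{B}$ is a measure on $\mathbb{R}^n$ with domain $\mathcal{B}$; of course $\nu_0\mathbb{R}^n = \hat\mu D = 1$. By 256C, the completion
$\nu$ of $\nu_0\restriction\mathcal{B}$ is a Radon measure on $\mathbb{R}^n$, and $\nu\mathbb{R}^n = \nu_0\mathbb{R}^n = 1$.
For $E\in\mathcal{B}$,
$$\nu E = \nu_0 E = \hat\mu_D\boldsymbol{X}^{-1}[E] = \hat\mu\boldsymbol{X}^{-1}[E] = \Pr((X_1,\ldots,X_n)\in E).$$
More generally, if $E\in\operatorname{dom}\nu$, then there are Borel sets $E'$, $E''$ such that $E'\subseteq E\subseteq E''$ and $\nu(E''\setminus E') = 0$,
so that $\boldsymbol{X}^{-1}[E']\subseteq\boldsymbol{X}^{-1}[E]\subseteq\boldsymbol{X}^{-1}[E'']$ and $\hat\mu(\boldsymbol{X}^{-1}[E'']\setminus\boldsymbol{X}^{-1}[E']) = 0$. This means that $\boldsymbol{X}^{-1}[E]\in\hat\Sigma$ and
$$\hat\mu\boldsymbol{X}^{-1}[E] = \hat\mu\boldsymbol{X}^{-1}[E'] = \nu E' = \nu E.$$
As for the uniqueness of $\nu$, if $\nu'$ is any Radon measure on $\mathbb{R}^n$ such that $\nu'\,{]-\infty,a]} = \Pr(X_i\le\alpha_i\ \forall\ i\le n)$
for every $a\in\mathbb{R}^n$, then surely
$$\nu'\mathbb{R}^n = \lim_{k\to\infty}\nu'\,{]-\infty,k\mathbf{1}]} = \lim_{k\to\infty}\nu\,{]-\infty,k\mathbf{1}]} = 1 = \nu\mathbb{R}^n.$$
Also $\mathcal{I} = \{\,{]-\infty,a]} : a\in\mathbb{R}^n\}$ is closed under finite intersections, and $\nu$ and $\nu'$ agree on $\mathcal{I}$. So by the
Monotone Class Theorem (or rather, its corollary 136C), $\nu$ and $\nu'$ agree on the $\sigma$-algebra generated by $\mathcal{I}$,
which is $\mathcal{B}$ (121J), and are identical (256D).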

271C Definition Let $(\Omega,\Sigma,\mu)$ be a probability space and $X_1,\ldots,X_n$ real-valued random variables on $\Omega$.
By the (joint) distribution or law $\nu_{\boldsymbol{X}}$ of the family $\boldsymbol{X} = (X_1,\ldots,X_n)$ I shall mean the Radon probability
measure of 271B. If we think of $\boldsymbol{X}$ as a function from $\bigcap_{i\le n}\operatorname{dom}X_i$ to $\mathbb{R}^n$, then $\nu_{\boldsymbol{X}}E = \Pr(\boldsymbol{X}\in E)$ for
every Borel set $E\subseteq\mathbb{R}^n$.

271D Remarks (a) The choice of the Radon probability measure X as the distribution of X , with
the insistence that Radon measures should be complete, is of course somewhat arbitrary. Apart from the
general principle that one should always complete measures, these conventions fit better with some of the
work in Volume 4 and with such results as 272G below.

(b) Observe that in order to speak of the distribution of a family X = (X1 , . . . , Xn ) of random variables,
it is essential that all the Xi should be based on the same probability space.

(c) I see that the language I have chosen allows the $X_i$ to have different domains, so that the family
$(X_1,\dots,X_n)$ may not be exactly identifiable with the corresponding function from $\bigcap_{i\le n}\operatorname{dom}X_i$ to $\mathbb{R}^n$. I
hope however that using the same symbol $\boldsymbol{X}$ for both will cause no confusion.

(d) It is not useful to think of the whole image measure $\nu_0=\hat\mu_D\boldsymbol{X}^{-1}$ in the proof of 271B as the
distribution of $\boldsymbol{X}$, unless it happens to be equal to $\nu=\nu_{\boldsymbol{X}}$. The distribution of a random variable is
exactly that aspect of it which can be divorced from any consideration of the underlying space $(\Omega,\Sigma,\mu)$,
and the point of such results as 271K and 272G is that distributions can be calculated from each other,
without going back to the relatively fluid and uncertain model of a random variable in terms of a function
on a probability space.

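(e) If $\boldsymbol{X}=(X_1,\dots,X_n)$ and $\boldsymbol{Y}=(Y_1,\dots,Y_n)$ are such that $X_i=_{\text{a.e.}}Y_i$ for each $i$, then
$$\{\omega:\omega\in\textstyle\bigcap_{i\le n}\operatorname{dom}X_i,\ X_i(\omega)\le\alpha_i\ \forall\,i\le n\}\ \triangle\ \{\omega:\omega\in\textstyle\bigcap_{i\le n}\operatorname{dom}Y_i,\ Y_i(\omega)\le\alpha_i\ \forall\,i\le n\}$$
is negligible, so
$$\Pr(X_i\le\alpha_i\ \forall\,i\le n)=\hat\mu\{\omega:\omega\in\textstyle\bigcap_{i\le n}\operatorname{dom}X_i,\ X_i(\omega)\le\alpha_i\ \forall\,i\le n\}=\Pr(Y_i\le\alpha_i\ \forall\,i\le n)$$
for all $\alpha_0,\dots,\alpha_n\in\mathbb{R}$, and $\nu_{\boldsymbol{X}}=\nu_{\boldsymbol{Y}}$. This means that we can, if we wish, think of a distribution as a measure
$\nu_{\boldsymbol{u}}$ where $\boldsymbol{u}=(u_0,\dots,u_n)$ is a finite sequence in $L^0(\mu)$. In the present chapter I shall not emphasize this
approach, but it will always be at the back of my mind.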

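271E Measurable functions of random variables: Proposition Let $\boldsymbol{X}=(X_1,\dots,X_n)$ be a family of random variables (as always in such a context, I mean them all to be on the same probability space $(\Omega,\Sigma,\mu)$); write $\mathrm{T}_{\boldsymbol{X}}$ for the domain of the distribution $\nu_{\boldsymbol{X}}$ of $\boldsymbol{X}$, and let $h$ be a $\mathrm{T}_{\boldsymbol{X}}$-measurable real-valued function defined $\nu_{\boldsymbol{X}}$-a.e. on $\mathbb{R}^n$. Then we have a random variable $Y=h(X_1,\dots,X_n)$ defined by setting
$$h(X_1,\dots,X_n)(\omega)=h(X_1(\omega),\dots,X_n(\omega))\ \text{for every }\omega\in\boldsymbol{X}^{-1}[\operatorname{dom}h].$$
The distribution $\nu_Y$ of $Y$ is the measure on $\mathbb{R}$ defined by the formula
$$\nu_YF=\nu_{\boldsymbol{X}}h^{-1}[F]$$
for just those sets $F\subseteq\mathbb{R}$ such that $h^{-1}[F]\in\mathrm{T}_{\boldsymbol{X}}$. Also
$$E(Y)=\int h\,d\nu_{\boldsymbol{X}}$$
in the sense that if one of these exists in $[-\infty,\infty]$, so does the other, and they are then equal.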
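proof (a)(i)
$$\Omega\setminus\operatorname{dom}Y\subseteq\bigcup_{i\le n}(\Omega\setminus\operatorname{dom}X_i)\cup\boldsymbol{X}^{-1}[\mathbb{R}^n\setminus\operatorname{dom}h]$$
is negligible (using 271Bb), so $\operatorname{dom}Y$ is conegligible. If $a\in\mathbb{R}$, then
$$E=\{x:x\in\operatorname{dom}h,\ h(x)\le a\}\in\mathrm{T}_{\boldsymbol{X}},$$
so
$$\{\omega:\omega\in\Omega,\ Y(\omega)\le a\}=\boldsymbol{X}^{-1}[E]\in\hat\Sigma.$$
As $a$ is arbitrary, $Y$ is $\hat\Sigma$-measurable, and is a random variable.
(ii) Let $\tilde h:\mathbb{R}^n\to\mathbb{R}$ be any extension of $h$ to the whole of $\mathbb{R}^n$. Then $\tilde h$ is $\mathrm{T}_{\boldsymbol{X}}$-measurable, so the ordinary image measure $\nu_{\boldsymbol{X}}\tilde h^{-1}$, defined on $\{F:\tilde h^{-1}[F]\in\operatorname{dom}\nu_{\boldsymbol{X}}\}$, is a Radon probability measure on $\mathbb{R}$ (256G). But for any $A\subseteq\mathbb{R}$,
$$\tilde h^{-1}[A]\,\triangle\,h^{-1}[A]\subseteq\mathbb{R}^n\setminus\operatorname{dom}h$$
is $\nu_{\boldsymbol{X}}$-negligible, so $\nu_{\boldsymbol{X}}h^{-1}[F]=\nu_{\boldsymbol{X}}\tilde h^{-1}[F]$ if either is defined.
If $F\subseteq\mathbb{R}$ is a Borel set, then
$$\nu_YF=\hat\mu\{\omega:Y(\omega)\in F\}=\hat\mu(\boldsymbol{X}^{-1}[h^{-1}[F]])=\nu_{\boldsymbol{X}}(h^{-1}[F]).$$
So $\nu_Y$ and $\nu_{\boldsymbol{X}}\tilde h^{-1}$ agree on the Borel sets and are equal (256D).
(b) Now apply Theorem 235E to the measures $\hat\mu$, $\nu_{\boldsymbol{X}}$ and the function $\phi=\boldsymbol{X}$. We have
$$\int\chi(\boldsymbol{X}^{-1}[F])\,d\hat\mu=\hat\mu(\boldsymbol{X}^{-1}[F])=\nu_{\boldsymbol{X}}F$$
for every $F\in\mathrm{T}_{\boldsymbol{X}}$, by 271Bb. Because $h$ is $\nu_{\boldsymbol{X}}$-virtually measurable and defined $\nu_{\boldsymbol{X}}$-a.e., 235Eb tells us that
$$\int h(\boldsymbol{X})\,d\mu=\int h(\boldsymbol{X})\,d\hat\mu=\int h\,d\nu_{\boldsymbol{X}}$$
whenever either side is defined in $[-\infty,\infty]$, which is exactly the result we need.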

271F Corollary If $X$ is a single random variable with distribution $\nu_X$, then
$$E(X)=\int_{-\infty}^{\infty}x\,\nu_X(dx)$$
if either is defined in $[-\infty,\infty]$. Similarly
$$E(X^2)=\int_{-\infty}^{\infty}x^2\,\nu_X(dx)$$
(whatever $X$ may be). If $X$, $Y$ are two random variables (on the same probability space!) then we have
$$E(X\times Y)=\int xy\,\nu_{(X,Y)}\,d(x,y)$$
if either side is defined in $[-\infty,\infty]$.
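
As an illustration of 271E and 271F (not part of the original text), here is a minimal Python sketch on a finite probability space, assuming two fair dice with $X$ their sum. The names `OMEGA`, `MU`, `X` and `law` are hypothetical helpers introduced for this example only. It computes the expectation once by integrating over $\Omega$ and once by integrating $x$ against the distribution $\nu_X$.

```python
from fractions import Fraction
from itertools import product

# Assumed example space: two fair dice, each outcome has measure 1/36.
OMEGA = list(product(range(1, 7), repeat=2))
MU = Fraction(1, len(OMEGA))

def X(omega):
    # The random variable: sum of the two faces.
    return omega[0] + omega[1]

def law(rv):
    """Distribution of rv as a dict {value: probability} (image measure)."""
    nu = {}
    for omega in OMEGA:
        nu[rv(omega)] = nu.get(rv(omega), Fraction(0)) + MU
    return nu

nu_X = law(X)
mean_on_omega = sum(X(w) * MU for w in OMEGA)        # integral of X d(mu)
mean_from_law = sum(x * p for x, p in nu_X.items())  # integral of x nu_X(dx)
```

In this finite setting the two sums are rearrangements of one another, which is the elementary content of the change-of-variable step in the proof of 271E.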

271G Distribution functions (a) If $X$ is a real-valued random variable, its distribution function
is the function $F_X:\mathbb{R}\to[0,1]$ defined by setting
$$F_X(a)=\Pr(X\le a)=\nu_X\,]-\infty,a]$$
for every $a\in\mathbb{R}$. (Warning! some authors prefer $F_X(a)=\Pr(X<a)$.) Observe that $F_X$ is non-decreasing,
that $\lim_{a\to-\infty}F_X(a)=0$, that $\lim_{a\to\infty}F_X(a)=1$ and that $\lim_{x\downarrow a}F_X(x)=F_X(a)$ for every $a\in\mathbb{R}$. By
271Ba, $X$ and $Y$ have the same distribution iff $F_X=F_Y$.

(b) If $X_1,\dots,X_n$ are real-valued random variables on the same probability space, their (joint) distribution function is the function $F_{\boldsymbol{X}}:\mathbb{R}^n\to[0,1]$ defined by writing
$$F_{\boldsymbol{X}}(a)=\Pr(X_i\le\alpha_i\ \forall\,i\le n)$$
whenever $a=(\alpha_1,\dots,\alpha_n)\in\mathbb{R}^n$. If $\boldsymbol{X}$ and $\boldsymbol{Y}$ have the same distribution function, they have the same
distribution, by the $n$-dimensional version of 271B.
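
The convention in (a) can be made concrete with a small Python sketch, assuming a finitely supported random variable (a fair die). The helper names `distribution_function` and `strict` are hypothetical and serve only this illustration. The sketch contrasts $\Pr(X\le a)$ with the alternative convention $\Pr(X<a)$ mentioned in the warning.

```python
from fractions import Fraction

def distribution_function(pmf):
    """Return F with F(a) = Pr(X <= a) for a finitely supported X.

    `pmf` maps each value of X to its probability (assumed to sum to 1).
    """
    def F(a):
        return sum((p for x, p in pmf.items() if x <= a), Fraction(0))
    return F

# Assumed example: a fair die taking the values 1..6.
die = {k: Fraction(1, 6) for k in range(1, 7)}
F = distribution_function(die)

def strict(a):
    # The alternative convention Pr(X < a); it differs from F at the atoms.
    return sum((p for x, p in die.items() if x < a), Fraction(0))
```

Between atoms the two conventions agree; at an atom such as $a=3$ they differ by the mass of the atom, and only the convention of (a) is continuous from the right there.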

271H Densities Let $\boldsymbol{X}=(X_1,\dots,X_n)$ be a family of random variables, all defined on the same
probability space. A density function for $(X_1,\dots,X_n)$ is a Radon-Nikodým derivative, with respect to
Lebesgue measure, for the distribution $\nu_{\boldsymbol{X}}$; that is, a non-negative function $f$, integrable with respect to
Lebesgue measure $\mu_L$ on $\mathbb{R}^n$, such that
$$\int_Ef\,d\mu_L=\nu_{\boldsymbol{X}}E=\Pr(\boldsymbol{X}\in E)$$
for every Borel set $E\subseteq\mathbb{R}^n$ (256J), if there is such a function, of course.

271I Proposition Let $\boldsymbol{X}=(X_1,\dots,X_n)$ be a family of random variables, all defined on the same
probability space. Write $\mu_L$ for Lebesgue measure on $\mathbb{R}^n$.
(a) There is a density function for $\boldsymbol{X}$ iff $\Pr(\boldsymbol{X}\in E)=0$ for every Borel set $E$ such that $\mu_LE=0$.
(b) A non-negative Lebesgue integrable function $f$ is a density function for $\boldsymbol{X}$ iff $\int_{]-\infty,a]}f\,d\mu_L=\Pr(\boldsymbol{X}\in\,]-\infty,a])$ for every $a\in\mathbb{R}^n$.
(c) Suppose that $f$ is a density function for $\boldsymbol{X}$, and $G=\{x:f(x)>0\}$. Then if $h$ is a Lebesgue
measurable real-valued function defined almost everywhere on $G$,
$$E(h(\boldsymbol{X}))=\int h\,d\nu_{\boldsymbol{X}}=\int h\times f\,d\mu_L$$
if any of the three integrals is defined in $[-\infty,\infty]$, interpreting $(h\times f)(x)$ as $0$ if $f(x)=0$ and $x\notin\operatorname{dom}h$.
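proof (a) Apply 256J to the Radon probability measure $\nu_{\boldsymbol{X}}$.
(b) Of course the condition is necessary. If it is satisfied, then (by B.Levi's theorem)
$$\int f\,d\mu_L=\lim_{k\to\infty}\int_{]-\infty,k\mathbf{1}]}f\,d\mu_L=\lim_{k\to\infty}\nu_{\boldsymbol{X}}\,]-\infty,k\mathbf{1}]=1.$$
So we have a Radon probability measure $\nu$ defined by writing
$$\nu E=\int_Ef\,d\mu_L$$
whenever $E\cap\{x:f(x)>0\}$ is Lebesgue measurable (256E). We are supposing that $\nu\,]-\infty,a]=\nu_{\boldsymbol{X}}\,]-\infty,a]$
for every $a\in\mathbb{R}^n$; by 271Ba, as usual, $\nu=\nu_{\boldsymbol{X}}$, and
$$\int_Ef\,d\mu_L=\nu E=\nu_{\boldsymbol{X}}E=\Pr(\boldsymbol{X}\in E)$$
for every Borel set $E\subseteq\mathbb{R}^n$, and $f$ is a density function for $\boldsymbol{X}$.
(c) By 256E, $\nu_{\boldsymbol{X}}$ is the indefinite-integral measure over $\mu_L$ associated with $f$. So, writing $G=\{x:f(x)>0\}$, we have
$$\int h\,d\nu_{\boldsymbol{X}}=\int h\times f\,d\mu_L$$
whenever either is defined in $[-\infty,\infty]$. But $h$ is $\mathrm{T}_{\boldsymbol{X}}$-measurable and defined $\nu_{\boldsymbol{X}}$-almost everywhere, where
$\mathrm{T}_{\boldsymbol{X}}=\operatorname{dom}\nu_{\boldsymbol{X}}$, so $E(h(\boldsymbol{X}))=\int h\,d\nu_{\boldsymbol{X}}$ by 271E.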

271J The machinery developed in §263 is sufficient to give a very general result on the densities of
random variables of the form $\phi(\boldsymbol{X})$, as follows.
Theorem Let $\boldsymbol{X}=(X_1,\dots,X_n)$ be a family of random variables, and $D\subseteq\mathbb{R}^n$ a Borel set such that
$\Pr(\boldsymbol{X}\in D)=1$. Let $\phi:D\to\mathbb{R}^n$ be a function which is differentiable relative to its domain everywhere
in $D$; for $x\in D$, let $T(x)$ be a derivative of $\phi$ at $x$, and set $J(x)=|\det T(x)|$. Suppose that $J(x)\ne0$ for
each $x\in D$, and that $\boldsymbol{X}$ has a density function $f$; and suppose moreover that $\langle D_k\rangle_{k\in\mathbb{N}}$ is a disjoint sequence
of Borel sets, with union $D$, such that $\phi_k=\phi\restriction D_k$ is injective for every $k$. Then $\phi(\boldsymbol{X})$ has a density function
$g=\sum_{k=0}^{\infty}g_k$ where
$$g_k(y)=\frac{f(\phi_k^{-1}(y))}{J(\phi_k^{-1}(y))}\ \text{for }y\in\phi[D_k\cap\operatorname{dom}f],\qquad g_k(y)=0\ \text{for }y\in\mathbb{R}^n\setminus\phi[D_k].$$

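proof By 262Ia, $\phi$ is continuous, therefore Borel measurable, so $\phi(\boldsymbol{X})$ is a random variable.
For the moment, fix $k\in\mathbb{N}$ and a Borel set $F\subseteq\mathbb{R}^n$, and write $\mu$ for Lebesgue measure on $\mathbb{R}^n$. By 263D(iii), $\phi[D_k]$ is measurable, and by 263D(ii)
$\phi[D_k\setminus\operatorname{dom}f]$ is negligible. The function $g_k$ is such that $f(x)=J(x)g_k(\phi(x))$ for every $x\in D_k\cap\operatorname{dom}f$, so
by 263D(v) we have
$$\int_Fg_k\,d\mu=\int_{\phi[D_k]}g_k\times\chi F\,d\mu=\int_{D_k}J(x)g_k(\phi(x))\chi F(\phi(x))\,\mu(dx)=\int_{D_k\cap\phi^{-1}[F]}f\,d\mu=\Pr(\boldsymbol{X}\in D_k\cap\phi^{-1}[F]).$$
(The integral $\int_{\phi[D_k]}g_k\times\chi F$ is defined because $\int_{D_k}J\times(g_k\times\chi F)\phi$ is defined, and the integral $\int g_k\times\chi F$ is
defined because $\phi[D_k]$ is measurable and $g_k$ is zero off $\phi[D_k]$.)
Now sum over $k$. Every $g_k$ is non-negative, so by B.Levi's theorem $g$ is finite almost everywhere on $F$,
and
$$\int_Fg\,d\mu=\sum_{k=0}^{\infty}\int_Fg_k\,d\mu=\sum_{k=0}^{\infty}\Pr(\boldsymbol{X}\in D_k\cap\phi^{-1}[F])=\Pr(\boldsymbol{X}\in\phi^{-1}[F])=\Pr(\phi(\boldsymbol{X})\in F).$$
As $F$ is arbitrary, $g$ is a density function for $\phi(\boldsymbol{X})$, as claimed.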

271K The application of the last theorem to ordinary transformations is sometimes indirect, so I give
an example.
Proposition Let $X$, $Y$ be two random variables with a joint density function $f$. Then $X\times Y$ has a density
function $h$, where
$$h(u)=\int_{-\infty}^{\infty}\frac{1}{|v|}f\Bigl(\frac{u}{v},v\Bigr)dv$$
whenever this is defined in $\mathbb{R}$.

proof Set $\phi(x,y)=(xy,y)$ for $(x,y)\in\mathbb{R}^2$. Then $\phi$ is differentiable, with derivative $T(x,y)=\begin{pmatrix}y&x\\0&1\end{pmatrix}$, so
$J(x,y)=|\det T(x,y)|=|y|$. Set $D=\{(x,y):y\ne0\}$; then $D$ is a conegligible Borel set in $\mathbb{R}^2$ and $\phi\restriction D$ is
injective. Now $\phi[D]=D$ and $\phi^{-1}(u,v)=(\frac{u}{v},v)$ for $v\ne0$. So $\phi(X,Y)=(X\times Y,Y)$ has a density function
$g$, where
$$g(u,v)=\frac{f(u/v,v)}{|v|}\ \text{if }v\ne0.$$
To find a density function for $X\times Y$, we calculate
$$\Pr(X\times Y\le a)=\int_{]-\infty,a]\times\mathbb{R}}g=\int_{-\infty}^{a}\int_{-\infty}^{\infty}g(u,v)\,dv\,du=\int_{-\infty}^{a}h$$
by Fubini's theorem (252B, 252C). In particular, $h$ is defined and finite almost everywhere; and by 271Ib it
is a density function for $X\times Y$.
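
The formula of 271K can be checked numerically in a simple case. The following Python sketch assumes that $X$ and $Y$ are independent and uniformly distributed on $]0,1[$, so that $f=1$ on the open unit square and the formula gives $h(u)=\int_u^1 v^{-1}dv=-\ln u$ for $0<u<1$. The function names and the midpoint rule are choices made for this illustration only.

```python
import math

def f(x, y):
    # Assumed joint density: independent uniform variables on ]0,1[.
    return 1.0 if 0.0 < x < 1.0 and 0.0 < y < 1.0 else 0.0

def product_density(u, steps=100_000):
    """Midpoint-rule approximation of h(u) = integral of f(u/v, v)/|v| dv.

    The integrand vanishes outside 0 < v < 1 for this f, so it is enough
    to integrate over ]0,1[.
    """
    dv = 1.0 / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * dv          # midpoint of the k-th subinterval
        total += f(u / v, v) / abs(v) * dv
    return total
```

For instance, `product_density(0.5)` should be close to `-math.log(0.5)`; the agreement is only approximate because the integrand jumps at $v=u$.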

*271L When a random variable is presented as the limit of a sequence of random variables the following
can be very useful.
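Proposition Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a sequence of real-valued random variables converging in measure to a random
variable $X$ (definition: 245A). Writing $F_{X_n}$, $F_X$ for the distribution functions of $X_n$, $X$ respectively,
$$F_X(a)=\inf_{b>a}\liminf_{n\to\infty}F_{X_n}(b)=\inf_{b>a}\limsup_{n\to\infty}F_{X_n}(b)$$
for every $a\in\mathbb{R}$.
proof Set $\gamma=\inf_{b>a}\liminf_{n\to\infty}F_{X_n}(b)$, $\gamma'=\inf_{b>a}\limsup_{n\to\infty}F_{X_n}(b)$.
(a) $F_X(a)\le\gamma$. P Take any $b>a$ and $\epsilon>0$. Then there is an $n_0\in\mathbb{N}$ such that $\Pr(|X_n-X|\ge b-a)\le\epsilon$
for every $n\ge n_0$ (245F). Now, for $n\ge n_0$,
$$F_X(a)=\Pr(X\le a)\le\Pr(X_n\le b)+\Pr(X_n-X\ge b-a)\le F_{X_n}(b)+\epsilon.$$
So $F_X(a)\le\liminf_{n\to\infty}F_{X_n}(b)+\epsilon$; as $\epsilon$ is arbitrary, $F_X(a)\le\liminf_{n\to\infty}F_{X_n}(b)$; as $b$ is arbitrary, $F_X(a)\le\gamma$. Q
(b) $\gamma'\le F_X(a)$. P Let $\epsilon>0$. Then there is a $\delta>0$ such that $F_X(a+2\delta)\le F_X(a)+\epsilon$ (271Ga). Next,
there is an $n_0\in\mathbb{N}$ such that $\Pr(|X_n-X|\ge\delta)\le\epsilon$ for every $n\ge n_0$. In this case, for $n\ge n_0$,
$$F_{X_n}(a+\delta)=\Pr(X_n\le a+\delta)\le\Pr(X\le a+2\delta)+\Pr(X-X_n\ge\delta)\le F_X(a+2\delta)+\epsilon\le F_X(a)+2\epsilon.$$
Accordingly
$$\gamma'\le\limsup_{n\to\infty}F_{X_n}(a+\delta)\le F_X(a)+2\epsilon.$$
As $\epsilon$ is arbitrary, $\gamma'\le F_X(a)$. Q
(c) Since of course $\gamma\le\gamma'$, we must have $F_X(a)=\gamma=\gamma'$, as claimed.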
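
The infimum over $b>a$ in 271L cannot be dropped. A minimal Python sketch of a standard counterexample, assuming $X=0$ and $X_n=1/(n+1)$ (constant random variables, so convergence in measure is immediate): here $F_{X_n}(0)=0$ for every $n$ while $F_X(0)=1$, but for each fixed $b>0$ we have $F_{X_n}(b)=1$ for all large $n$. The helper names are hypothetical, and a finite range of indices stands in for the limit.

```python
from fractions import Fraction

def F_const(c):
    """Distribution function of the constant random variable with value c."""
    return lambda a: 1 if a >= c else 0

F_X = F_const(Fraction(0))

def F_n(n):
    # Distribution function of X_n = 1/(n+1), which converges to X = 0.
    return F_const(Fraction(1, n + 1))

def tail_values(b, start=1000, count=1000):
    # Finite stand-in for the behaviour of F_{X_n}(b) as n -> infinity:
    # the set of values taken for start <= n < start + count.
    return {F_n(n)(b) for n in range(start, start + count)}
```

In this example `tail_values(Fraction(0))` is `{0}`, so $\lim_nF_{X_n}(0)=0\ne F_X(0)$, whereas for $b=1/100$ the tail is `{1}`, in line with the formula of the proposition.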

271X Basic exercises > (a) Let $X$ be a real-valued random variable with finite expectation, and $\epsilon>0$.
Show that $\Pr(|X-E(X)|\ge\epsilon)\le\frac{1}{\epsilon^2}\operatorname{Var}(X)$. (This is Chebyshev's inequality.)

> (b) Let $F:\mathbb{R}\to[0,1]$ be a non-decreasing function such that (i) $\lim_{a\to-\infty}F(a)=0$ (ii) $\lim_{a\to\infty}F(a)=1$
(iii) $\lim_{x\downarrow a}F(x)=F(a)$ for every $a\in\mathbb{R}$. Show that there is a unique Radon probability measure $\nu$ in $\mathbb{R}$
such that $F(a)=\nu\,]-\infty,a]$ for every $a\in\mathbb{R}$. (Hint: look at 114Xa.) Hence show that $F$ is the distribution
function of some random variable.

> (c) Let $X$ be a real-valued random variable with a density function $f$. (i) Show that $|X|$ has a density
function $g_1$ where $g_1(x)=f(x)+f(-x)$ whenever $x\ge0$ and $f(x)$, $f(-x)$ are both defined, $0$ otherwise.
(ii) Show that $X^2$ has a density function $g_2$ where $g_2(x)=(f(\sqrt{x})+f(-\sqrt{x}))/2\sqrt{x}$ whenever $x>0$ and
this is defined, $0$ for other $x$. (iii) Show that if $\Pr(X=0)=0$ then $1/X$ has a density function $g_3$ where
$g_3(x)=\frac{1}{x^2}f(\frac{1}{x})$ whenever this is defined. (iv) Show that if $\Pr(X<0)=0$ then $\sqrt{X}$ has a density function
$g_4$ where $g_4(x)=2xf(x^2)$ if $x\ge0$ and $f(x^2)$ is defined, $0$ otherwise.

> (d) Let $X$ and $Y$ be random variables with a joint density function $f:\mathbb{R}^2\to\mathbb{R}$. Show that $X+Y$ has
a density function $h$ where $h(u)=\int_{-\infty}^{\infty}f(u-v,v)\,dv$ for almost every $u$.

(e) Let $X$, $Y$ be random variables with a joint density function $f:\mathbb{R}^2\to\mathbb{R}$. Show that $X/Y$ has a
density function $h$ where $h(u)=\int_{-\infty}^{\infty}|v|f(uv,v)\,dv$ for almost every $u$.

(f) Devise an alternative proof of 271K by using Fubini's theorem and one-dimensional substitutions to
show that
$$\int_a^b\int_{-\infty}^{\infty}\frac{1}{|v|}f\Bigl(\frac{u}{v},v\Bigr)dv\,du=\int_{\{(u,v):a\le uv\le b\}}f$$
whenever $a\le b$ in $\mathbb{R}$.

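271Y Further exercises (a) Let $\mathfrak{T}$ be the topology of $\mathbb{R}^{\mathbb{N}}$ and $\mathcal{B}$ the $\sigma$-algebra of Borel sets (256Ye).
(i) Let $\mathcal{I}$ be the family of sets of the form
$$\{x:x\in\mathbb{R}^{\mathbb{N}},\ x(i)\le\alpha_i\ \forall\,i\le n\},$$
where $n\in\mathbb{N}$ and $\alpha_i\in\mathbb{R}$ for each $i\le n$. Show that $\mathcal{B}$ is the smallest family of subsets of $\mathbb{R}^{\mathbb{N}}$ such that ($\alpha$)
$\mathcal{I}\subseteq\mathcal{B}$ ($\beta$) $B\setminus A\in\mathcal{B}$ whenever $A$, $B\in\mathcal{B}$ and $A\subseteq B$ ($\gamma$) $\bigcup_{k\in\mathbb{N}}A_k\in\mathcal{B}$ for every non-decreasing sequence
$\langle A_k\rangle_{k\in\mathbb{N}}$ in $\mathcal{B}$. (ii) Show that if $\nu$, $\nu'$ are two totally finite measures defined on $\mathbb{R}^{\mathbb{N}}$, and $\nu F$ and $\nu'F$ are
defined and equal for every $F\in\mathcal{I}$, then $\nu E$ and $\nu'E$ are defined and equal for every $E\in\mathcal{B}$. (iii) Show that if
$\Omega$ is any set and $\Sigma$ any $\sigma$-algebra of subsets of $\Omega$ and $X:\Omega\to\mathbb{R}^{\mathbb{N}}$ is any function, then $X^{-1}[E]\in\Sigma$ for every
$E\in\mathcal{B}$ iff $\pi_iX$ is $\Sigma$-measurable for every $i\in\mathbb{N}$, where $\pi_i(x)=x(i)$ for each $x\in\mathbb{R}^{\mathbb{N}}$, $i\in\mathbb{N}$. (iv) Show that if
$\boldsymbol{X}=\langle X_i\rangle_{i\in\mathbb{N}}$ is a sequence of real-valued random variables on a probability space $(\Omega,\Sigma,\mu)$, then there is a
unique probability measure $\nu^{\mathcal{B}}_{\boldsymbol{X}}$, with domain $\mathcal{B}$, such that $\nu^{\mathcal{B}}_{\boldsymbol{X}}\{x:x(i)\le\alpha_i\ \forall\,i\le n\}=\Pr(X_i\le\alpha_i\ \forall\,i\le n)$
for every $\alpha_0,\dots,\alpha_n\in\mathbb{R}$. (v) Under the conditions of (iv), show that there is a unique Radon measure
$\nu_{\boldsymbol{X}}$ on $\mathbb{R}^{\mathbb{N}}$ (in the sense of 256Ye) such that $\nu_{\boldsymbol{X}}\{x:x(i)\le\alpha_i\ \forall\,i\le n\}=\Pr(X_i\le\alpha_i\ \forall\,i\le n)$ for every
$\alpha_0,\dots,\alpha_n\in\mathbb{R}$.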

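(b) Let $F:\mathbb{R}^2\to[0,1]$ be a function. Show that the following are equiveridical: (i) $F$ is the distribution function of some pair $(X_1,X_2)$ of random variables (ii) there is a probability measure $\nu$ on $\mathbb{R}^2$ such
that $\nu\,]-\infty,a]=F(a)$ for every $a\in\mathbb{R}^2$ (iii)($\alpha$) $F(\alpha_1,\alpha_2)+F(\beta_1,\beta_2)\ge F(\alpha_1,\beta_2)+F(\beta_1,\alpha_2)$ whenever
$\alpha_1\le\beta_1$ and $\alpha_2\le\beta_2$ ($\beta$) $F(\alpha_1,\alpha_2)=\lim_{\xi_1\downarrow\alpha_1,\xi_2\downarrow\alpha_2}F(\xi_1,\xi_2)$ for every $\alpha_1$, $\alpha_2$ ($\gamma$) $\lim_{\alpha\to-\infty}F(\alpha,\beta)=\lim_{\alpha\to-\infty}F(\beta,\alpha)=0$ for all $\beta$ ($\delta$) $\lim_{\alpha\to\infty}F(\alpha,\alpha)=1$. (Hint: for non-empty half-open intervals $]a,b]$, where $a=(\alpha_1,\alpha_2)$ and $b=(\beta_1,\beta_2)$, set
$\lambda\,]a,b]=F(\alpha_1,\alpha_2)+F(\beta_1,\beta_2)-F(\alpha_1,\beta_2)-F(\beta_1,\alpha_2)$, and continue as in §§115B-115F.)

(c) Generalize (b) to higher dimensions, finding a suitable formula to stand in place of that in (iii-$\alpha$) of
(b).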

(d) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\mathcal{F}$ a filter on $L^0(\mu)$ converging to $X_0\in L^0(\mu)$ for the topology
of convergence in measure. Show that, writing $F_X$ for the distribution function of $X\in L^0(\mu)$,
$$F_{X_0}(a)=\inf_{b>a}\liminf_{X\to\mathcal{F}}F_X(b)=\inf_{b>a}\limsup_{X\to\mathcal{F}}F_X(b)$$
for every $a\in\mathbb{R}$.

271 Notes and comments Most of this section seems to have been taken up with technicalities. This
is perhaps unsurprising in view of the fact that it is devoted to the relationship between a vector random
variable $\boldsymbol{X}$ and the associated distribution $\nu_{\boldsymbol{X}}$, and this necessarily leads us into the minefield which I
attempted to chart in §235. Indeed, I call on results from §235 twice; once in 271E, with $\phi(\omega)=\boldsymbol{X}(\omega)$
and $J(\omega)=1$, and once in 271I, with $\phi(x)=x$ and $J(x)=f(x)$.
Distribution functions of one-dimensional random variables are easily characterized (271Xb); in higher
dimensions we have to work harder (271Yb-271Yc). Distributions, rather than distribution functions, can
be described for infinite sequences of random variables (271Ya); indeed, these ideas can be extended to
uncountable families, but this requires proper topological measure theory, and belongs in Volume 4.
The statement of 271J is lengthy, not to say cumbersome. The point is that many of the most important
transformations are not themselves injective, but can easily be dissected into injective fragments (see,
for instance, 271Xc and 263Xd). The point of 271K is that we frequently wish to apply the ideas here to
transformations which are singular, and indeed change the dimension of the random variable. I have not
given the theorems which make such applications routine and suggest rather that you seek out tricks such
as that used in the proof of 271K, which in any case are necessary if you want amenable formulae. Of course
other methods are available (271Xf).

272 Independence
I introduce the concept of independence for families of events, $\sigma$-algebras and random variables. The
first part of the section, down to 272G, amounts to an analysis of the elementary relationships between
the three manifestations of the idea. In 272G I give the fundamental result that the joint distribution of a
(finite) independent family of random variables is just the product of the individual distributions. Further
expressions of the connexion between independence and product measures are in 272J, 272M and 272N. I
give a version of the zero-one law (272O), and I end the section with a group of basic results from probability
theory concerning sums and products of independent random variables (272Q-272U).

272A Definitions Let $(\Omega,\Sigma,\mu)$ be a probability space.

(a) A family $\langle E_i\rangle_{i\in I}$ in $\Sigma$ is (stochastically) independent if
$$\mu(E_{i_1}\cap E_{i_2}\cap\dots\cap E_{i_n})=\prod_{j=1}^{n}\mu E_{i_j}$$
whenever $i_1,\dots,i_n$ are distinct members of $I$.
(b) A family $\langle\Sigma_i\rangle_{i\in I}$ of $\sigma$-subalgebras of $\Sigma$ is (stochastically) independent if
$$\mu(E_1\cap E_2\cap\dots\cap E_n)=\prod_{j=1}^{n}\mu E_j$$
whenever $i_1,\dots,i_n$ are distinct members of $I$ and $E_j\in\Sigma_{i_j}$ for every $j\le n$.
(c) A family $\langle X_i\rangle_{i\in I}$ of real-valued random variables on $\Omega$ is (stochastically) independent if
$$\Pr(X_{i_j}\le\alpha_j\text{ for every }j\le n)=\prod_{j=1}^{n}\Pr(X_{i_j}\le\alpha_j)$$
whenever $i_1,\dots,i_n$ are distinct members of $I$ and $\alpha_1,\dots,\alpha_n\in\mathbb{R}$.
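
Definition 272Aa asks for the product formula on every finite subfamily, not just on pairs. The following Python sketch, assuming the uniform probability on the four outcomes of two fair coin tosses, checks the definition directly; the names `OMEGA`, `mu` and `is_independent` are hypothetical helpers for this illustration. The three events chosen are the standard example of a family that is pairwise independent but not independent.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = ["HH", "HT", "TH", "TT"]          # assumed space: two fair coin tosses

def mu(event):
    # Uniform probability measure on OMEGA.
    return Fraction(len(event), len(OMEGA))

def is_independent(events):
    """Check 272Aa on every subfamily of at least two of the given events."""
    for r in range(2, len(events) + 1):
        for sub in combinations(events, r):
            inter = set(OMEGA)
            prod = Fraction(1)
            for e in sub:
                inter &= e            # running intersection
                prod *= mu(e)         # running product of measures
            if mu(inter) != prod:
                return False
    return True

E1 = {"HH", "HT"}   # the first toss is a head
E2 = {"HH", "TH"}   # the second toss is a head
E3 = {"HH", "TT"}   # the two tosses agree
```

Each pair passes the test, but `is_independent([E1, E2, E3])` is false, because $E_1\cap E_2\cap E_3=\{HH\}$ has measure $\frac14$ while the product of the three measures is $\frac18$.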

272B Remarks (a) This is perhaps the central contribution of probability theory to measure theory,
and as such deserves the most careful scrutiny. The idea of independence comes from outside mathematics
altogether, in the notion of events which have independent causes. I suppose that 272G and 272M are the
results below which most clearly show the measure-theoretic aspects of the concept. It is not an accident that
both involve product measures; one of the wonders of measure theory is the fact that the same technical
devices are used in establishing the probability theory of stochastic independence and the geometry of
multi-dimensional volume.

(b) In the following paragraphs I will try to describe some relationships between the three notions of
independence just defined. But it is worth noting at once the fact that, in all three cases, a family is
independent iff all its finite subfamilies are independent. Consequently any subfamily of an independent
family is independent. Another elementary fact which is immediate from the definitions is that if $\langle\Sigma_i\rangle_{i\in I}$ is
an independent family of $\sigma$-algebras, and $\Sigma'_i$ is a $\sigma$-subalgebra of $\Sigma_i$ for each $i$, then $\langle\Sigma'_i\rangle_{i\in I}$ is an independent
family.
(c) A useful reformulation of 272Ab is the following: A family $\langle\Sigma_i\rangle_{i\in I}$ of $\sigma$-subalgebras of $\Sigma$ is independent
iff
$$\mu\Bigl(\bigcap_{i\in I}E_i\Bigr)=\prod_{i\in I}\mu E_i$$
whenever $E_i\in\Sigma_i$ for every $i$ and $\{i:E_i\ne\Omega\}$ is finite. (Here I follow the convention of 254F, saying that
for a family $\langle\alpha_i\rangle_{i\in I}$ in $[0,1]$ we take $\prod_{i\in I}\alpha_i=1$ if $I=\emptyset$, and otherwise it is to be $\inf_{J\subseteq I,\,J\text{ is finite}}\prod_{i\in J}\alpha_i$.)
(d) In 272Aa-b I speak of sets $E_i\in\Sigma$ and algebras $\Sigma_i\subseteq\Sigma$. In fact (272Ac already gives a hint of this)
we shall more often than not be concerned with $\hat\Sigma$ rather than with $\Sigma$, if there is a difference, where $(\Omega,\hat\Sigma,\hat\mu)$
is the completion of $(\Omega,\Sigma,\mu)$.

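272C The $\sigma$-subalgebra defined by a random variable To relate 272Ab to 272Ac we need the
following notion. Let $(\Omega,\Sigma,\mu)$ be a probability space and $X$ a real-valued random variable defined on $\Omega$.
Write $\mathcal{B}$ for the $\sigma$-algebra of Borel subsets of $\mathbb{R}$, and $\Sigma_X$ for
$$\{X^{-1}[F]:F\in\mathcal{B}\}\cup\{(\Omega\setminus\operatorname{dom}X)\cup X^{-1}[F]:F\in\mathcal{B}\}.$$
Then $\Sigma_X$ is a $\sigma$-algebra of subsets of $\Omega$. P
$$\emptyset=X^{-1}[\emptyset]\in\Sigma_X;$$
if $F\in\mathcal{B}$ then
$$\Omega\setminus X^{-1}[F]=(\Omega\setminus\operatorname{dom}X)\cup X^{-1}[\mathbb{R}\setminus F]\in\Sigma_X,$$
$$\Omega\setminus((\Omega\setminus\operatorname{dom}X)\cup X^{-1}[F])=X^{-1}[\mathbb{R}\setminus F]\in\Sigma_X;$$
if $\langle F_k\rangle_{k\in\mathbb{N}}$ is any sequence in $\mathcal{B}$ then
$$\bigcup_{k\in\mathbb{N}}X^{-1}[F_k]=X^{-1}\Bigl[\bigcup_{k\in\mathbb{N}}F_k\Bigr],$$
so
$$\bigcup_{k\in\mathbb{N}}X^{-1}[F_k],\quad(\Omega\setminus\operatorname{dom}X)\cup\bigcup_{k\in\mathbb{N}}X^{-1}[F_k]$$
belong to $\Sigma_X$. Q
Evidently $\Sigma_X$ is the smallest $\sigma$-algebra of subsets of $\Omega$, containing $\operatorname{dom}X$, for which $X$ is measurable.
Also $\Sigma_X$ is a subalgebra of $\hat\Sigma$, where $\hat\Sigma$ is the domain of the completion of $\mu$ (271Aa).
Now we have the following result.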

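272D Proposition Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle X_i\rangle_{i\in I}$ a family of real-valued random
variables on $\Omega$. For each $i\in I$, let $\Sigma_i$ be the $\sigma$-algebra defined by $X_i$, as in 272C. Then the following are
equiveridical:
(i) $\langle X_i\rangle_{i\in I}$ is independent;
(ii) whenever $i_1,\dots,i_n$ are distinct members of $I$ and $F_1,\dots,F_n$ are Borel subsets of $\mathbb{R}$, then
$$\Pr(X_{i_j}\in F_j\text{ for every }j\le n)=\prod_{j=1}^{n}\Pr(X_{i_j}\in F_j);$$
(iii) whenever $\langle F_i\rangle_{i\in I}$ is a family of Borel subsets of $\mathbb{R}$, and $\{i:F_i\ne\mathbb{R}\}$ is finite, then
$$\hat\mu\Bigl(\bigcap_{i\in I}(X_i^{-1}[F_i]\cup(\Omega\setminus\operatorname{dom}X_i))\Bigr)=\prod_{i\in I}\Pr(X_i\in F_i),$$
where $\hat\mu$ is the completion of $\mu$;
(iv) $\langle\Sigma_i\rangle_{i\in I}$ is independent.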
proof (a)(i)$\Rightarrow$(ii) Write $\boldsymbol{X}=(X_{i_1},\dots,X_{i_n})$. Write $\nu_{\boldsymbol{X}}$ for the joint distribution of $\boldsymbol{X}$, and for each $j\le n$
write $\nu_j$ for the distribution of $X_{i_j}$; let $\nu$ be the product of $\nu_1,\dots,\nu_n$ as described in 254A-254C. (I wrote
§254 out as for infinite products. If you are interested only in finite products of probability spaces, which
are adequate for our needs in this paragraph, I recommend reading §§251-252 with the mental proviso that
all measures are probabilities, and then §254 with the proviso that the set $I$ is finite.) By 256K, $\nu$ is a
Radon measure on $\mathbb{R}^n$. (This is an induction on $n$, relying on 254N for assurance that we can regard $\nu$ as
the repeated product $(\dots((\nu_1\times\nu_2)\times\nu_3)\times\dots\nu_{n-1})\times\nu_n$.) Then for any $a=(\alpha_1,\dots,\alpha_n)\in\mathbb{R}^n$, we have
$$\nu\,]-\infty,a]=\nu\Bigl(\prod_{j=1}^{n}\,]-\infty,\alpha_j]\Bigr)=\prod_{j=1}^{n}\nu_j\,]-\infty,\alpha_j]$$
(using 254Fb)
$$=\prod_{j=1}^{n}\Pr(X_{i_j}\le\alpha_j)=\Pr(X_{i_j}\le\alpha_j\text{ for every }j\le n)$$
(using the condition (i))
$$=\nu_{\boldsymbol{X}}\,]-\infty,a].$$
By the uniqueness assertion in 271Ba, $\nu=\nu_{\boldsymbol{X}}$. In particular, if $F_1,\dots,F_n$ are Borel subsets of $\mathbb{R}$,
$$\Pr(X_{i_j}\in F_j\text{ for every }j\le n)=\Pr\Bigl(\boldsymbol{X}\in\prod_{j\le n}F_j\Bigr)=\nu_{\boldsymbol{X}}\Bigl(\prod_{j\le n}F_j\Bigr)=\nu\Bigl(\prod_{j\le n}F_j\Bigr)=\prod_{j=1}^{n}\nu_jF_j=\prod_{j=1}^{n}\Pr(X_{i_j}\in F_j),$$
as required.
(b)(ii)$\Rightarrow$(i) is trivial, if we recall that all sets $]-\infty,\alpha]$ are Borel sets, so that the definition of independence
given in 272Ac is just a special case of (ii).
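(c)(ii)$\Rightarrow$(iv) Assume (ii), and suppose that $i_1,\dots,i_n$ are distinct members of $I$ and $E_j\in\Sigma_{i_j}$ for each
$j\le n$. For each $j$, set $E'_j=E_j\cap\operatorname{dom}X_{i_j}$, so that $E'_j$ may be expressed as $X_{i_j}^{-1}[F_j]$ for some Borel set
$F_j\subseteq\mathbb{R}$. Then $\hat\mu(E_j\setminus E'_j)=0$ for each $j$, so
$$\hat\mu\Bigl(\bigcap_{1\le j\le n}E_j\Bigr)=\hat\mu\Bigl(\bigcap_{1\le j\le n}E'_j\Bigr)=\Pr(X_{i_1}\in F_1,\dots,X_{i_n}\in F_n)=\prod_{j=1}^{n}\Pr(X_{i_j}\in F_j)$$
(using (ii))
$$=\prod_{j=1}^{n}\hat\mu E_j.$$
As $E_1,\dots,E_n$ are arbitrary, $\langle\Sigma_i\rangle_{i\in I}$ is independent.

(d)(iv)$\Rightarrow$(ii) Now suppose that $\langle\Sigma_i\rangle_{i\in I}$ is independent. If $i_1,\dots,i_n$ are distinct members of $I$ and
$F_1,\dots,F_n$ are Borel sets in $\mathbb{R}$, then $X_{i_j}^{-1}[F_j]\in\Sigma_{i_j}$ for each $j$, so
$$\Pr(X_{i_1}\in F_1,\dots,X_{i_n}\in F_n)=\hat\mu\Bigl(\bigcap_{1\le j\le n}X_{i_j}^{-1}[F_j]\Bigr)=\prod_{j=1}^{n}\hat\mu X_{i_j}^{-1}[F_j]=\prod_{j=1}^{n}\Pr(X_{i_j}\in F_j).$$

(e) Finally, observe that (iii) is nothing but a re-formulation of (ii), because if $F_i=\mathbb{R}$ then $\Pr(X_i\in F_i)=1$
and $X_i^{-1}[F_i]\cup(\Omega\setminus\operatorname{dom}X_i)=\Omega$.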

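272E Corollary Let $\langle X_i\rangle_{i\in I}$ be an independent family of real-valued random variables, and $\langle h_i\rangle_{i\in I}$ any
family of Borel measurable functions from $\mathbb{R}$ to $\mathbb{R}$. Then $\langle h_i(X_i)\rangle_{i\in I}$ is independent.
proof Writing $\Sigma_i$ for the $\sigma$-algebra defined by $X_i$, $\Sigma'_i$ for the $\sigma$-algebra generated by $h_i(X_i)$, $h_i(X_i)$ is
$\Sigma_i$-measurable (121Eg) so $\Sigma'_i\subseteq\Sigma_i$ for every $i$ and $\langle\Sigma'_i\rangle_{i\in I}$ is independent, as in 272Bb.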

272F Similarly, we can relate the definition in 272Aa to the others.


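Proposition Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle E_i\rangle_{i\in I}$ a family in $\Sigma$. Set $\Sigma_i=\{\emptyset,E_i,\Omega\setminus E_i,\Omega\}$,
the ($\sigma$-)algebra of subsets of $\Omega$ generated by $E_i$, and $X_i=\chi E_i$, the characteristic function of $E_i$. Then the
following are equiveridical:
(i) $\langle E_i\rangle_{i\in I}$ is independent;
(ii) $\langle\Sigma_i\rangle_{i\in I}$ is independent;
(iii) $\langle X_i\rangle_{i\in I}$ is independent.
proof (i)$\Rightarrow$(iii) If $i_1,\dots,i_n$ are distinct members of $I$ and $\alpha_1,\dots,\alpha_n\in\mathbb{R}$, then for each $j\le n$ the set
$G_j=\{\omega:X_{i_j}(\omega)\le\alpha_j\}$ is either $\Omega\setminus E_{i_j}$ or $\emptyset$ or $\Omega$. If any $G_j$ is empty, then
$$\Pr(X_{i_j}\le\alpha_j\text{ for every }j\le n)=0=\prod_{j=1}^{n}\Pr(X_{i_j}\le\alpha_j).$$
Otherwise, set $K=\{j:G_j=\Omega\setminus E_{i_j}\}$; then
$$\Pr(X_{i_j}\le\alpha_j\text{ for every }j\le n)=\mu\Bigl(\bigcap_{j\le n}G_j\Bigr)=\mu\Bigl(\bigcap_{j\in K}(\Omega\setminus E_{i_j})\Bigr)=\prod_{j\in K}\mu(\Omega\setminus E_{i_j})=\prod_{j=1}^{n}\Pr(X_{i_j}\le\alpha_j).$$
(Here we use the elementary fact that $\langle\Omega\setminus E_i\rangle_{i\in I}$ is independent whenever $\langle E_i\rangle_{i\in I}$ is, which follows by induction on the number of sets from $\mu(A\setminus E)=\mu A-\mu(A\cap E)$.) As $i_1,\dots,i_n$ and $\alpha_1,\dots,\alpha_n$ are arbitrary, $\langle X_i\rangle_{i\in I}$ is independent.

(iii)$\Rightarrow$(ii) follows from (i)$\Rightarrow$(iv) of 272D, because $\Sigma_i$ is the $\sigma$-algebra defined by $X_i$.
(ii)$\Rightarrow$(i) is trivial, because $E_i\in\Sigma_i$ for each $i$.
Remark You will I hope feel that while the theory of product measures might be appropriate to 272D, it
is surely rather heavy machinery to use on what ought to be a simple combinatorial problem like (iii)$\Rightarrow$(ii)
of this proposition. I suggest that you construct an elementary proof, and examine which of the ideas of
the theory of product measures (and the Monotone Class Theorem, 136B) are actually needed here.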

272G Distributions of independent random variables I have not tried to describe the joint dis-
tribution of an infinite family of random variables. (Indications of how to deal with a countable family are
offered in 271Ya and 272Yg.) As, however, the independence of a family of random variables is determined
by the behaviour of finite subfamilies, we can approach it through the following proposition.
Theorem Let $\boldsymbol{X}=(X_1,\dots,X_n)$ be a finite family of real-valued random variables on a probability space.
Let $\nu_{\boldsymbol{X}}$ be the corresponding distribution on $\mathbb{R}^n$. Then the following are equiveridical:
(i) $X_1,\dots,X_n$ are independent;
(ii) $\nu_{\boldsymbol{X}}$ can be expressed as a product of $n$ probability measures $\nu_1,\dots,\nu_n$, one for each factor $\mathbb{R}$ of $\mathbb{R}^n$;
(iii) $\nu_{\boldsymbol{X}}$ is the product measure of $\nu_{X_1},\dots,\nu_{X_n}$, writing $\nu_{X_i}$ for the distribution of the random variable
$X_i$.
proof (a)(i)$\Rightarrow$(iii) In the proof of (i)$\Rightarrow$(ii) of 272D above I showed that $\nu_{\boldsymbol{X}}$ is the product of $\nu_{X_1},\dots,\nu_{X_n}$.
(b)(iii)$\Rightarrow$(ii) is trivial.
(c)(ii)$\Rightarrow$(i) Suppose that $\nu_{\boldsymbol{X}}$ is expressible as a product $\nu_1\times\dots\times\nu_n$. Let $a=(\alpha_1,\dots,\alpha_n)\in\mathbb{R}^n$. Then
$$\Pr(X_i\le\alpha_i\ \forall\,i\le n)=\Pr(\boldsymbol{X}\in\,]-\infty,a])=\nu_{\boldsymbol{X}}(]-\infty,a])=\prod_{i=1}^{n}\nu_i\,]-\infty,\alpha_i].$$
On the other hand, setting $F_i=\{(\xi_1,\dots,\xi_n):\xi_i\le\alpha_i\}$, we must have
$$\nu_i\,]-\infty,\alpha_i]=\nu_{\boldsymbol{X}}F_i=\Pr(\boldsymbol{X}\in F_i)=\Pr(X_i\le\alpha_i)$$
for each $i$. So we get
$$\Pr(X_i\le\alpha_i\text{ for every }i\le n)=\prod_{i=1}^{n}\Pr(X_i\le\alpha_i),$$
as required.
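
For finitely supported random variables the criterion of 272G reduces to comparing a joint probability table with the product of its marginals. The Python sketch below makes that comparison exactly, using rational arithmetic; `marginals` and `is_product` are hypothetical helper names, and the two joint laws are assumed examples (two separate fair coins, and one fair coin read twice).

```python
from fractions import Fraction
from itertools import product

def marginals(joint):
    """Marginal laws of a finitely supported pair (X1, X2)."""
    m1, m2 = {}, {}
    for (x, y), p in joint.items():
        m1[x] = m1.get(x, Fraction(0)) + p
        m2[y] = m2.get(y, Fraction(0)) + p
    return m1, m2

def is_product(joint):
    """True iff the joint law equals the product of its two marginals."""
    m1, m2 = marginals(joint)
    return all(joint.get((x, y), Fraction(0)) == m1[x] * m2[y]
               for x, y in product(m1, m2))

# Assumed examples: two fair coins tossed separately, and one coin read twice.
separate = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}
same_coin = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
```

Both examples have the same marginals, so the marginal distributions alone never decide independence; only the joint distribution does.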

272H Corollary Suppose that $\langle X_i\rangle_{i\in I}$ is an independent family of real-valued random variables on a
probability space $(\Omega,\Sigma,\mu)$, and that for each $i\in I$ we are given another real-valued random variable $Y_i$ on
$\Omega$ such that $Y_i=_{\text{a.e.}}X_i$. Then $\langle Y_i\rangle_{i\in I}$ is independent.
proof For every distinct $i_1,\dots,i_n\in I$, if we set $\boldsymbol{X}=(X_{i_1},\dots,X_{i_n})$ and $\boldsymbol{Y}=(Y_{i_1},\dots,Y_{i_n})$, then $\boldsymbol{X}=_{\text{a.e.}}\boldsymbol{Y}$,
so $\nu_{\boldsymbol{X}}$, $\nu_{\boldsymbol{Y}}$ are equal (271De). By 272G, $Y_{i_1},\dots,Y_{i_n}$ must be independent because $X_{i_1},\dots,X_{i_n}$ are. As
$i_1,\dots,i_n$ are arbitrary, the whole family $\langle Y_i\rangle_{i\in I}$ is independent.
Remark It follows that we may speak of independent families in the space $L^0(\mu)$ of equivalence classes of
random variables (241C), saying that $\langle X_i^{\bullet}\rangle_{i\in I}$ is independent iff $\langle X_i\rangle_{i\in I}$ is.

272I Corollary Suppose that $X_1,\dots,X_n$ are independent random variables with density functions
$f_1,\dots,f_n$ (271H). Then $\boldsymbol{X}=(X_1,\dots,X_n)$ has a density function $f$ given by setting $f(x)=\prod_{i=1}^{n}f_i(\xi_i)$
whenever $x=(\xi_1,\dots,\xi_n)\in\prod_{i\le n}\operatorname{dom}(f_i)\subseteq\mathbb{R}^n$.
proof For $n=2$ this is covered by 256L; the general case follows by induction on $n$.

272J The most important theorems of the subject refer to independent families of random variables,
rather than independent families of $\sigma$-algebras. The value of the concept of independent $\sigma$-algebras lies in
such results as the following.
Proposition Let $(\Omega,\Sigma,\mu)$ be a complete probability space, and $\langle\Sigma_i\rangle_{i\in I}$ a family of $\sigma$-subalgebras of $\Sigma$. For
each $i\in I$ let $\mu_i$ be the restriction of $\mu$ to $\Sigma_i$, and let $(\Omega^I,\Lambda,\lambda)$ be the product probability space of the
family $\langle(\Omega,\Sigma_i,\mu_i)\rangle_{i\in I}$. Define $\phi:\Omega\to\Omega^I$ by setting $\phi(\omega)(i)=\omega$ for every $\omega\in\Omega$, $i\in I$. Then $\phi$ is inverse-measure-preserving iff $\langle\Sigma_i\rangle_{i\in I}$ is independent.
proof This is virtually a restatement of 254Fb and 254G. (i) If $\phi$ is inverse-measure-preserving, and
$i_1,\dots,i_n\in I$ are distinct and $E_j\in\Sigma_{i_j}$ for each $j$, then $\bigcap_{j\le n}E_j=\phi^{-1}[\{x:x(i_j)\in E_j\text{ for every }j\le n\}]$,
so that
$$\mu\Bigl(\bigcap_{j\le n}E_j\Bigr)=\lambda\{x:x(i_j)\in E_j\text{ for every }j\le n\}=\prod_{j\le n}\mu_{i_j}E_j=\prod_{j\le n}\mu E_j.$$
(ii) If $\langle\Sigma_i\rangle_{i\in I}$ is independent, and $E_i\in\Sigma_i$ for every $i\in I$ and $\{i:E_i\ne\Omega\}$ is finite, then
$$\mu\phi^{-1}\Bigl[\prod_{i\in I}E_i\Bigr]=\mu\Bigl(\bigcap_{i\in I}E_i\Bigr)=\prod_{i\in I}\mu E_i=\prod_{i\in I}\mu_iE_i.$$
So the conditions of 254G are satisfied and $\mu\phi^{-1}[W]=\lambda W$ for every $W\in\Lambda$.

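272K Proposition Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle\Sigma_i\rangle_{i\in I}$ an independent family of $\sigma$-subalgebras of $\Sigma$. Let $\langle J(s)\rangle_{s\in S}$ be a disjoint family of subsets of $I$, and for each $s\in S$ let $\tilde\Sigma_s$ be the
$\sigma$-algebra of subsets of $\Omega$ generated by $\bigcup_{i\in J(s)}\Sigma_i$. Then $\langle\tilde\Sigma_s\rangle_{s\in S}$ is independent.

proof Let $(\Omega,\hat\Sigma,\hat\mu)$ be the completion of $(\Omega,\Sigma,\mu)$. On $\Omega^I$ let $\lambda$ be the product of the measures $\mu\restriction\Sigma_i$, and
let $\phi:\Omega\to\Omega^I$ be the diagonal map, as in 272J. $\phi$ is inverse-measure-preserving for $\hat\mu$ and $\lambda$, by 272J.
We can identify $\lambda$ with the product of $\langle\lambda_s\rangle_{s\in S}$, where for each $s\in S$ $\lambda_s$ is the product of $\langle\mu\restriction\Sigma_i\rangle_{i\in J(s)}$
(254N). For $s\in S$, let $\Lambda_s$ be the domain of $\lambda_s$, and set $\pi_s(x)=x\restriction J(s)$ for $x\in\Omega^I$, so that $\pi_s$ is inverse-measure-preserving for $\lambda$ and $\lambda_s$ (254Oa), and $\phi_s=\pi_s\phi$ is inverse-measure-preserving for $\hat\mu$ and $\lambda_s$; of course
$\phi_s$ is the diagonal map from $\Omega$ to $\Omega^{J(s)}$. Set $\Sigma^*_s=\{\phi_s^{-1}[H]:H\in\Lambda_s\}$. Then $\Sigma^*_s$ is a $\sigma$-subalgebra of $\hat\Sigma$, and
$\Sigma^*_s\supseteq\tilde\Sigma_s$, because
$$E=\phi_s^{-1}[\{x:x(i)\in E\}]\in\Sigma^*_s$$
for every $i\in J(s)$, $E\in\Sigma_i$.

Now suppose that $s_1,\dots,s_n\in S$ are distinct and that $E_j\in\tilde\Sigma_{s_j}$ for each $j$. Then $E_j\in\Sigma^*_{s_j}$, so there are
$H_j\in\Lambda_{s_j}$ such that $E_j=\phi_{s_j}^{-1}[H_j]$ for each $j$. Set
$$W=\{x:x\in\Omega^I,\ x\restriction J(s_j)\in H_j\text{ for every }j\le n\}.$$
Because we can identify $\lambda$ with the product of the $\lambda_s$, we have
$$\lambda W=\prod_{j=1}^{n}\lambda_{s_j}H_j=\prod_{j=1}^{n}\hat\mu(\phi_{s_j}^{-1}[H_j])=\prod_{j=1}^{n}\hat\mu E_j=\prod_{j=1}^{n}\mu E_j.$$
On the other hand, $\phi^{-1}[W]=\bigcap_{j\le n}E_j$, so, because $\phi$ is inverse-measure-preserving,
$$\mu\Bigl(\bigcap_{j\le n}E_j\Bigr)=\hat\mu\Bigl(\bigcap_{j\le n}E_j\Bigr)=\lambda W=\prod_{j=1}^{n}\mu E_j.$$
As $E_1,\dots,E_n$ are arbitrary, $\langle\tilde\Sigma_s\rangle_{s\in S}$ is independent.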

272L I give a typical application of this result as a sample.


Corollary Let $X,X_1,\dots,X_n$ be independent random variables and $h:\mathbb{R}^n\to\mathbb{R}$ a Borel function. Then $X$
and $h(X_1,\dots,X_n)$ are independent.
proof Let $\Sigma_X$, $\Sigma_{X_i}$ be the $\sigma$-algebras defined by $X$, $X_i$ (272C). Then $\Sigma_X,\Sigma_{X_1},\dots,\Sigma_{X_n}$ are independent
(272D). Let $\Sigma^*$ be the $\sigma$-algebra generated by $\Sigma_{X_1}\cup\dots\cup\Sigma_{X_n}$. Then 272K (perhaps working in the
completion of the original probability space) tells us that $\Sigma_X$ and $\Sigma^*$ are independent. But every $X_j$ is
$\Sigma^*$-measurable so $Y=h(X_1,\dots,X_n)$ is $\Sigma^*$-measurable (121Kb); also $\operatorname{dom}Y\in\Sigma^*$, so $\Sigma_Y\subseteq\Sigma^*$ and $\Sigma_X$,
$\Sigma_Y$ are independent. By 272D again, $X$ and $Y$ are independent, as claimed.
Remark Nearly all of us, when teaching elementary probability theory, would invite our students to treat
this corollary (with an explicit function h, of course) as obvious. In effect, the proof here is a confirmation
that the formal definition of independence offered is a faithful representation of our intuition of independent
events having independent causes.

272M Products of probability spaces and independent families of random variables We have
already seen that the concept of independent random variables is intimately linked with that of product
measure. I now give some further manifestations of the connexion.
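Proposition Let $\langle(\Omega_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $(\Omega,\Sigma,\mu)$ their product.
(a) For each $i\in I$ write $\tilde\Sigma_i=\{\pi_i^{-1}[E]:E\in\Sigma_i\}$, where $\pi_i:\Omega\to\Omega_i$ is the coordinate map. Then $\langle\tilde\Sigma_i\rangle_{i\in I}$
is an independent family of $\sigma$-subalgebras of $\Sigma$.
(b) For each $i\in I$ let $\langle X_{ij}\rangle_{j\in J(i)}$ be an independent family of real-valued random variables on $\Omega_i$, and
for $i\in I$, $j\in J(i)$ write $\tilde X_{ij}(\omega)=X_{ij}(\omega(i))$ for those $\omega\in\Omega$ such that $\omega(i)\in\operatorname{dom}X_{ij}$. Then $\langle\tilde X_{ij}\rangle_{i\in I,j\in J(i)}$
is an independent family of random variables, and each $\tilde X_{ij}$ has the same distribution as the corresponding
$X_{ij}$.
proof (a) It is easy to check that each $\tilde\Sigma_i$ is a $\sigma$-algebra of sets. The rest amounts just to recalling from
254Fb that if $J\subseteq I$ is finite and $E_i\in\Sigma_i$ for $i\in J$, then
$$\mu\Bigl(\bigcap_{i\in J}\pi_i^{-1}[E_i]\Bigr)=\mu\{\omega:\omega(i)\in E_i\text{ for every }i\in I\}=\prod_{i\in I}\mu_iE_i$$
if we set $E_i=\Omega_i$ for $i\in I\setminus J$.
(b) We know also that $(\Omega,\Sigma,\mu)$ is the product of the completions $(\Omega_i,\hat\Sigma_i,\hat\mu_i)$ (254I). From this, we see
that each $\tilde X_{ij}$ is defined $\mu$-a.e., and is $\Sigma$-measurable, with the same distribution as $X_{ij}$. Now apply condition
(iii) of 272D. Suppose that $\langle F_{ij}\rangle_{i\in I,j\in J(i)}$ is a family of Borel sets in $\mathbb{R}$, and that $\{(i,j):F_{ij}\ne\mathbb{R}\}$ is finite.
Consider
$$E_i=\bigcap_{j\in J(i)}(X_{ij}^{-1}[F_{ij}]\cup(\Omega_i\setminus\operatorname{dom}X_{ij})),$$
$$E=\prod_{i\in I}E_i=\bigcap_{i\in I,j\in J(i)}(\tilde X_{ij}^{-1}[F_{ij}]\cup(\Omega\setminus\operatorname{dom}\tilde X_{ij})).$$
Because each family $\langle X_{ij}\rangle_{j\in J(i)}$ is independent, and $\{j:F_{ij}\ne\mathbb{R}\}$ is finite,
$$\hat\mu_iE_i=\prod_{j\in J(i)}\Pr(X_{ij}\in F_{ij})$$
for each $i\in I$. Because
$$\{i:E_i\ne\Omega_i\}\subseteq\{i:\exists\,j\in J(i),\ F_{ij}\ne\mathbb{R}\}$$
is finite,
$$\mu E=\prod_{i\in I}\hat\mu_iE_i=\prod_{i\in I,j\in J(i)}\Pr(X_{ij}\in F_{ij})=\prod_{i\in I,j\in J(i)}\Pr(\tilde X_{ij}\in F_{ij});$$
as $\langle F_{ij}\rangle_{i\in I,j\in J(i)}$ is arbitrary, $\langle\tilde X_{ij}\rangle_{i\in I,j\in J(i)}$ is independent.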
Remark The formulation in (b) is more complicated than is necessary to express the idea, but is what is
needed for an application below.

272N A special case of 272J is of particular importance in general measure theory, and is most useful
in an adapted form.
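Proposition Let $(\Omega,\Sigma,\mu)$ be a complete probability space, and $\langle E_i\rangle_{i\in I}$ an independent family in $\Sigma$ such
that $\mu E_i=\frac12$ for every $i\in I$. Define $\phi:\Omega\to\{0,1\}^I$ by setting $\phi(\omega)(i)=1$ if $\omega\in E_i$, $0$ if $\omega\in\Omega\setminus E_i$. Then
$\phi$ is inverse-measure-preserving for the usual measure $\lambda$ on $\{0,1\}^I$ (254J).
proof I use 254G again. For each $i\in I$ let $\Sigma_i$ be the algebra $\{\emptyset,E_i,\Omega\setminus E_i,\Omega\}$; then $\langle\Sigma_i\rangle_{i\in I}$ is independent
(272F). For $i\in I$ set $\phi_i(\omega)=\phi(\omega)(i)$. Let $\nu$ be the usual measure of $\{0,1\}$. Then it is easy to check that
$$\mu\phi_i^{-1}[H]=\tfrac12\#(H)=\nu H$$
for every $H\subseteq\{0,1\}$. If $\langle H_i\rangle_{i\in I}$ is a family of subsets of $\{0,1\}$, and $\{i:H_i\ne\{0,1\}\}$ is finite, then
$$\mu\phi^{-1}\Bigl[\prod_{i\in I}H_i\Bigr]=\mu\Bigl(\bigcap_{i\in I}\phi_i^{-1}[H_i]\Bigr)=\prod_{i\in I}\mu\phi_i^{-1}[H_i]$$
(because $\phi_i^{-1}[H_i]\in\Sigma_i$ for each $i$, and $\langle\Sigma_i\rangle_{i\in I}$ is independent)
$$=\prod_{i\in I}\nu H_i=\lambda\Bigl(\prod_{i\in I}H_i\Bigr).$$
As $\langle H_i\rangle_{i\in I}$ is arbitrary, 254G gives the result.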

272O Tail $\sigma$-algebras and the zero-one law I have never been able to make up my mind whether the following result is deep or not. I think it is one of the many cases in mathematics where a theorem is surprising and exciting if one comes on it unprepared, but is natural and straightforward if one approaches it from the appropriate angle.
Proposition Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ an independent sequence of $\sigma$-subalgebras of $\Sigma$. Let $\Sigma^*_n$ be the $\sigma$-algebra generated by $\bigcup_{m\ge n}\Sigma_m$ for each $n$, and set $\Sigma^*_\infty=\bigcap_{n\in\mathbb{N}}\Sigma^*_n$. Then $\mu E$ is either $0$ or $1$ for every $E\in\Sigma^*_\infty$.
proof For each $n$, the family $(\Sigma_0,\ldots,\Sigma_n,\Sigma^*_{n+1})$ is independent, by 272K. So $(\Sigma_0,\ldots,\Sigma_n,\Sigma^*_\infty)$ is independent, because $\Sigma^*_\infty\subseteq\Sigma^*_{n+1}$. But this means that every finite subfamily of $(\Sigma^*_\infty,\Sigma_0,\Sigma_1,\ldots)$ is independent, and therefore that the whole family is (272Bb). Consequently $(\Sigma^*_\infty,\Sigma^*_0)$ must be independent, by 272K again.
Now if $E\in\Sigma^*_\infty$, then $E$ also belongs to $\Sigma^*_0$, so we must have
$$\mu(E\cap E)=\mu E\cdot\mu E,$$
that is, $\mu E=(\mu E)^2$; so that $\mu E\in\{0,1\}$, as claimed.

272P To support the claim that somewhere we have achieved a non-trivial insight, I give a corollary,
which will be fundamental to the understanding of the limit theorems in the next section, and does not seem
to be obvious.
Corollary Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle X_n\rangle_{n\in\mathbb{N}}$ an independent sequence of real-valued random variables on $\Omega$. Then
$$\limsup_{n\to\infty}\frac1{n+1}(X_0+\ldots+X_n)$$
is almost everywhere constant; that is, there is some $u\in[-\infty,\infty]$ such that
$$\limsup_{n\to\infty}\frac1{n+1}(X_0+\ldots+X_n)=_{\text{a.e.}}u.$$

proof We may suppose that each $X_n$ is $\Sigma$-measurable and defined everywhere in $\Omega$, because (as remarked in 272H) changing the $X_n$ on a negligible set does not affect their independence, and it affects $\limsup_{n\to\infty}\frac1{n+1}(X_0+\ldots+X_n)$ only on a negligible set. For each $n$, let $\Sigma_n$ be the $\sigma$-algebra generated by $X_n$ (272C), and $\Sigma^*_n$ the $\sigma$-algebra generated by $\bigcup_{m\ge n}\Sigma_m$; set $\Sigma^*_\infty=\bigcap_{n\in\mathbb{N}}\Sigma^*_n$. By 272D, $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ is independent, so $\mu E\in\{0,1\}$ for every $E\in\Sigma^*_\infty$ (272O).
Now take any $a\in\mathbb{R}$ and set
$$E_a=\{\omega:\limsup_{m\to\infty}\frac1{m+1}(X_0(\omega)+\ldots+X_m(\omega))\le a\}.$$
Then
$$\limsup_{m\to\infty}\frac1{m+1}(X_0+\ldots+X_m)=\limsup_{m\to\infty}\frac1{m+1}(X_n+\ldots+X_{m+n}),$$
so
$$E_a=\{\omega:\limsup_{m\to\infty}\frac1{m+1}(X_n(\omega)+\ldots+X_{n+m}(\omega))\le a\}$$
belongs to $\Sigma^*_n$ for every $n$, because $X_i$ is $\Sigma^*_n$-measurable for every $i\ge n$. So $E_a\in\Sigma^*_\infty$ and
$$\Pr\Bigl(\limsup_{m\to\infty}\frac1{m+1}(X_0+\ldots+X_m)\le a\Bigr)=\mu E_a$$
must be either $0$ or $1$. Setting
$$u=\sup\{a:a\in\mathbb{R},\ \mu E_a=0\}$$
(allowing $\sup\emptyset=-\infty$ and $\sup\mathbb{R}=\infty$, as usual in such contexts), we see that
$$\limsup_{n\to\infty}\frac1{n+1}(X_0+\ldots+X_n)=u$$
almost everywhere.

272Q I must now catch up on some basic facts from elementary probability theory.
Proposition Let $X$, $Y$ be independent real-valued random variables with finite expectation (271Ab). Then $E(X\times Y)$ exists and is equal to $E(X)E(Y)$.
proof Let $\nu_{(X,Y)}$ be the joint distribution of the pair $(X,Y)$. Then $\nu_{(X,Y)}$ is the product of the distributions $\nu_X$ and $\nu_Y$ (272G). Also $\int x\,\nu_X(dx)=E(X)$ and $\int y\,\nu_Y(dy)=E(Y)$ exist in $\mathbb{R}$ (271F). So
$$\int xy\,\nu_{(X,Y)}(d(x,y))\text{ exists }=E(X)E(Y)$$
(253D). But this is just $E(X\times Y)$, by 271E with $h(x,y)=xy$.
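
For finitely-valued random variables the identity $E(X\times Y)=E(X)E(Y)$ can be checked by exact enumeration. The following Python sketch is only an illustration under that assumption, not part of the formal development; the function names and the two example distributions (a fair die and a biased two-valued variable) are hypothetical choices made for the example.

```python
from fractions import Fraction
from itertools import product


def expectation(pmf):
    """Expected value of a finite distribution given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def expectation_of_product(pmf_x, pmf_y):
    """E(X*Y) computed under the product (i.e. independent) joint distribution."""
    return sum(x * y * px * py
               for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()))


# Hypothetical example data: a fair die, and a variable taking -1 or 2.
die = {k: Fraction(1, 6) for k in range(1, 7)}
coin = {-1: Fraction(1, 3), 2: Fraction(2, 3)}

lhs = expectation_of_product(die, coin)        # E(X*Y)
rhs = expectation(die) * expectation(coin)     # E(X)*E(Y)
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than up to rounding.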

272R Bienaymé's Equality Let $X_1,\ldots,X_n$ be independent random variables. Then $\operatorname{Var}(X_1+\ldots+X_n)=\operatorname{Var}(X_1)+\ldots+\operatorname{Var}(X_n)$.
proof (a) Suppose first that all the $X_i$ have finite variance. Set $a_i=E(X_i)$, $Y_i=X_i-a_i$, $X=X_1+\ldots+X_n$, $Y=Y_1+\ldots+Y_n$; then $E(X)=a_1+\ldots+a_n$, so $Y=X-E(X)$ and
$$\begin{aligned}
\operatorname{Var}(X)=E(Y^2)&=E\Bigl(\sum_{i=1}^nY_i\Bigr)^2\\
&=E\Bigl(\sum_{i=1}^n\sum_{j=1}^nY_i\times Y_j\Bigr)=\sum_{i=1}^n\sum_{j=1}^nE(Y_i\times Y_j).
\end{aligned}$$
Now observe that if $i\ne j$ then $E(Y_i\times Y_j)=E(Y_i)E(Y_j)=0$, because $Y_i$ and $Y_j$ are independent (by 272E) and we may use 272Q, while if $i=j$ then
$$E(Y_i\times Y_j)=E(Y_i^2)=E(X_i-E(X_i))^2=\operatorname{Var}(X_i).$$
So
$$\operatorname{Var}(X)=\sum_{i=1}^nE(Y_i^2)=\sum_{i=1}^n\operatorname{Var}(X_i).$$

(b)(i) I show next that if $\operatorname{Var}(X_1+X_2)<\infty$ then $\operatorname{Var}(X_1)<\infty$. $\mathbf{P}$ We have
$$\iint(x+y)^2\,\nu_{X_1}(dx)\,\nu_{X_2}(dy)=\int(x+y)^2\,\nu_{(X_1,X_2)}(d(x,y))$$
(by 272G and Fubini's theorem)
$$=E((X_1+X_2)^2)$$
(by 271E)
$$<\infty.$$
So there must be some $a\in\mathbb{R}$ such that $\int(x+a)^2\,\nu_{X_1}(dx)$ is finite, that is, $E((X_1+a)^2)<\infty$; consequently $E(X_1^2)$ and $\operatorname{Var}(X_1)$ are finite. $\mathbf{Q}$
(ii) Now an easy induction (relying on 272L!) shows that if $\operatorname{Var}(X_1+\ldots+X_n)$ is finite, so is $\operatorname{Var}X_j$ for every $j$. Turning this round, if $\sum_{j=1}^n\operatorname{Var}(X_j)=\infty$, then $\operatorname{Var}(X_1+\ldots+X_n)=\infty$, and again the two are equal.
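
Bienaymé's Equality can likewise be confirmed exactly for small finitely-valued examples. The sketch below is an illustration only: it builds the distribution of a sum of independent variables by brute-force enumeration of the product distribution and compares the variance of the sum with the sum of the variances. The helper names and the example family (two fair dice and a biased $0/1$ variable) are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product


def mean(pmf):
    """Expectation of a finite distribution {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Variance E((X - E X)^2) of a finite distribution."""
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())


def sum_distribution(pmfs):
    """Distribution of X_1 + ... + X_n for independent finitely-valued X_i."""
    out = {}
    for combo in product(*(pmf.items() for pmf in pmfs)):
        total = sum(x for x, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p          # independence: joint probability is the product
        out[total] = out.get(total, Fraction(0)) + prob
    return out


die = {k: Fraction(1, 6) for k in range(1, 7)}
coin = {0: Fraction(1, 4), 1: Fraction(3, 4)}
family = [die, die, coin]

var_of_sum = variance(sum_distribution(family))
sum_of_vars = sum(variance(pmf) for pmf in family)
```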

272S The distribution of a sum of independent random variables: Theorem Let $X$, $Y$ be independent real-valued random variables on a probability space $(\Omega,\Sigma,\mu)$, with distributions $\nu_X$, $\nu_Y$. Then the distribution of $X+Y$ is the convolution $\nu_X*\nu_Y$ (257A).
proof Set $\nu=\nu_X*\nu_Y$. Take $a\in\mathbb{R}$ and set $h=\chi\,]-\infty,a]$. Then $h$ is $\nu$-integrable, so
$$\nu\,]-\infty,a]=\int h\,d\nu=\int h(x+y)\,(\nu_X\times\nu_Y)(d(x,y))$$
(by 257B, writing $\nu_X\times\nu_Y$ for the product measure on $\mathbb{R}^2$)
$$=\int h(x+y)\,\nu_{(X,Y)}(d(x,y))$$
(by 272G, writing $\nu_{(X,Y)}$ for the joint distribution of $(X,Y)$; this is where we use the hypothesis that $X$ and $Y$ are independent)
$$=E(h(X+Y))$$
(applying 271E to the function $(x,y)\mapsto h(x+y)$)
$$=\Pr(X+Y\le a).$$

As $a$ is arbitrary, $\nu_X*\nu_Y$ is the distribution of $X+Y$.

272T Corollary Suppose that $X$ and $Y$ are independent random variables, and that they have densities $f$ and $g$. Then $f*g$ is a density function for $X+Y$.
proof By 257F, $f*g$ is a density function for $\nu_X*\nu_Y=\nu_{X+Y}$.
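
The discrete analogue of 272S-272T is perhaps the easiest way to see what the convolution is doing: for independent integer-valued variables the probability mass function of the sum is the discrete convolution of the two mass functions. The Python sketch below illustrates this under that finitely-valued assumption; `convolve` is a hypothetical helper written for the example, not a library routine.

```python
from fractions import Fraction


def convolve(pmf_x, pmf_y):
    """Discrete convolution: the distribution of X + Y for independent X, Y."""
    out = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            # Pr(X = x, Y = y) = px * py by independence
            out[x + y] = out.get(x + y, Fraction(0)) + px * py
    return out


die = {k: Fraction(1, 6) for k in range(1, 7)}
two_dice = convolve(die, die)   # distribution of the total of two fair dice
```

Here `two_dice[7]` is the familiar probability $\frac16$ of throwing a total of seven.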

272U The following simple result will be very useful when we come to stochastic processes in Volume
4, as well as in the next section.
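Etemadi's lemma (Etemadi 96) Let $X_0,\ldots,X_n$ be independent real-valued random variables. For $m\le n$, set $S_m=\sum_{i=0}^mX_i$. Then
$$\Pr(\sup_{m\le n}|S_m|\ge3\gamma)\le3\max_{m\le n}\Pr(|S_m|\ge\gamma)$$
for every $\gamma>0$.
proof As in 272P, we may suppose that every $X_i$ is a measurable function defined everywhere on a measure space $\Omega$. Set $\alpha=\max_{m\le n}\Pr(|S_m|\ge\gamma)$. For each $r\le n$, set
$$E_r=\{\omega:|S_m(\omega)|<3\gamma\text{ for every }m<r,\ |S_r(\omega)|\ge3\gamma\}.$$
Then $E_0,\ldots,E_n$ is a disjoint cover of $\{\omega:\max_{m\le n}|S_m(\omega)|\ge3\gamma\}$. Set $E'_r=\{\omega:\omega\in E_r,\ |S_n(\omega)|<\gamma\}$. Then $E'_r\subseteq\{\omega:\omega\in E_r,\ |(S_n-S_r)(\omega)|>2\gamma\}$. But $E_r$ depends on $X_0,\ldots,X_r$ so is independent of $\{\omega:|(S_n-S_r)(\omega)|>2\gamma\}$, which can be calculated from $X_{r+1},\ldots,X_n$ (272K). So
$$\begin{aligned}
\mu E'_r&\le\mu\{\omega:\omega\in E_r,\ |(S_n-S_r)(\omega)|>2\gamma\}=\mu E_r\cdot\Pr(|S_n-S_r|>2\gamma)\\
&\le\mu E_r\bigl(\Pr(|S_n|>\gamma)+\Pr(|S_r|>\gamma)\bigr)\le2\alpha\mu E_r,
\end{aligned}$$
and $\mu(E_r\setminus E'_r)\ge(1-2\alpha)\mu E_r$. On the other hand, $\langle E_r\setminus E'_r\rangle_{r\le n}$ is a disjoint family of sets all included in $\{\omega:|S_n(\omega)|\ge\gamma\}$. So
$$\alpha\ge\mu\{\omega:|S_n(\omega)|\ge\gamma\}\ge\sum_{r=0}^n\mu(E_r\setminus E'_r)\ge(1-2\alpha)\sum_{r=0}^n\mu E_r,$$
and
$$\Pr(\sup_{r\le n}|S_r|\ge3\gamma)=\sum_{r=0}^n\mu E_r\le\min\Bigl(1,\frac{\alpha}{1-2\alpha}\Bigr)\le3\alpha$$
(considering $\alpha\le\frac13$, $\alpha\ge\frac13$ separately), as required.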
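
To get a feeling for the inequality, both sides can be computed exactly for a short simple random walk. The sketch below is only an illustration under the assumption that the steps are independent fair $\pm1$ variables; `etemadi_sides` is a hypothetical helper that enumerates every path of the given length. For such small examples the right-hand side is typically far from sharp, and may even exceed $1$.

```python
from fractions import Fraction
from itertools import product


def etemadi_sides(num_steps, gamma):
    """Return (Pr(max_m |S_m| >= 3*gamma), 3 * max_m Pr(|S_m| >= gamma))
    for partial sums of num_steps independent fair +1/-1 steps,
    computed by exact enumeration of all 2**num_steps paths."""
    paths = list(product((-1, 1), repeat=num_steps))
    weight = Fraction(1, len(paths))     # every path is equally likely
    sup_count = 0
    hits = [0] * num_steps               # hits[m]: number of paths with |S_m| >= gamma
    for path in paths:
        s, biggest = 0, 0
        for m, step in enumerate(path):
            s += step
            biggest = max(biggest, abs(s))
            if abs(s) >= gamma:
                hits[m] += 1
        if biggest >= 3 * gamma:
            sup_count += 1
    return sup_count * weight, 3 * max(hits) * weight


left, right = etemadi_sides(12, 3)
```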

272X Basic exercises (a) Let $(\Omega,\Sigma,\mu)$ be an atomless probability space, and $\langle\epsilon_n\rangle_{n\in\mathbb{N}}$ any sequence in $[0,1]$. Show that there is an independent sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ such that $\mu E_n=\epsilon_n$ for every $n$. (Hint: 215D.)

> (b) Let $\langle X_i\rangle_{i\in I}$ be a family of real-valued random variables. Show that it is independent iff
$$E(h_1(X_{i_1})\times\ldots\times h_n(X_{i_n}))=\prod_{j=1}^nE(h_j(X_{i_j}))$$
whenever $i_1,\ldots,i_n$ are distinct members of $I$ and $h_1,\ldots,h_n$ are Borel measurable functions from $\mathbb{R}$ to $\mathbb{R}$ such that $E(h_j(X_{i_j}))$ are all finite.

(c) Write out a proof of 272F which does not use the theory of product measures.

(d) Let $X=(X_1,\ldots,X_n)$ be a family of random variables all defined on the same probability space, and suppose that $X$ has a density function $f$ expressible in the form $f(\xi_1,\ldots,\xi_n)=f_1(\xi_1)f_2(\xi_2)\ldots f_n(\xi_n)$ for suitable functions $f_1,\ldots,f_n$ of one real variable. Show that $X_1,\ldots,X_n$ are independent.

(e) Let $X_1$, $X_2$ be independent real-valued random variables both with distribution $\nu$ and distribution function $F$. Set $Y=\max(X_1,X_2)$. Show that the distribution of $Y$ is absolutely continuous with respect to $\nu$, with Radon-Nikodým derivative $F+F^-$, where $F^-(x)=\lim_{t\uparrow x}F(t)$ for every $x\in\mathbb{R}$. (Hint: use Fubini's theorem to calculate $\lambda\{(t,u):t\le u\le x\}$ and $\lambda\{(t,u):t<u\le x\}$ where $\lambda$ is the joint distribution of $X_1$ and $X_2$.)

(f ) Use 254Sa and the idea of 272J to give another proof of 272O.

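(g) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$. Let $\Sigma_\infty$ be the $\sigma$-algebra generated by $\bigcup_{n\in\mathbb{N}}\Sigma_n$. Let $\mathrm{T}$ be another $\sigma$-subalgebra of $\Sigma$ such that $\Sigma_n$ and $\mathrm{T}$ are independent for each $n$. Show that $\Sigma_\infty$ and $\mathrm{T}$ are independent. (Hint: apply the Monotone Class Theorem to $\{E:\mu(E\cap F)=\mu E\cdot\mu F$ for every $F\in\mathrm{T}\}$.) Use this to prove 272O.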

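(h) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a sequence of random variables and $Y$ a random variable such that $Y$ and $X_n$ are independent for each $n\in\mathbb{N}$. Suppose that $\Pr(Y\in\mathbb{N})=1$ and that $\sum_{n=0}^\infty\Pr(Y\ge n)E(|X_n|)$ is finite. Set $Z=\sum_{n=0}^YX_n$ (that is, $Z(\omega)=\sum_{n=0}^{Y(\omega)}X_n(\omega)$ whenever $\omega\in\operatorname{dom}Y$ is such that $Y(\omega)\in\mathbb{N}$ and $\omega\in\operatorname{dom}X_n$ for every $n\le Y(\omega)$). (i) Show that $E(Z)=\sum_{n=0}^\infty\Pr(Y\ge n)E(X_n)$. (Hint: set $X'_n(\omega)=X_n(\omega)$ if $Y(\omega)\ge n$, $0$ otherwise.) (ii) Show that if $E(X_n)=\gamma$ for every $n\in\mathbb{N}$ then $E(Z)=\gamma E(Y+1)$, since $\sum_{n=0}^\infty\Pr(Y\ge n)=E(Y)+1$. (This is Wald's equation.)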

(i) Let X1 , . . . , Xn be independent random variables. Show that if X1 + . . . + Xn has finite expectation
so does every Xj . (Hint: see part (b) of the proof of 272R.)

>(j) Let $X$ and $Y$ be independent real-valued random variables with densities $f$ and $g$. Show that $X\times Y$ has a density function $h$ where $h(x)=\int_{-\infty}^\infty\frac1{|y|}g(y)f(\frac xy)\,dy$ for almost every $x$. (Hint: 271K.)

272Y Further exercises (a) Develop a theory of independence for random variables taking values in $\mathbb{R}^r$, following through as many as possible of the ideas of this section.

(b) Show that all the ideas of this section apply equally to complex-valued random variables, subject to
suitable adjustments (to be devised).

(c) Let $X_0,\ldots,X_n$ be independent real-valued random variables with distributions $\nu_0,\ldots,\nu_n$ and distribution functions $F_0,\ldots,F_n$. Show that, for any Borel set $E\subseteq\mathbb{R}$,
$$\Pr(\sup_{i\le n}X_i\in E)=\sum_{i=0}^n\int_E\prod_{0\le j<i}F^-_j(x)\prod_{i<j\le n}F_j(x)\,\nu_i(dx),$$
where $F^-_j(x)=\lim_{t\uparrow x}F_j(t)$ for each $j$, and we interpret the empty products $\prod_{0\le j<0}F^-_j(x)$, $\prod_{n<j\le n}F_j(x)$ as $1$.

(d) Let $(\Omega,\Sigma,\mu)$ be a probability space.
(i) Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be an independent sequence in $\Sigma$. Show that for any real-valued random variable $X$ with finite expectation,
$$\lim_{n\to\infty}\int_{E_n}X\,d\mu-\mu E_n\,E(X)=0.$$
(Hint: let $\mathrm{T}_0$ be the subalgebra of $\Sigma$ generated by $\{E_n:n\in\mathbb{N}\}$ and $\mathrm{T}$ the $\sigma$-subalgebra of $\Sigma$ generated by $\{E_n:n\in\mathbb{N}\}$. Start by considering $X=\chi E$ for $E\in\mathrm{T}_0$ and then $X=\chi E$ for $E\in\mathrm{T}$. Move from $L^1(\mu\upharpoonright\mathrm{T})$ to $L^1(\mu)$ by using conditional expectations.)
(ii) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a uniformly integrable independent sequence of random variables on $\Omega$. Show that for any bounded random variable $Y$,
$$\lim_{n\to\infty}E(X_n\times Y)-E(X_n)E(Y)=0.$$
(iii) Suppose that $1<p\le\infty$ and set $q=p/(p-1)$ (taking $q=1$ if $p=\infty$). Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of random variables with $\sup_{n\in\mathbb{N}}\|X_n\|_p<\infty$, and $Y$ a random variable with $\|Y\|_q<\infty$. Show that
$$\lim_{n\to\infty}E(X_n\times Y)-E(X_n)E(Y)=0.$$

(e) Let $X_1,\ldots,X_n$ be random variables such that for each $j<n$ the family
$$(X_1,\ldots,X_j,-X_{j+1},\ldots,-X_n)$$
has the same joint distribution as the original family $(X_1,\ldots,X_n)$. Set $S_j=X_1+\ldots+X_j$ for each $j\le n$.
(i) Show that for any $a\ge0$
$$\Pr(\sup_{1\le j\le n}|S_j|\ge a)\le2\Pr(|S_n|\ge a).$$
(Hint: show that if $E_j=\{\omega:\omega\in\bigcap_{i\le n}\operatorname{dom}X_i,\ |S_i(\omega)|<a$ for $i<j,\ |S_j(\omega)|\ge a\}$ then $\mu\{\omega:\omega\in E_j,\ |S_n(\omega)|\ge|S_j(\omega)|\}\ge\frac12\mu E_j$.) (ii) Show that $E(\sup_{j\le n}|S_j|)\le2E(|S_n|)$. (iii) Show that $E(\sup_{i\le n}S_i^2)\le2E(S_n^2)$.

(f) Let $X=\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of random variables on a complete probability space $(\Omega,\Sigma,\mu)$. Let $\mathcal{B}$ be the Borel $\sigma$-algebra of $\mathbb{R}^{\mathbb{N}}$ (271Ya). Let $\nu_X^{(\mathcal{B})}$ be the probability measure with domain $\mathcal{B}$ defined by setting $\nu_X^{(\mathcal{B})}E=\mu X^{-1}[E]$ for every $E\in\mathcal{B}$, and write $\nu_X$ for the completion of $\nu_X^{(\mathcal{B})}$. Show that $\nu_X$ is just the product of the distributions $\nu_{X_n}$.

(g) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle Z_n\rangle_{n\in\mathbb{N}}$ a sequence of random variables on $\Omega$ such that $\Pr(Z_n\in\mathbb{N})=1$ for each $n$, and $\Pr(Z_m=Z_n)=0$ for all $m\ne n$. Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a sequence of real-valued random variables on $\Omega$, all with the same distribution $\nu$, and independent of each other and the $Z_n$, in the sense that if $\Sigma_n$ is the $\sigma$-algebra defined by $X_n$, and $\mathrm{T}_n$ the $\sigma$-algebra defined by $Z_n$, and $\mathrm{T}$ is the $\sigma$-algebra generated by $\bigcup_{n\in\mathbb{N}}\mathrm{T}_n$, then $(\mathrm{T},\Sigma_0,\Sigma_1,\ldots)$ is independent. Set $Y_n(\omega)=X_{Z_n(\omega)}(\omega)$ whenever this is defined, that is, $\omega\in\operatorname{dom}Z_n$, $Z_n(\omega)\in\mathbb{N}$ and $\omega\in\operatorname{dom}X_{Z_n(\omega)}$. Show that $\langle Y_n\rangle_{n\in\mathbb{N}}$ is an independent sequence of random variables and that every $Y_n$ has the distribution $\nu$.

272 Notes and comments This section is lengthy for two reasons: I am trying to pack in the basic results
associated with one of the most fertile concepts of mathematics, and it is hard to know where to stop; and I
am trying to do this in language appropriate to abstract measure theory, insisting on a variety of distinctions
which are peripheral to the essential ideas. For while I am prepared to be flexible on the question of whether
the letter X should denote a space or a function, some of the applications of these results which are most
important to me are in contexts where we expect to be exactly clear what the domains of our functions are.
Consequently it is necessary to form an opinion on such matters as what the $\sigma$-algebra defined by a random variable really is (272C).
Of course I should emphasize again that such proofs as those in 272Q-272R are to be thought of as
confirmations that we have a suitable model of probability theory, rather than as reasons for believing the
results to be valid in statistical contexts. Similarly, 272S-272T can be approached by a variety of intuitions
concerning discrete random variables and random variables with continuous densities, and while the elegant
general results are delightful, they are more important to the pure mathematician than to the statistician.
But I came to an odd obstacle in the proof of 272R, when showing that if X1 + . . . + Xn has finite variance
then so does every Xj . We have done enough measure theory for this to be readily dealt with, but the
connexion with ordinary probabilistic intuition, both here and in 272Xi, remains unclear to me.

273 The strong law of large numbers


I come now to the first of the three main theorems of this chapter. Perhaps I should call it a principle,
rather than a theorem, as I shall not attempt to enunciate any fully general form, but will give three
theorems (273D, 273H, 273I), with a variety of corollaries, each setting out conditions under which the
averages of a sequence of independent random variables will almost surely converge.

273A It will be helpful to start with an explicit statement of a very simple but very useful lemma.
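Lemma Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a sequence of measurable sets in a measure space $(\Omega,\Sigma,\mu)$, and suppose that $\sum_{n=0}^\infty\mu E_n<\infty$. Then $\{n:\omega\in E_n\}$ is finite for almost every $\omega$.

proof We have
$$\begin{aligned}
\mu\{\omega:\{n:\omega\in E_n\}\text{ is infinite}\}&=\mu\Bigl(\bigcap_{n\in\mathbb{N}}\bigcup_{m\ge n}E_m\Bigr)=\inf_{n\in\mathbb{N}}\mu\Bigl(\bigcup_{m\ge n}E_m\Bigr)\\
&\le\inf_{n\in\mathbb{N}}\sum_{m=n}^\infty\mu E_m=0.
\end{aligned}$$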

273B Lemma Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables, and set $S_n=\sum_{i=0}^nX_i$ for each $n\in\mathbb{N}$.
(a) If $\langle S_n\rangle_{n\in\mathbb{N}}$ is convergent in measure, then it is convergent almost everywhere.
(b) In particular, if $E(X_n)=0$ for every $n$ and $\sum_{n=0}^\infty E(X_n^2)<\infty$, then $\sum_{n=0}^\infty X_n$ is defined, and finite, almost everywhere.
proof (a) Let $(\Omega,\Sigma,\mu)$ be the underlying probability space. If we change each $X_n$ on a negligible set, we do not change the independence of $\langle X_n\rangle_{n\in\mathbb{N}}$ (272H), and the $S_n$ are also changed only on a negligible set; so we may suppose from the beginning that every $X_n$ is a measurable function defined on the whole of $\Omega$.
Because the functional $X\mapsto E(\min(1,|X|))$ is one of the pseudometrics defining the topology of convergence in measure (245A), $\lim_{m,n\to\infty}E(\min(1,|S_m-S_n|))=0$, and we can find for each $k\in\mathbb{N}$ an $n_k\in\mathbb{N}$ such that $E(\min(1,|S_m-S_{n_k}|))\le4^{-k}$ for every $m\ge n_k$. So $\Pr(|S_m-S_{n_k}|\ge2^{-k})\le2^{-k}$ for every $m\ge n_k$. By Etemadi's lemma (272U) applied to $\langle X_i\rangle_{i\ge n_k}$,
$$\Pr(\sup_{n_k\le m\le n}|S_m-S_{n_k}|\ge3\cdot2^{-k})\le3\cdot2^{-k}$$
for every $n\ge n_k$. Setting
$$H_{kn}=\{\omega:\sup_{n_k\le m\le n}|S_m(\omega)-S_{n_k}(\omega)|\ge3\cdot2^{-k}\}\text{ for }n\ge n_k,$$
$$H_k=\bigcup_{n\ge n_k}H_{kn},$$
we have
$$\mu H_k=\lim_{n\to\infty}\mu H_{kn}\le3\cdot2^{-k}$$
for each $k$, so $\sum_{k=0}^\infty\mu H_k$ is finite and almost every $\omega\in\Omega$ belongs to only finitely many of the $H_k$ (273A).
Now take any such $\omega$. Then there is some $r\in\mathbb{N}$ such that $\omega\notin H_k$ for any $k\ge r$. In this case, for every $k\ge r$, $\omega\notin\bigcup_{n\ge n_k}H_{kn}$, that is, $|S_n(\omega)-S_{n_k}(\omega)|<3\cdot2^{-k}$ for every $n\ge n_k$. But this means that $\langle S_n(\omega)\rangle_{n\in\mathbb{N}}$ is a Cauchy sequence, therefore convergent. Since this is true for almost every $\omega$, $\langle S_n\rangle_{n\in\mathbb{N}}$ converges almost everywhere, as claimed.
(b) Now suppose that $E(X_n)=0$ for every $n$ and that $\sum_{n=0}^\infty E(X_n^2)<\infty$. In this case, for any $m<n$,
$$\|S_n-S_m\|_1^2\le\|\chi\Omega\|_2^2\,\|S_n-S_m\|_2^2$$
(by Cauchy's inequality, 244E)
$$=E(S_n-S_m)^2=\operatorname{Var}(S_n-S_m)$$
(because $E(S_n-S_m)=\sum_{i=m+1}^nE(X_i)=0$)
$$=\sum_{i=m+1}^n\operatorname{Var}(X_i)$$
(by Bienaymé's equality, 272R)
$$\to0$$
as $m\to\infty$. So $\langle S_n^\bullet\rangle_{n\in\mathbb{N}}$ is a Cauchy sequence in $L^1(\mu)$ and converges in $L^1(\mu)$, by 242F; by 245G, it converges in measure in $L^0(\mu)$, that is, $\langle S_n\rangle_{n\in\mathbb{N}}$ converges in measure in $\mathcal{L}^0(\mu)$. By (a), $\langle S_n\rangle_{n\in\mathbb{N}}$ converges almost everywhere, that is, $\sum_{i=0}^\infty X_i$ is defined and finite almost everywhere.

Remark The proof above assumes familiarity with the ideas of Chapter 24. However part (b), at least, can
be established without any of these; see 273Xa. In 276B there is a generalization of (b) based on a different
approach.

273C We now need a lemma (part (b) below) from the theory of summability. I take the opportunity
to include an elementary fact which will be useful later in this section and elsewhere.

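Lemma (a) If $\lim_{n\to\infty}x_n=x$, then $\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nx_i=x$.
(b) Let $\langle x_n\rangle_{n\in\mathbb{N}}$ be summable, and $\langle b_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence in $[0,\infty[$ diverging to $\infty$. Then $\lim_{n\to\infty}\frac1{b_n}\sum_{k=0}^nb_kx_k=0$.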

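proof (a) Let $\epsilon>0$. Let $m$ be such that $|x_n-x|\le\epsilon$ whenever $n\ge m$. Let $m'\ge m$ be such that $|\sum_{i=0}^{m-1}(x-x_i)|\le\epsilon m'$. Then for $n\ge m'$ we have
$$\begin{aligned}
\Bigl|x-\frac1{n+1}\sum_{i=0}^nx_i\Bigr|&=\frac1{n+1}\Bigl|\sum_{i=0}^n(x-x_i)\Bigr|\\
&\le\frac1{n+1}\Bigl|\sum_{i=0}^{m-1}(x-x_i)\Bigr|+\frac1{n+1}\sum_{i=m}^n|x-x_i|\\
&\le\frac{\epsilon m'}{n+1}+\frac{\epsilon(n-m+1)}{n+1}\le2\epsilon.
\end{aligned}$$

As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nx_i=x$.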
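(b) Let $\epsilon>0$. Write $s_n=\sum_{i=0}^nx_i$ for each $n$, and
$$s=\lim_{n\to\infty}s_n=\sum_{i=0}^\infty x_i;$$
set $s^*=\sup_{n\in\mathbb{N}}|s_n|<\infty$. Let $m\in\mathbb{N}$ be such that $|s_n-s|\le\epsilon$ whenever $n\ge m$; then $|s_n-s_j|\le2\epsilon$ whenever $j$, $n\ge m$. Let $m'\ge m$ be such that $b_ms^*\le\epsilon b_{m'}$.
Take any $n\ge m'$. Then
$$\begin{aligned}
\Bigl|\sum_{k=0}^nb_kx_k\Bigr|&=|b_0s_0+b_1(s_1-s_0)+\ldots+b_n(s_n-s_{n-1})|\\
&=|(b_0-b_1)s_0+(b_1-b_2)s_1+\ldots+(b_{n-1}-b_n)s_{n-1}+b_ns_n|\\
&=\Bigl|b_0s_n+\sum_{i=0}^{n-1}(b_{i+1}-b_i)(s_n-s_i)\Bigr|\\
&\le b_0|s_n|+\sum_{i=0}^{m-1}(b_{i+1}-b_i)|s_n-s_i|+\sum_{i=m}^{n-1}(b_{i+1}-b_i)|s_n-s_i|\\
&\le b_0s^*+2s^*\sum_{i=0}^{m-1}(b_{i+1}-b_i)+2\epsilon\sum_{i=m}^{n-1}(b_{i+1}-b_i)\\
&=b_0s^*+2s^*(b_m-b_0)+2\epsilon(b_n-b_m)\le2s^*b_m+2\epsilon b_n.
\end{aligned}$$

Consequently, because $b_n\ge b_{m'}$,
$$\Bigl|\frac1{b_n}\sum_{k=0}^nb_kx_k\Bigr|\le2\frac{s^*b_m}{b_n}+2\epsilon\le4\epsilon.$$
As $\epsilon$ is arbitrary,
$$\lim_{n\to\infty}\frac1{b_n}\sum_{k=0}^nb_kx_k=0,$$
as required.
Remark Part (b) above is sometimes called Kronecker's lemma.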
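
A quick numerical experiment makes part (b) concrete. The sketch below is an illustration only: it takes the (hypothetical) example $x_k=(-1)^k/(k+1)$, which is summable in the sense used here, with weights $b_k=k+1$, so that $b_kx_k=(-1)^k$ and the weighted average is $\frac1{n+1}$ or $0$ according to the parity of $n$. The function names are my own.

```python
def weighted_average(xs, bs):
    """(1 / b_n) * sum_{k <= n} b_k x_k, where n = len(xs) - 1."""
    return sum(b * x for b, x in zip(bs, xs)) / bs[-1]


def kronecker_demo(n):
    # Assumed example: x_k = (-1)^k / (k + 1), whose partial sums converge,
    # and b_k = k + 1, non-decreasing and diverging to infinity.
    xs = [(-1) ** k / (k + 1) for k in range(n + 1)]
    bs = [k + 1 for k in range(n + 1)]
    return weighted_average(xs, bs)


values = [kronecker_demo(n) for n in (10, 100, 1000)]
```

The three values shrink roughly like $\frac1{n+1}$, as the lemma predicts they must tend to $0$.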

273D The strong law of large numbers: first form Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables, and suppose that $\langle b_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence in $]0,\infty[$, diverging to $\infty$, such that $\sum_{n=0}^\infty\frac1{b_n^2}\operatorname{Var}(X_n)<\infty$. Then
$$\lim_{n\to\infty}\frac1{b_n}\sum_{i=0}^n(X_i-E(X_i))=0$$
almost everywhere.
proof As usual, write $(\Omega,\Sigma,\mu)$ for the underlying probability space. Set
$$Y_n=\frac1{b_n}(X_n-E(X_n))$$
for each $n$; then $\langle Y_n\rangle_{n\in\mathbb{N}}$ is independent (272E), $E(Y_n)=0$ for each $n$, and
$$\sum_{n=0}^\infty E(Y_n^2)=\sum_{n=0}^\infty\frac1{b_n^2}\operatorname{Var}(X_n)<\infty.$$

By 273B, $\langle Y_n(\omega)\rangle_{n\in\mathbb{N}}$ is summable for almost every $\omega\in\Omega$. But by 273C,
$$\lim_{n\to\infty}\frac1{b_n}\sum_{i=0}^n(X_i(\omega)-E(X_i))=\lim_{n\to\infty}\frac1{b_n}\sum_{i=0}^nb_iY_i(\omega)=0$$
for all such $\omega$. So we have the result.

273E Corollary Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of random variables such that $E(X_n)=0$ for every $n$ and $\sup_{n\in\mathbb{N}}E(X_n^2)<\infty$. Then
$$\lim_{n\to\infty}\frac1{b_n}(X_0+\ldots+X_n)=0$$
almost everywhere whenever $\langle b_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of strictly positive numbers and $\sum_{n=0}^\infty\frac1{b_n^2}$ is finite. In particular,
$$\lim_{n\to\infty}\frac1{n+1}(X_0+\ldots+X_n)=0$$
almost everywhere.

Remark For most of the rest of this section, we shall take $b_n=n+1$. The special virtue of 273D is that it allows other $b_n$, e.g., $b_n=\sqrt n\ln n$. A direct strengthening of this theorem is in 276C below.
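
The case $b_n=n+1$ of 273E is easy to watch in a simulation. The following Python sketch is an illustration only, under the assumption that the $X_n$ are independent fair $\pm1$ variables (mean $0$, variance $1$); it uses the standard library's pseudo-random generator with a fixed seed so that a run is reproducible, and the function name is a hypothetical choice. A single simulated run cannot, of course, establish almost-everywhere convergence; it merely shows the averages settling near $0$.

```python
import random


def running_average(num_terms, seed=0):
    """Average of num_terms independent fair +1/-1 variables."""
    rng = random.Random(seed)   # seeded, so the run is reproducible
    total = 0
    for _ in range(num_terms):
        total += 1 if rng.random() < 0.5 else -1
    return total / num_terms


averages = {n: running_average(n) for n in (100, 10_000, 20_000)}
```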

273F Corollary Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of measurable sets in a probability space $(\Omega,\Sigma,\mu)$, and suppose that
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n\mu E_i=c.$$

Then
$$\lim_{n\to\infty}\frac1{n+1}\#(\{i:i\le n,\ \omega\in E_i\})=c$$
for almost every $\omega\in\Omega$.

proof In 273D, set $X_n=\chi E_n$, $b_n=n+1$. For almost every $\omega$, we have
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\omega)-a_i)=0,$$
writing $a_i=\mu E_i=E(X_i)$ for each $i$. (I see that I am using 272F to support the claim that $\langle X_n\rangle_{n\in\mathbb{N}}$ is independent.) But for any such $\omega$,
$$\begin{aligned}
\lim_{n\to\infty}\Bigl(\frac1{n+1}\#(\{i:i\le n,\ \omega\in E_i\})-\frac1{n+1}\sum_{i=0}^na_i\Bigr)\\
=\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\omega)-a_i)=0;
\end{aligned}$$
because we are supposing that $\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^na_i=c$, we must have
$$\lim_{n\to\infty}\frac1{n+1}\#(\{i:i\le n,\ \omega\in E_i\})=c,$$
as required.

273G Corollary Let $\mu$ be the usual measure on $\mathcal{P}\mathbb{N}$, as described in 254Jb. Then for $\mu$-almost every set $a\subseteq\mathbb{N}$,
$$\lim_{n\to\infty}\frac1{n+1}\#(a\cap\{0,\ldots,n\})=\frac12.$$

proof The sets $E_n=\{a:n\in a\}$ are independent, with measure $\frac12$.

Remark The limit $\lim_{n\to\infty}\frac1{n+1}\#(a\cap\{0,\ldots,n\})$ is called the asymptotic density of $a$.

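273H Strong law of large numbers: second form Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables, and suppose that $\sup_{n\in\mathbb{N}}E(|X_n|^{1+\delta})<\infty$ for some $\delta>0$. Then
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i-E(X_i))=0$$
almost everywhere.
proof As usual, call the underlying probability space $(\Omega,\Sigma,\mu)$; as in 273B we can adjust the $X_n$ on negligible sets so as to make them measurable and defined everywhere on $\Omega$, without changing $E(X_n)$, $E(|X_n|)$ or the convergence of the partial sums except on a negligible set.
(a) For each $n$, define a random variable $Y_n$ on $\Omega$ by setting
$$Y_n(\omega)=X_n(\omega)\text{ if }|X_n(\omega)|\le n,\qquad Y_n(\omega)=0\text{ if }|X_n(\omega)|>n.$$
Then $\langle Y_n\rangle_{n\in\mathbb{N}}$ is independent (272E). For each $n\in\mathbb{N}$,
$$\operatorname{Var}(Y_n)\le E(Y_n^2)\le E(n^{1-\delta}|X_n|^{1+\delta})\le n^{1-\delta}K,$$
where $K=\sup_{n\in\mathbb{N}}E(|X_n|^{1+\delta})$, so
$$\sum_{n=0}^\infty\frac1{(n+1)^2}\operatorname{Var}(Y_n)\le\sum_{n=0}^\infty\frac{n^{1-\delta}}{(n+1)^2}K<\infty.$$

By 273D,
$$G=\{\omega:\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(Y_i(\omega)-E(Y_i))=0\}$$
is conegligible.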
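(b) On the other hand, setting
$$E_n=\{\omega:Y_n(\omega)\ne X_n(\omega)\}=\{\omega:|X_n(\omega)|>n\},$$
we have $K\ge n^{1+\delta}\mu E_n$ for each $n$, so
$$\sum_{n=0}^\infty\mu E_n\le1+K\sum_{n=1}^\infty\frac1{n^{1+\delta}}<\infty,$$
and the set $H=\{\omega:\{n:\omega\in E_n\}$ is finite$\}$ is conegligible (273A). But of course
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\omega)-Y_i(\omega))=0$$
for every $\omega\in H$.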
(c) Finally,
$$|E(Y_n)-E(X_n)|\le\int_{E_n}|X_n|\le\int_{E_n}n^{-\delta}|X_n|^{1+\delta}\le n^{-\delta}K$$
whenever $n\ge1$, so $\lim_{n\to\infty}E(Y_n)-E(X_n)=0$ and
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nE(Y_i)-E(X_i)=0$$
(273Ca). Putting these three together, we get
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nX_i(\omega)-E(X_i)=0$$
whenever $\omega$ belongs to the conegligible set $G\cap H$. So
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nX_i-E(X_i)=0$$
almost everywhere, as required.

273I Strong law of large numbers: third form Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables of finite expectation, and suppose that they are identically distributed, that is, all have the same distribution. Then
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i-E(X_i))=0$$
almost everywhere.
proof The proof follows the same line as that of 273H, but some of the inequalities require more delicate arguments. As usual, call the underlying probability space $(\Omega,\Sigma,\mu)$ and suppose that the $X_n$ are all measurable and defined everywhere on $\Omega$. (We need to remember that changing a random variable on a negligible set does not change its distribution.) Let $\nu$ be the common distribution of the $X_n$.
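(a) For each $n$, define a random variable $Y_n$ on $\Omega$ by setting
$$Y_n(\omega)=X_n(\omega)\text{ if }|X_n(\omega)|\le n,\qquad Y_n(\omega)=0\text{ if }|X_n(\omega)|>n.$$
Then $\langle Y_n\rangle_{n\in\mathbb{N}}$ is independent (272E). For each $n\in\mathbb{N}$,
$$\operatorname{Var}(Y_n)\le E(Y_n^2)=\int_{[-n,n]}x^2\,\nu(dx)$$
(271E). To estimate $\sum_{n=0}^\infty\frac1{(n+1)^2}E(Y_n^2)$, set
$$f_n(x)=\frac{x^2}{(n+1)^2}\text{ if }|x|\le n,\qquad0\text{ if }|x|>n,$$
so that $\frac1{(n+1)^2}\operatorname{Var}(Y_n)\le\int f_n\,d\nu$. If $r\ge1$ and $r<|x|\le r+1$ then
$$\begin{aligned}
\sum_{n=0}^\infty f_n(x)&\le\sum_{n=r+1}^\infty\frac1{(n+1)^2}(r+1)|x|\\
&\le(r+1)|x|\sum_{n=r+1}^\infty\Bigl(\frac1n-\frac1{n+1}\Bigr)\le|x|,
\end{aligned}$$
while if $|x|\le1$ then
$$\sum_{n=0}^\infty f_n(x)\le\sum_{n=0}^\infty\frac1{(n+1)^2}=\frac{\pi^2}6<\infty.$$
(You do not need to know that the sum is $\frac{\pi^2}6$, only that it is finite; but see 282Xo.) Consequently
$$f(x)=\sum_{n=0}^\infty f_n(x)\le2+|x|$$
for every $x$, and $\int f\,d\nu<\infty$, because $\int|x|\,\nu(dx)$ is the common value of $E(|X_n|)$, and is finite. By any of the great convergence theorems,
$$\sum_{n=0}^\infty\frac1{(n+1)^2}\operatorname{Var}(Y_n)\le\sum_{n=0}^\infty\int f_n\,d\nu=\int f\,d\nu<\infty.$$
By 273D,
$$G=\{\omega:\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(Y_i(\omega)-E(Y_i))=0\}$$
is conegligible.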
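(b) Next, setting
$$E_n=\{\omega:X_n(\omega)\ne Y_n(\omega)\}=\{\omega:|X_n(\omega)|>n\},$$
we have
$$E_n=\bigcup_{i\ge n}F_{ni},$$
where
$$F_{ni}=\{\omega:i<|X_n(\omega)|\le i+1\}.$$
Now
$$\mu F_{ni}=\nu\{x:i<|x|\le i+1\}$$
for every $n$ and $i$. So
$$\begin{aligned}
\sum_{n=0}^\infty\mu E_n&=\sum_{n=0}^\infty\sum_{i=n}^\infty\mu F_{ni}=\sum_{i=0}^\infty\sum_{n=0}^i\mu F_{ni}\\
&=\sum_{i=0}^\infty(i+1)\nu\{x:i<|x|\le i+1\}\le\int(1+|x|)\,\nu(dx)<\infty.
\end{aligned}$$

Consequently the set $H=\{\omega:\{n:X_n(\omega)\ne Y_n(\omega)\}$ is finite$\}$ is conegligible (273A). But of course
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nX_i(\omega)-Y_i(\omega)=0$$
for every $\omega\in H$.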
(c) Finally,
$$|E(Y_n)-E(X_n)|\le\int_{E_n}|X_n|=\int_{\mathbb{R}\setminus[-n,n]}|x|\,\nu(dx)$$
whenever $n\in\mathbb{N}$, so $\lim_{n\to\infty}E(Y_n)-E(X_n)=0$ and
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nE(Y_i)-E(X_i)=0$$
(273Ca). Putting these three together, we get
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nX_i(\omega)-E(X_i)=0$$
whenever $\omega$ belongs to the conegligible set $G\cap H$. So
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nX_i-E(X_i)=0$$
almost everywhere, as required.
Remarks In my own experience, this is the most important form of the strong law from the point of view
of pure measure theory. I note that 273G above can also be regarded as a consequence of this form.
For a very striking alternative proof, see 275Yn. Yet another proof treats this result as a special case of
the Ergodic Theorem (see 372Xg in Volume 3).

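273J Corollary Let $(\Omega,\Sigma,\mu)$ be a probability space, and $f$ a $\mu$-integrable real-valued function. Let $\lambda$ be the product measure on $\Omega^{\mathbb{N}}$ (254A-254C). Then for $\lambda$-almost every $\boldsymbol\omega=\langle\omega_n\rangle_{n\in\mathbb{N}}\in\Omega^{\mathbb{N}}$,
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nf(\omega_i)=\int f\,d\mu.$$

proof Define functions $X_n$ on $\Omega^{\mathbb{N}}$ by setting
$$X_n(\boldsymbol\omega)=f(\omega_n)\text{ whenever }\omega_n\in\operatorname{dom}f.$$
Then $\langle X_n\rangle_{n\in\mathbb{N}}$ is an independent sequence of random variables, all with the same distribution as $f$ (272M). So
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nf(\omega_i)-\int f\,d\mu=\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nX_i(\boldsymbol\omega)-E(X_i)=0$$
for almost every $\boldsymbol\omega$, by 273I, and
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nf(\omega_i)=\int f\,d\mu$$
for almost every $\boldsymbol\omega$.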

Remark I find myself slipping here into measure-theorists' terminology; this corollary is one of the basic
applications of the strong law to measure theory. Obviously, in view of 272J and 272M, this corollary is
equivalent to 273I. It could also (in theory) be used as a definition of integration (on a probability space);
it is sometimes called the Monte Carlo method of integration.
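
The Monte Carlo idea is simple to try out. The sketch below is an illustration only, assuming that $\Omega=[0,1]$ with Lebesgue measure and that the standard library's pseudo-random generator is an acceptable stand-in for an independent sequence of uniformly distributed points; `monte_carlo_integral` is a hypothetical name. It estimates $\int_0^1x^2\,dx=\frac13$ by averaging $f$ at sampled points, exactly the averages appearing in 273J.

```python
import random


def monte_carlo_integral(f, num_samples, seed=0):
    """Average of f at num_samples pseudo-random uniform points of [0, 1]."""
    rng = random.Random(seed)   # seeded, so the estimate is reproducible
    return sum(f(rng.random()) for _ in range(num_samples)) / num_samples


estimate = monte_carlo_integral(lambda x: x * x, 20_000)
```

The strong law guarantees convergence for almost every sample sequence but says nothing about the rate; the usual $O(n^{-1/2})$ error estimate comes from the variance of $f$, not from 273J.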

273K It is tempting to seek extensions of 273I in which the Xn are not identically distributed, but are
otherwise well-behaved. Any such idea should be tested against the following example. I find that I need
another standard result, complementing that in 273A.

Borel-Cantelli lemma Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle E_n\rangle_{n\in\mathbb{N}}$ an independent sequence of measurable subsets of $\Omega$ such that $\sum_{n=0}^\infty\mu E_n=\infty$. Then almost every point of $\Omega$ belongs to infinitely many of the $E_n$.

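proof (a) Observe first that if $\alpha_0,\ldots,\alpha_n\in[0,1]$ then
$$\prod_{i=0}^n(1-\alpha_i)\le\max\Bigl(\frac12,\ 1-\frac12\sum_{i=0}^n\alpha_i\Bigr);$$
this is a simple induction on $n$, because if
$$\frac12\le\prod_{i=0}^n(1-\alpha_i)\le1-\frac12\sum_{i=0}^n\alpha_i,$$
then
$$\begin{aligned}
\prod_{i=0}^{n+1}(1-\alpha_i)&\le(1-\alpha_{n+1})\Bigl(1-\frac12\sum_{i=0}^n\alpha_i\Bigr)\\
&\le1-\frac12\sum_{i=0}^n\alpha_i-\frac12\alpha_{n+1}=1-\frac12\sum_{i=0}^{n+1}\alpha_i.
\end{aligned}$$

Consequently, if $\langle\alpha_n\rangle_{n\in\mathbb{N}}$ is a sequence in $[0,1]$ such that $\sum_{n=0}^\infty\alpha_n=\infty$, then $\prod_{n=0}^\infty(1-\alpha_n)=0$. $\mathbf{P}$ For every $n\in\mathbb{N}$ there is an $m\ge n$ such that $\sum_{i=n+1}^m\alpha_i\ge1$, so that $\prod_{i=0}^m(1-\alpha_i)\le\frac12\prod_{i=0}^n(1-\alpha_i)$. Letting $n\to\infty$, $\prod_{i=0}^\infty(1-\alpha_i)\le\frac12\prod_{i=0}^\infty(1-\alpha_i)$. $\mathbf{Q}$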
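(b) Set $F_{mn}=\bigcap_{m\le i\le n}(\Omega\setminus E_i)$ for $m\le n$. Because $E_m,\ldots,E_n$ are independent, so are $\Omega\setminus E_m,\ldots,\Omega\setminus E_n$ (272F), and $\mu F_{mn}=\prod_{i=m}^n(1-\mu E_i)$. Letting $n\to\infty$,
$$\mu\Bigl(\bigcap_{i\ge m}(\Omega\setminus E_i)\Bigr)=\lim_{n\to\infty}\prod_{i=m}^n(1-\mu E_i)=0$$
by (a), because $\sum_{i=m}^\infty\mu E_i=\infty$. But this means that
$$\mu\{\omega:\{n:\omega\in E_n\}\text{ is finite}\}=\mu\Bigl(\bigcup_{m\in\mathbb{N}}\bigcap_{i\ge m}(\Omega\setminus E_i)\Bigr)=0,$$
and almost every point of $\Omega$ belongs to infinitely many $E_n$, as claimed.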
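
Both elementary facts used in part (a) of the proof can be checked by machine on examples. The sketch below is an illustration only: it verifies the product inequality exactly on a grid of rational values, and computes a partial product for the (assumed) example $\alpha_n=\frac1{n+2}$, where $\sum\alpha_n=\infty$ and the product telescopes to $\frac1{N+1}$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def product_bound_holds(alphas):
    """Check prod(1 - a_i) <= max(1/2, 1 - (1/2) * sum(a_i)) exactly."""
    prod = Fraction(1)
    for a in alphas:
        prod *= 1 - a
    return prod <= max(Fraction(1, 2), 1 - sum(alphas) / 2)


def partial_product(num_terms):
    """prod_{n < num_terms} (1 - 1/(n + 2)); here sum 1/(n + 2) diverges."""
    prod = Fraction(1)
    for n in range(num_terms):
        prod *= 1 - Fraction(1, n + 2)
    return prod


grid = [Fraction(k, 4) for k in range(5)]           # 0, 1/4, ..., 1
all_ok = all(product_bound_holds(alphas) for alphas in product(grid, repeat=4))
```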

273L Now for the promised example.


Example There is an independent sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of random variables such that $\lim_{n\to\infty}E(|X_n|)=0$ but
$$\limsup_{n\to\infty}\frac1{n+1}\sum_{i=0}^nX_i-E(X_i)=\infty,$$
$$\liminf_{n\to\infty}\frac1{n+1}\sum_{i=0}^nX_i-E(X_i)=0$$
almost everywhere.
proof Let $(\Omega,\Sigma,\mu)$ be a probability space with an independent sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of measurable sets such that $\mu E_n=\frac1{(n+3)\ln(n+3)}$ for each $n$. (I have nowhere explained exactly how to build such a sequence. Two obvious methods are available to us, and another a trifle less obvious. (i) Take $\Omega=\{0,1\}^{\mathbb{N}}$ and $\mu$ to be the product of the probabilities $\mu_n$ on $\{0,1\}$, defined by saying that $\mu_n\{1\}=\frac1{(n+3)\ln(n+3)}$ for each $n$; set $E_n=\{\omega:\omega(n)=1\}$, and appeal to 272M to check that the $E_n$ are independent. (ii) Build the $E_n$ inductively as subsets of $[0,1]$, arranging that each $E_n$ should be a finite union of intervals, so that when you come to choose $E_{n+1}$ the sets $E_0,\ldots,E_n$ define a partition $\mathcal{I}_n$ of $[0,1]$ into intervals, and you can take $E_{n+1}$ to be the union of (say) the left-hand subintervals of length a proportion $\frac1{(n+4)\ln(n+4)}$ of the intervals in $\mathcal{I}_n$. (iii) Use 215D to see that the method of (ii) can be used on any atomless probability space, as in 272Xa.)
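Set $X_n=(n+3)\ln\ln(n+3)\,\chi E_n$ for each $n$; then $\langle X_n\rangle_{n\in\mathbb{N}}$ is an independent sequence of real-valued random variables (272F) and $E(X_n)=\frac{\ln\ln(n+3)}{\ln(n+3)}$ for each $n$, so that $E(X_n)\to0$ as $n\to\infty$. Thus, for instance, $\{X_n:n\in\mathbb{N}\}$ is uniformly integrable and $\langle X_n\rangle_{n\in\mathbb{N}}\to0$ in measure (246Jc); while surely
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nE(X_i)=0.$$
On the other hand,
$$\begin{aligned}
\sum_{n=0}^\infty\mu E_n&=\sum_{n=0}^\infty\frac1{(n+3)\ln(n+3)}\ge\int_0^\infty\frac1{(x+3)\ln(x+3)}\,dx\\
&=\lim_{a\to\infty}(\ln\ln(a+3)-\ln\ln3)=\infty,
\end{aligned}$$
so almost every $\omega$ belongs to infinitely many of the $E_n$, by the Borel-Cantelli lemma (273K). Now if we write $Y_n=\frac1{n+1}\sum_{i=0}^nX_i$, then if $\omega\in E_n$ we have $X_n(\omega)=(n+3)\ln\ln(n+3)$ so
$$Y_n(\omega)\ge\frac{n+3}{n+1}\ln\ln(n+3).$$
This means that
$$\begin{aligned}
\{\omega:\limsup_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\omega)-E(X_i))=\infty\}&=\{\omega:\limsup_{n\to\infty}\frac1{n+1}\sum_{i=0}^nX_i(\omega)=\infty\}\\
&=\{\omega:\sup_{n\in\mathbb{N}}Y_n(\omega)=\infty\}\supseteq\{\omega:\{n:\omega\in E_n\}\text{ is infinite}\}
\end{aligned}$$
is conegligible, and the strong law of large numbers does not apply to $\langle X_n\rangle_{n\in\mathbb{N}}$.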
Because
$$\lim_{n\to\infty}\|Y_n\|_1=\lim_{n\to\infty}E(Y_n)=\lim_{n\to\infty}E(X_n)=0$$
(273Ca), $\langle Y_n\rangle_{n\in\mathbb{N}}\to0$ for the topology of convergence in measure, and $\langle Y_n\rangle_{n\in\mathbb{N}}$ has a subsequence converging to $0$ almost everywhere (245K). So
$$\liminf_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\omega)-E(X_i))=\liminf_{n\to\infty}Y_n(\omega)=0$$
for almost every $\omega$. The fact that both $\limsup_{n\to\infty}Y_n$ and $\liminf_{n\to\infty}Y_n$ are constant almost everywhere is of course a consequence of the zero-one law (272P).
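
The two competing rates in this example, expectations tending to $0$ while the measures of the $E_n$ still have a divergent sum, are both extremely slow, and it may help to see some numbers. The sketch below is an illustration only; it evaluates $E(X_n)=\frac{\ln\ln(n+3)}{\ln(n+3)}$ and the partial sums of $\mu E_n=\frac1{(n+3)\ln(n+3)}$ in floating point, with hypothetical function names, and compares the partial sums with the integral lower bound used in the proof.

```python
import math


def mean_of_x(n):
    """E(X_n) = ln ln(n + 3) / ln(n + 3) for the X_n of the example."""
    return math.log(math.log(n + 3)) / math.log(n + 3)


def partial_sum_of_measures(n):
    """sum_{k <= n} mu(E_k), where mu(E_k) = 1 / ((k + 3) ln(k + 3))."""
    return sum(1.0 / ((k + 3) * math.log(k + 3)) for k in range(n + 1))


def integral_lower_bound(n):
    """Integral of 1/((x + 3) ln(x + 3)) over [0, n + 1]."""
    return math.log(math.log(n + 4)) - math.log(math.log(3))


means = [mean_of_x(n) for n in (100, 10**4, 10**6)]
```

The partial sums grow only like $\ln\ln n$, which is why no finite simulation would make the divergence of $\sup_nY_n$ visible.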

*273M All the above has been concerned with pointwise convergence of the averages of independent random variables, and that is the important part of the work of this section. But it is perhaps worth complementing it with a brief investigation of norm-convergence. To deal efficiently with convergence in $L^p$, we need the following. (I should perhaps remark that, compared with the general case treated here, the case $p=2$ is trivial; see 273Xj.)
Lemma For any $p\in\,]1,\infty[$ and $\epsilon>0$, there is a $\delta>0$ such that $\|S+X\|_p\le1+\epsilon\|X\|_p$ whenever $S$ and $X$ are independent random variables, $\|S\|_p=1$, $\|X\|_p\le\delta$ and $E(X)=0$.
proof (a) Take ]0, 1] such that p 2 and
p2 2
(1 + )p 1 + p +
2
whenever || ; such exists because
(1+)p 1p p(p1) p2
lim0 = < .
2 2 2
Observe that
1
(1 + )p (1 + )p + p + 2p p1

1 1
for every 0. P
P If , this is trivial. If , then

1 p p2
(1 + )p = p (1 + )p p (1 + + )
2 2
p p2 p
p (1 + + ) = p + p p1 (1 + ) p + 2p p1 . Q
Q
2 2
p
Define > 0 by setting (2p + 1) p1 = (this is one of the places where we need to know that p > 1).
2
Let > 0 be such that
p2 1 p
, + (1 + )p p1 .
2 2 2

(b) Now suppose that S and X are independent random variables with kSkp = 1, kXkp and
E(X) = 0. Write (, , ) for the underlying probability space and adjust S and X on negligible sets so
that they are measurable and defined everywhere on . Set = kXkp , = /,
E = { : S() 6= 0}, F = { : |X()| > |S()|}, = kS F kp .
Then

Z Z Z
|S + X|p = |S + X|p + |S + X|p
E\F F
(because S and X are both zero on \ (E F ))
Z
X
= |S|p |1 + |p + k(S F ) + (X F )kpp
E\F
S
Z
X p2
|S|p (1 + p + 2 ) + (kS F kp + kX F )kp )p
E\F
S 2

X
(because | | / 1 everywhere on E \ F )
S
Z Z
p2
(1 + 2 ) |S|p + p |S|p1 sgn S X + ( + )p
2 E\F E\F
(writing sgn() = /|| if 6= 0, 0 if = 0)
Z Z
p2 2
= (1 + ) |S|p + p |S|p1 sgn S X + ( + )p
2 \F \F
(because S = 0 on \ E)
Z
p2 2
= (1 + )(1 p ) p
|S|p1 sgn S X + p (1 + )p
2 F

p1
R p1 p1
(because X and |S| sgn S are independent, by 272L, so |S| sgn SX = E(|S| sgn S)E(X) = 0)
Z
p2
(1 + 2 )(1 p ) + p |S|p1 |X|
2 F
1
+ p (1 + )p + ( )p + 2p( )p1

(see (a) above)
Z
p2 2 1
(1 + )(1 p ) + p |X|p
2 F
p1

1
+ p (1 + )p + p + 2pp1

p2 2 p 1 p
1+ +p + p (1 + )p + 2p
2 p1 p1
1
(because = kS F kp kX F kp )

p2 2 1
=1+ + (2p + 1) p1 + p (1 + )p
2 2
p2 1
=1+ + (2p + 1) p1 + p1 (1 + )p
2 2
p2 1 p
1+ + (2p + 1) p1 + p1 (1 + )
2 2
1 + p (1 + kXkp )p .

So kS + Xkp 1 + kXkp , as required.


*Remark What is really happening here is that $\phi=\|\,\cdot\,\|_p^p:L^p\to\mathbb{R}$ is differentiable (as a real-valued function
on the normed space $L^p$) and
$$\phi'(S^\bullet)(X^\bullet)=p\int|S|^{p-1}\operatorname{sgn}S\times X,$$
so that in the context here
$$\phi((S+X)^\bullet)=\phi(S^\bullet)+\phi'(S^\bullet)(X^\bullet)+o(\|X\|_p)=1+o(\|X\|_p)$$
and $\|S+X\|_p=1+o(\|X\|_p)$. The calculations above are elaborate partly because they do not appeal to
any non-trivial ideas about normed spaces, and partly because we need the estimates to be uniform in $S$.

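273N Theorem Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables with zero
expectation, and set $Y_n=\frac1{n+1}(X_0+\ldots+X_n)$ for each $n\in\mathbb{N}$.
(a) If $\langle X_n\rangle_{n\in\mathbb{N}}$ is uniformly integrable, then $\lim_{n\to\infty}\|Y_n\|_1=0$.
*(b) If $p\in\,]1,\infty[$ and $\sup_{n\in\mathbb{N}}\|X_n\|_p<\infty$, then $\lim_{n\to\infty}\|Y_n\|_p=0$.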
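proof (a) Let $\epsilon>0$. Then there is an $M\ge 0$ such that $E\bigl((|X_n|-M)^+\bigr)\le\epsilon$ for every $n\in\mathbb{N}$. Set
$$X_n'=(X_n\wedge M)\vee(-M),\quad\alpha_n=E(X_n'),\quad\tilde X_n=X_n'-\alpha_n,\quad X_n''=X_n-X_n'$$
for each $n\in\mathbb{N}$ (so that $X_n'$ is $X_n$ truncated to $[-M,M]$). Then $\langle X_n'\rangle_{n\in\mathbb{N}}$ and $\langle\tilde X_n\rangle_{n\in\mathbb{N}}$ are independent and uniformly bounded, and $\|X_n''\|_1\le\epsilon$ for
every $n$. So if we write
$$\tilde Y_n=\frac1{n+1}\sum_{i=0}^n\tilde X_i,\quad Y_n''=\frac1{n+1}\sum_{i=0}^nX_i'',$$
$\langle\tilde Y_n\rangle_{n\in\mathbb{N}}\to 0$ almost everywhere, by 273E (for instance), while $\|Y_n''\|_1\le\epsilon$ for every $n$. Moreover,
$$|\alpha_n|=|E(X_n'-X_n)|\le E(|X_n''|)\le\epsilon$$
for every $n$. As $|\tilde Y_n|\le 2M$ almost everywhere for each $n$, $\lim_{n\to\infty}\|\tilde Y_n\|_1=0$, by Lebesgue's Dominated
Convergence Theorem. So, because $Y_n=\tilde Y_n+Y_n''+\frac1{n+1}\sum_{i=0}^n\alpha_i$,
$$\limsup_{n\to\infty}\|Y_n\|_1\le\lim_{n\to\infty}\|\tilde Y_n\|_1+\sup_{n\in\mathbb{N}}\|Y_n''\|_1+\sup_{n\in\mathbb{N}}|\alpha_n|\le 2\epsilon.$$
As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\|Y_n\|_1=0$, as claimed.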
*(b) Set $M=\sup_{n\in\mathbb{N}}\|X_n\|_p$. For $n\in\mathbb{N}$, set $S_n=\sum_{i=0}^nX_i$. Let $\epsilon>0$. Then there is a $\delta>0$ such that
$\|S+X\|_p\le 1+\epsilon\|X\|_p$ whenever $S$ and $X$ are independent random variables, $\|S\|_p=1$, $\|X\|_p\le\delta$ and
$E(X)=0$ (273M). It follows that $\|S+X\|_p\le\|S\|_p+\epsilon\|X\|_p$ whenever $S$ and $X$ are independent random
variables, $\|S\|_p$ is finite, $\|X\|_p\le\delta\|S\|_p$ and $E(X)=0$. In particular, $\|S_{n+1}\|_p\le\|S_n\|_p+\epsilon M$ whenever
$\|S_n\|_p\ge M/\delta$. An easy induction shows that
$$\|S_n\|_p\le\frac M\delta+M+n\epsilon M$$
for every $n\in\mathbb{N}$. But this means that
$$\limsup_{n\to\infty}\|Y_n\|_p=\limsup_{n\to\infty}\frac1{n+1}\|S_n\|_p\le\epsilon M.$$
As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\|Y_n\|_p=0$.
Remark There are strengthenings of (a) in 276Xd, and of (b) in 276Ya.
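
Illustration (not part of the original text). For a concrete instance of 273N, suppose that the $X_i$ are independent with $\Pr(X_i=1)=\Pr(X_i=-1)=\frac12$. Then $\|Y_n\|_2=1/\sqrt{n+1}$ by Bienaymé's equality, and $\|Y_n\|_1$ can be computed exactly from the binomial distribution. The sketch below does this; the function names are hypothetical and chosen only for this example.

```python
from math import comb, sqrt


def l1_norm_of_average(n: int) -> float:
    """Exact E|Y_n| where Y_n = (X_0 + ... + X_n)/(n+1) and Pr(X_i = +1) = Pr(X_i = -1) = 1/2."""
    m = n + 1  # number of summands
    # The sum X_0 + ... + X_n equals 2k - m when exactly k of the X_i are +1.
    total = sum(comb(m, k) * abs(2 * k - m) for k in range(m + 1))
    return total / (2 ** m) / m


def l2_norm_of_average(n: int) -> float:
    """||Y_n||_2 = sqrt(Var(X_0 + ... + X_n))/(n+1) = 1/sqrt(n+1) for the same X_i."""
    return 1.0 / sqrt(n + 1)
```

Both norms decrease to 0 as $n$ grows, and the $L^1$ norm never exceeds the $L^2$ norm, as it must on a probability space.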

273X Basic exercises (a) In part (b) of the proof of 273B, use Bienaymé's equality to show that
$\lim_{m\to\infty}\sup_{n\ge m}\Pr(|S_n-S_m|\ge\epsilon)=0$ for every $\epsilon>0$, so that we can apply the argument of part (a) of
the proof directly, without appealing to 242F or 245G or even 244E.

(b) Show that $\sum_{n=0}^\infty\frac{(-1)^{\omega(n)}}{n+1}$ is defined in $\mathbb{R}$ for almost every $\omega=\langle\omega(n)\rangle_{n\in\mathbb{N}}$ in $\{0,1\}^{\mathbb{N}}$, where $\{0,1\}^{\mathbb{N}}$ is
given its usual measure (254J).

> (c) Take any $q\in[0,1]$, and give $\mathcal{P}\mathbb{N}$ a measure $\nu$ such that
$$\nu\{a:I\subseteq a\}=q^{\#(I)}$$
for every $I\subseteq\mathbb{N}$, as in 254Xg. Show that for $\nu$-almost every $a\subseteq\mathbb{N}$,
$$\lim_{n\to\infty}\frac1{n+1}\#(a\cap\{0,\ldots,n\})=q.$$

> (d) Let $\nu$ be the usual probability measure on $\mathcal{P}\mathbb{N}$ (254Jb), and for $r\ge 1$ let $\nu^r$ be the product
probability measure on $(\mathcal{P}\mathbb{N})^r$. Show that
$$\lim_{n\to\infty}\frac1{n+1}\#(a_1\cap\ldots\cap a_r\cap\{0,\ldots,n\})=2^{-r},$$
$$\lim_{n\to\infty}\frac1{n+1}\#((a_1\cup\ldots\cup a_r)\cap\{0,\ldots,n\})=1-2^{-r}$$
for $\nu^r$-almost every $(a_1,\ldots,a_r)\in(\mathcal{P}\mathbb{N})^r$.

(e) Let $\nu$ be the usual probability measure on $\mathcal{P}\mathbb{N}$, and take any infinite $b\subseteq\mathbb{N}$. Show that
$\lim_{n\to\infty}\#(a\cap b\cap\{0,\ldots,n\})/\#(b\cap\{0,\ldots,n\})=\frac12$ for almost every $a\subseteq\mathbb{N}$.

> (f) For each $x\in[0,1]$, let $\epsilon_k(x)$ be the $k$th digit in the decimal expansion of $x$ (choose for yourself what
to do with $0{\cdot}100\ldots=0{\cdot}099\ldots$). Show that $\lim_{k\to\infty}\frac1k\#(\{j:j\le k,\ \epsilon_j(x)=7\})=\frac1{10}$ for almost every
$x\in[0,1]$.

(g) Let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a sequence of distribution functions for real-valued random variables, in the sense
of 271Ga, and $F$ another distribution function; suppose that $\lim_{n\to\infty}F_n(q)=F(q)$ for every $q\in\mathbb{Q}$ and
$\lim_{n\to\infty}F_n(a^-)=F(a^-)$ whenever $F(a^-)<F(a)$, where I write $F(a^-)$ for $\lim_{x\uparrow a}F(x)$. Show that $F_n\to F$
uniformly.

(h) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle X_n\rangle_{n\in\mathbb{N}}$ an independent identically distributed sequence of
real-valued random variables on $\Omega$ with common distribution function $F$. For $a\in\mathbb{R}$, $n\in\mathbb{N}$, $\omega\in\bigcap_{i\le n}\operatorname{dom}X_i$
set
$$F_n(\omega,a)=\frac1{n+1}\#(\{i:i\le n,\ X_i(\omega)\le a\}).$$
Show that
$$\lim_{n\to\infty}\sup_{a\in\mathbb{R}}|F_n(\omega,a)-F(a)|=0$$
for almost every $\omega\in\Omega$.

(i) Find an independent sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of random variables with zero expectation such that $\|X_n\|_1=1$
and $\|\frac1{n+1}\sum_{i=0}^nX_i\|_1\ge\frac12$ for every $n\in\mathbb{N}$. (Hint: take $\Pr(X_n\ne 0)$ very small.)

(j) Use 272R to prove 273Nb in the case $p=2$.

(k) Find an independent sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of random variables with zero expectation such that $\|X_n\|_\infty=
\|\frac1{n+1}\sum_{i=0}^nX_i\|_\infty=1$ for every $n\in\mathbb{N}$.

(l) Repeat the work of this section for complex-valued random variables.

(m) Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of measurable sets in a probability space, all with the same
non-zero measure. Let $\langle a_n\rangle_{n\in\mathbb{N}}$ be a sequence of non-negative real numbers such that $\sum_{n=0}^\infty a_n=\infty$. Show
that $\sum_{n=0}^\infty a_n\chi E_n=\infty$ a.e. (Hint: Take a strictly increasing sequence $\langle k_n\rangle_{n\in\mathbb{N}}$ such that $d_n=\sum_{i=k_n+1}^{k_{n+1}}a_i\ge 1$
for each $n$. Set $c_i=\frac{a_i}{(n+1)d_n}$ for $k_n<i\le k_{n+1}$; show that $\sum_{n=0}^\infty c_n^2<\infty=\sum_{n=0}^\infty c_n$. Apply 273D with
$X_n=c_n\chi E_n$ and $b_n=\sqrt{\sum_{i=0}^nc_i}$.)

273Y Further exercises (a) Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\lambda$ the product measure on $\Omega^{\mathbb{N}}$.
Suppose that $f$ is a real-valued function, defined on a subset of $\Omega$, such that
$$h(\omega)=\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nf(\omega_i)$$
exists in $\mathbb{R}$ for $\lambda$-almost every $\omega=\langle\omega_n\rangle_{n\in\mathbb{N}}$ in $\Omega^{\mathbb{N}}$. Show (i) that $f$ has conegligible domain (ii) $f$ is
measurable for the completion of $\mu$ (iii) there is an $a\in\mathbb{R}$ such that $h=a$ almost everywhere in $\Omega^{\mathbb{N}}$ (iv) $f$
is integrable, with $\int f\,d\mu=a$.

(b) Repeat the work of this section for random variables taking values in $\mathbb{R}^r$.

273 Notes and comments I have tried in this section to offer the most useful of the standard criteria
for pointwise convergence of averages of independent random variables. In my view the strong law of large
numbers, like Fubini's theorem, is one of the crucial steps in measure theory, where the subject changes
character. Theorems depending on the strong law have a kind of depth and subtlety to them which is missing
in other parts of the subject. I have described only a handful of applications here, but I hope that 273G,
273J, 273Xc, 273Xf and 273Xh will give an idea of what is to be expected. These do have rather different
weights. Of the four, only 273J requires the full resources of this chapter; the others can be deduced from
the essentially simpler version in 273Xh.
273Xh is the 'fundamental theorem of statistics' or 'Glivenko-Cantelli theorem'. The $F_n(\cdot,a)$ are 'statistics',
computed from the $X_i$; they are the 'empirical distributions', and the theorem says that, almost surely,
$F_n\to F$ uniformly. (I say 'uniformly' to make the result look more striking, but of course the real content
is that $F_n(\cdot,a)\to F(a)$ almost surely for each $a$; the extra step is just 273Xg.)
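
Illustration (not part of the original text). The quantity $\sup_a|F_n(\omega,a)-F(a)|$ of 273Xh is easy to compute for a given sample when $F$ is continuous, because the supremum is then approached at, or just below, a sample point. The sketch below assumes a continuous distribution function and uses a seeded pseudo-random uniform sample on $[0,1]$; all names are hypothetical.

```python
import random


def sup_distance(sample, cdf):
    """sup_a |F_n(a) - F(a)| for the empirical distribution F_n of `sample`.

    Assumes `cdf` is a continuous distribution function, so it is enough to
    compare F with the values of F_n at and just below each sample point.
    """
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # F_n rises from at most i/n just below x to at least (i+1)/n at x.
        worst = max(worst, abs((i + 1) / n - fx), abs(i / n - fx))
    return worst


def uniform_cdf(a):
    """Distribution function of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, a))


def demo(n, seed=0):
    """Sup-distance for n seeded pseudo-random uniform samples."""
    rng = random.Random(seed)
    return sup_distance([rng.random() for _ in range(n)], uniform_cdf)
```

Calling `demo(n)` for increasing `n` shows the distance shrinking, which is what the theorem predicts for almost every sample sequence.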

I have included 273N to show that independence is quite as important in questions of norm-convergence
as it is in questions of pointwise convergence. It does not really rely on any form of the strong law; I quoted
273E as a quick way of disposing of the 'uniformly bounded parts' $X_n'$, but of course Bienaymé's equality
(272R) is already enough to show that if $\langle X_n'\rangle_{n\in\mathbb{N}}$ is an independent uniformly bounded sequence of random
variables with zero expectation, then $\|\frac1{n+1}(X_0'+\ldots+X_n')\|_p\to 0$ for $p=2$, and therefore for every $p<\infty$.
The proofs of 273H, 273I and 273Na all involve 'truncation'; the expression of a random variable $X$ as the
sum of a bounded random variable and a 'tail'. This is one of the most powerful techniques in the subject,
and will appear again in §274 and §276. In 273Na I used a slightly different formulation of the method,
solely because it matched the definition of 'uniformly integrable' more closely.

274 The central limit theorem


The second of the great theorems to which this chapter is devoted is of a new type. It is a limit theorem,
but the limit involved is a limit of distributions, not of functions (as in the strong limit theorem above or the
martingale theorem below), nor of equivalence classes of functions (as in Chapter 24). I give three forms of
the theorem, in 274I-274K, all drawn as corollaries of Theorem 274G; the proof is spread over 274C-274G.
In 274A-274B and 274M I give the most elementary properties of the normal distribution.

274A The normal distribution We need some facts from basic probability theory.
(a) Recall that
$$\int_{-\infty}^\infty e^{-x^2/2}\,dx=\sqrt{2\pi}$$
(263G). Consequently, if we set
$$\mu_GE=\frac1{\sqrt{2\pi}}\int_Ee^{-x^2/2}\,dx$$
for every Lebesgue measurable set $E$, $\mu_G$ is a Radon probability measure (256E); we call it the standard
normal distribution. The corresponding distribution function is
$$\Phi(a)=\mu_G\,]{-\infty},a]=\frac1{\sqrt{2\pi}}\int_{-\infty}^ae^{-x^2/2}\,dx$$
for $a\in\mathbb{R}$; for the rest of this section I will reserve the symbol $\Phi$ for this function.
Writing $\Sigma$ for the $\sigma$-algebra of Lebesgue measurable subsets of $\mathbb{R}$, $(\mathbb{R},\Sigma,\mu_G)$ is a probability space. Note
that it is complete, and has the same negligible sets as Lebesgue measure, because $e^{-x^2/2}>0$ for every $x$
(cf. 234Dc).

(b) A random variable $X$ is standard normal if its distribution is $\mu_G$; that is, if the function $x\mapsto
\frac1{\sqrt{2\pi}}e^{-x^2/2}$ is a density function for $X$. The point of the remarks in (a) is that there are such random
variables; for instance, take the probability space $(\mathbb{R},\Sigma,\mu_G)$ there, and set $X(x)=x$ for every $x\in\mathbb{R}$.

(c) If $X$ is a standard normal random variable, then
$$E(X)=\frac1{\sqrt{2\pi}}\int_{-\infty}^\infty xe^{-x^2/2}\,dx=0,$$
$$\operatorname{Var}(X)=\frac1{\sqrt{2\pi}}\int_{-\infty}^\infty x^2e^{-x^2/2}\,dx=1$$
by 263H.

(d) More generally, a random variable $X$ is normal if there are $a\in\mathbb{R}$, $\sigma>0$ such that $Z=(X-a)/\sigma$
is standard normal. In this case $X=\sigma Z+a$ so $E(X)=\sigma E(Z)+a=a$, $\operatorname{Var}(X)=\sigma^2\operatorname{Var}(Z)=\sigma^2$.
We have, for any $c\in\mathbb{R}$,
$$\frac1{\sigma\sqrt{2\pi}}\int_{-\infty}^ce^{-(x-a)^2/2\sigma^2}\,dx=\frac1{\sqrt{2\pi}}\int_{-\infty}^{(c-a)/\sigma}e^{-y^2/2}\,dy$$
(substituting $x=a+\sigma y$ for $-\infty<y\le(c-a)/\sigma$)
$$=\Pr\Bigl(Z\le\frac{c-a}\sigma\Bigr)=\Pr(X\le c).$$
So $x\mapsto\frac1{\sigma\sqrt{2\pi}}e^{-(x-a)^2/2\sigma^2}$ is a density function for $X$ (271Ib). Conversely, of course, a random variable with
this density function is normal, with expectation $a$ and variance $\sigma^2$.

(e) If $Z$ is standard normal, so is $-Z$, because
$$\Pr(-Z\le a)=\Pr(Z\ge -a)=\frac1{\sqrt{2\pi}}\int_{-a}^\infty e^{-x^2/2}\,dx=\frac1{\sqrt{2\pi}}\int_{-\infty}^ae^{-x^2/2}\,dx.$$
The definition in the first sentence of (d) now makes it obvious that if $X$ is normal, so is $a+bX$ for any
$a\in\mathbb{R}$, $b\in\mathbb{R}\setminus\{0\}$.
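
Illustration (not part of the original text). The distribution function $\Phi$ has no elementary closed form, but it can be expressed through the error function as $\Phi(a)=\frac12(1+\operatorname{erf}(a/\sqrt2))$. The sketch below uses this standard identity and checks the substitution of (d) numerically with a simple midpoint rule; the helper names are hypothetical.

```python
from math import erf, exp, pi, sqrt


def Phi(a):
    """Standard normal distribution function, via Phi(a) = (1 + erf(a/sqrt 2))/2."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))


def normal_density(x, a=0.0, sigma=1.0):
    """Density of a normal variable with expectation a and variance sigma^2."""
    return exp(-(x - a) ** 2 / (2.0 * sigma ** 2)) / (sigma * sqrt(2.0 * pi))


def normal_cdf(c, a=0.0, sigma=1.0):
    """Pr(X <= c) for X = sigma*Z + a with Z standard normal, as in (d)."""
    return Phi((c - a) / sigma)


def midpoint_integral(f, lo, hi, steps=20000):
    """Midpoint-rule approximation to the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))
```

For example, integrating `normal_density(x, 2, 3)` from far in the left tail up to $c=5$ agrees closely with `normal_cdf(5, 2, 3)`, and `Phi(a) + Phi(-a)` is $1$ up to rounding, which is the symmetry used in (e).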

274B Proposition Let $X_1,\ldots,X_n$ be independent normal random variables. Then $Y=X_1+\ldots+X_n$
is normal, with $E(Y)=E(X_1)+\ldots+E(X_n)$ and $\operatorname{Var}(Y)=\operatorname{Var}(X_1)+\ldots+\operatorname{Var}(X_n)$.
proof There are innumerable proofs of this fact; the following one gives me a chance to show off the power
of Chapter 26, but of course (at the price of some disagreeable algebra) 272T also gives the result.
(a) Consider first the case $n=2$. Setting $a_i=E(X_i)$, $\sigma_i=\sqrt{\operatorname{Var}(X_i)}$, $Z_i=(X_i-a_i)/\sigma_i$ we get
independent standard normal variables $Z_1$, $Z_2$. Set $\sigma=\sqrt{\sigma_1^2+\sigma_2^2}$, and express $\sigma_1$, $\sigma_2$ as $\sigma\cos\theta$, $\sigma\sin\theta$.
Consider $U=\cos\theta\,Z_1+\sin\theta\,Z_2$. We know that $(Z_1,Z_2)$ has a density function
$$(\zeta_1,\zeta_2)\mapsto g(\zeta_1,\zeta_2)=\frac1{2\pi}e^{-(\zeta_1^2+\zeta_2^2)/2}$$
(272I). Consequently, for any $c\in\mathbb{R}$,
$$\Pr(U\le c)=\int_Fg(z)\,dz,$$
where $F=\{(\zeta_1,\zeta_2):\zeta_1\cos\theta+\zeta_2\sin\theta\le c\}$. But now let $T$ be the matrix
$$\begin{pmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{pmatrix}.$$
Then it is easy to check that
$$T^{-1}[F]=\{(\eta_1,\eta_2):\eta_1\le c\},$$
$$\det T=1,$$
$$g(Ty)=g(y)\text{ for every }y\in\mathbb{R}^2,$$
so by 263A
$$\Pr(U\le c)=\int_Fg(z)\,dz=\int_{T^{-1}[F]}g(Ty)\,dy=\int_{]-\infty,c]\times\mathbb{R}}g(y)\,dy=\Pr(Z_1\le c)=\Phi(c).$$
As this is true for every $c\in\mathbb{R}$, $U$ is also standard normal (I am appealing to 271Ga again). But
$$X_1+X_2=\sigma_1Z_1+\sigma_2Z_2+a_1+a_2=\sigma U+a_1+a_2,$$
so $X_1+X_2$ is also normal.
(b) Now we can induce on $n$. If $n=1$ the result is trivial. For the inductive step to $n+1\ge 2$, we know
that $X_1+\ldots+X_n$ is normal, by the inductive hypothesis, and that $X_{n+1}$ is independent of $X_1+\ldots+X_n$,
by 272L. So $X_1+\ldots+X_n+X_{n+1}$ is normal, by (a).
The computation of the expectation and variance of $X_1+\ldots+X_n$ is immediate from 271Ab and 272R.
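
Illustration (not part of the original text). The three facts about the rotation $T$ used in part (a) can be checked numerically at a few points. The sketch below takes $\theta$ with $\sigma_1=\sigma\cos\theta$, $\sigma_2=\sigma\sin\theta$ and verifies that $z=Ty$ satisfies $\zeta_1\cos\theta+\zeta_2\sin\theta=\eta_1$ and $g(Ty)=g(y)$; the function names are hypothetical.

```python
from math import atan2, cos, exp, pi, sin


def g(z1, z2):
    """Joint density of two independent standard normal variables."""
    return exp(-(z1 * z1 + z2 * z2) / 2.0) / (2.0 * pi)


def rotate(theta, y1, y2):
    """Apply T = [[cos t, -sin t], [sin t, cos t]] to the point (y1, y2)."""
    return (cos(theta) * y1 - sin(theta) * y2,
            sin(theta) * y1 + cos(theta) * y2)


def worst_discrepancy(sigma1, sigma2, points):
    """Largest failure of the two identities over the given points y."""
    theta = atan2(sigma2, sigma1)  # sigma1 = sigma cos(theta), sigma2 = sigma sin(theta)
    worst = 0.0
    for y1, y2 in points:
        z1, z2 = rotate(theta, y1, y2)
        # z belongs to F exactly when y1 <= c, because this linear form equals y1.
        worst = max(worst, abs(z1 * cos(theta) + z2 * sin(theta) - y1))
        # The density g is invariant under the rotation.
        worst = max(worst, abs(g(z1, z2) - g(y1, y2)))
    return worst
```

The determinant is $\cos^2\theta+\sin^2\theta=1$, so no Jacobian factor appears in the change of variable.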

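274C Lemma Let $U_0,\ldots,U_n,V_0,\ldots,V_n$ be independent real-valued random variables and $h:\mathbb{R}\to\mathbb{R}$
a bounded Borel measurable function. Then
$$\Bigl|E\Bigl(h\bigl(\sum_{i=0}^nU_i\bigr)-h\bigl(\sum_{i=0}^nV_i\bigr)\Bigr)\Bigr|\le\sum_{i=0}^n\sup_{t\in\mathbb{R}}\bigl|E\bigl(h(t+U_i)-h(t+V_i)\bigr)\bigr|.$$
proof For $0\le j\le n+1$, set $Z_j=\sum_{i=0}^{j-1}U_i+\sum_{i=j}^nV_i$, taking $Z_0=\sum_{i=0}^nV_i$ and $Z_{n+1}=\sum_{i=0}^nU_i$, and for
$j\le n$ set $W_j=\sum_{i=0}^{j-1}U_i+\sum_{i=j+1}^nV_i$, so that $Z_j=W_j+V_j$ and $Z_{j+1}=W_j+U_j$ and $W_j$, $U_j$ and $V_j$ are
independent (I am appealing to 272K, as in 272L). Then
$$\Bigl|E\Bigl(h\bigl(\sum_{i=0}^nU_i\bigr)-h\bigl(\sum_{i=0}^nV_i\bigr)\Bigr)\Bigr|=\Bigl|E\Bigl(\sum_{i=0}^nh(Z_{i+1})-h(Z_i)\Bigr)\Bigr|$$
$$\le\sum_{i=0}^n\bigl|E\bigl(h(Z_{i+1})-h(Z_i)\bigr)\bigr|$$
$$=\sum_{i=0}^n\bigl|E\bigl(h(W_i+U_i)-h(W_i+V_i)\bigr)\bigr|.$$

To estimate this sum I turn it into a sum of integrals, as follows. For each $i$, let $\nu_{W_i}$ be the distribution
of $W_i$, and so on. Because $(w,u)\mapsto w+u$ is continuous, therefore Borel measurable, $(w,u)\mapsto h(w+u)$ is
also Borel measurable; accordingly $(w,u,v)\mapsto h(w+u)-h(w+v)$ is measurable for each of the product
measures $\nu_{W_i}\times\nu_{U_i}\times\nu_{V_i}$ on $\mathbb{R}^3$, and 271E and 272G give us
$$\bigl|E\bigl(h(W_i+U_i)-h(W_i+V_i)\bigr)\bigr|=\Bigl|\int h(w+u)-h(w+v)\,(\nu_{W_i}\times\nu_{U_i}\times\nu_{V_i})d(w,u,v)\Bigr|$$
$$=\Bigl|\int\Bigl(\int h(w+u)-h(w+v)\,(\nu_{U_i}\times\nu_{V_i})d(u,v)\Bigr)\nu_{W_i}(dw)\Bigr|$$
$$\le\int\Bigl|\int h(w+u)-h(w+v)\,(\nu_{U_i}\times\nu_{V_i})d(u,v)\Bigr|\nu_{W_i}(dw)$$
$$=\int\bigl|E\bigl(h(w+U_i)-h(w+V_i)\bigr)\bigr|\nu_{W_i}(dw)$$
$$\le\sup_{t\in\mathbb{R}}\bigl|E\bigl(h(t+U_i)-h(t+V_i)\bigr)\bigr|.$$

So we get
$$\Bigl|E\Bigl(h\bigl(\sum_{i=0}^nU_i\bigr)-h\bigl(\sum_{i=0}^nV_i\bigr)\Bigr)\Bigr|\le\sum_{i=0}^n\bigl|E\bigl(h(W_i+U_i)-h(W_i+V_i)\bigr)\bigr|$$
$$\le\sum_{i=0}^n\sup_{t\in\mathbb{R}}\bigl|E\bigl(h(t+U_i)-h(t+V_i)\bigr)\bigr|,$$
as required.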

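274D Lemma Let $h:\mathbb{R}\to\mathbb{R}$ be a bounded three-times-differentiable function such that $M_2=
\sup_{x\in\mathbb{R}}|h''(x)|$, $M_3=\sup_{x\in\mathbb{R}}|h'''(x)|$ are both finite. Let $\epsilon>0$.
(a) Let $U$ be a real-valued random variable of zero expectation and finite variance $\sigma^2$. Then for any $t\in\mathbb{R}$
we have
$$\Bigl|E(h(t+U))-h(t)-\frac{\sigma^2}2h''(t)\Bigr|\le\frac16\epsilon M_3\sigma^2+M_2E(\psi_\epsilon(U))$$
where $\psi_\epsilon(x)=0$ if $|x|\le\epsilon$, $x^2$ if $|x|>\epsilon$.
(b) Let $U_0,\ldots,U_n,V_0,\ldots,V_n$ be independent random variables with finite variances, and suppose that
$E(U_i)=E(V_i)=0$, $\operatorname{Var}(U_i)=\operatorname{Var}(V_i)=\sigma_i^2$ for every $i\le n$. Then
$$\Bigl|E\Bigl(h\bigl(\sum_{i=0}^nU_i\bigr)-h\bigl(\sum_{i=0}^nV_i\bigr)\Bigr)\Bigr|\le\frac13\epsilon M_3\sum_{i=0}^n\sigma_i^2+M_2\sum_{i=0}^nE(\psi_\epsilon(U_i))+M_2\sum_{i=0}^nE(\psi_\epsilon(V_i)).$$

proof (a) The point is that, by Taylor's theorem with remainder,
$$|h(t+x)-h(t)-xh'(t)|\le\frac12M_2x^2,$$
$$\Bigl|h(t+x)-h(t)-xh'(t)-\frac12x^2h''(t)\Bigr|\le\frac16M_3|x|^3$$
for every $x\in\mathbb{R}$. So
$$\Bigl|h(t+x)-h(t)-xh'(t)-\frac12x^2h''(t)\Bigr|\le\min\Bigl(\frac16M_3|x|^3,M_2x^2\Bigr)\le\frac16\epsilon M_3x^2+M_2\psi_\epsilon(x)$$
(use the first term of the minimum when $|x|\le\epsilon$ and the second when $|x|>\epsilon$).
Integrating with respect to the distribution of $U$, we get
$$\Bigl|E(h(t+U))-h(t)-\frac12h''(t)\sigma^2\Bigr|=\Bigl|E(h(t+U))-h(t)-h'(t)E(U)-\frac12h''(t)E(U^2)\Bigr|$$
$$=\Bigl|E\Bigl(h(t+U)-h(t)-h'(t)U-\frac12h''(t)U^2\Bigr)\Bigr|$$
$$\le E\Bigl(\Bigl|h(t+U)-h(t)-h'(t)U-\frac12h''(t)U^2\Bigr|\Bigr)$$
$$\le E\Bigl(\frac16\epsilon M_3U^2+M_2\psi_\epsilon(U)\Bigr)$$
$$=\frac16\epsilon M_3\sigma^2+M_2E(\psi_\epsilon(U)),$$
as claimed.
(b) By 274C,
$$\Bigl|E\Bigl(h\bigl(\sum_{i=0}^nU_i\bigr)-h\bigl(\sum_{i=0}^nV_i\bigr)\Bigr)\Bigr|\le\sum_{i=0}^n\sup_{t\in\mathbb{R}}\bigl|E\bigl(h(t+U_i)-h(t+V_i)\bigr)\bigr|$$
$$\le\sum_{i=0}^n\sup_{t\in\mathbb{R}}\Bigl(\Bigl|E(h(t+U_i))-h(t)-\frac12h''(t)\sigma_i^2\Bigr|+\Bigl|E(h(t+V_i))-h(t)-\frac12h''(t)\sigma_i^2\Bigr|\Bigr),$$
which by (a) above is at most
$$\sum_{i=0}^n\frac13\epsilon M_3\sigma_i^2+M_2E(\psi_\epsilon(U_i))+M_2E(\psi_\epsilon(V_i)),$$
as claimed.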

274E Lemma For any $\epsilon>0$, there is a three-times-differentiable function $h:\mathbb{R}\to[0,1]$, with continuous
third derivative, such that $h(x)=1$ for $x\le-\epsilon$ and $h(x)=0$ for $x\ge\epsilon$.
proof Let $f:\,]{-\epsilon},\epsilon[\,\to\,]0,\infty[$ be any twice-differentiable function such that
$$\lim_{x\downarrow-\epsilon}f^{(n)}(x)=\lim_{x\uparrow\epsilon}f^{(n)}(x)=0$$
for $n=0$, 1 and 2, writing $f^{(n)}$ for the $n$th derivative of $f$; for instance, you could take $f(x)=(\epsilon^2-x^2)^3$,
or $f(x)=\exp\bigl(-\frac1{\epsilon^2-x^2}\bigr)$. Now set
$$h(x)=1-\int_{-\epsilon}^xf\Big/\int_{-\epsilon}^\epsilon f$$
for $|x|\le\epsilon$.
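
Illustration (not part of the original text). With the polynomial choice $f(x)=(\epsilon^2-x^2)^3$ the function $h$ can be written in closed form, since $f$ has the antiderivative $\epsilon^6t-\epsilon^4t^3+\frac35\epsilon^2t^5-\frac17t^7$. The sketch below evaluates $h$ on this assumption, extending it by $1$ to the left of $-\epsilon$ and by $0$ to the right of $\epsilon$; the function name is hypothetical.

```python
def smooth_step(x, eps):
    """h(x) = 1 - (int_{-eps}^x f) / (int_{-eps}^eps f) with f(t) = (eps^2 - t^2)^3.

    Outside [-eps, eps] the function is extended by 1 on the left and 0 on the right.
    """
    if x <= -eps:
        return 1.0
    if x >= eps:
        return 0.0

    def antiderivative(t):
        # Antiderivative of (eps^2 - t^2)^3 = eps^6 - 3 eps^4 t^2 + 3 eps^2 t^4 - t^6.
        return eps ** 6 * t - eps ** 4 * t ** 3 + 0.6 * eps ** 2 * t ** 5 - t ** 7 / 7.0

    whole = antiderivative(eps) - antiderivative(-eps)
    return 1.0 - (antiderivative(x) - antiderivative(-eps)) / whole
```

Because $f$ vanishes to third order at $\pm\epsilon$, the pieces join with matching derivatives up to order three, which is the smoothness the lemma asks for.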

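274F Lindeberg's theorem Let $\epsilon>0$. Then there is a $\delta>0$ such that whenever $X_0,\ldots,X_n$ are
independent real-valued random variables such that
$$E(X_i)=0\text{ for every }i\le n,$$
$$\sum_{i=0}^n\operatorname{Var}(X_i)=1,$$
$$\sum_{i=0}^nE(\psi_\delta(X_i))\le\delta$$
(writing $\psi_\delta(x)=0$ if $|x|\le\delta$, $x^2$ if $|x|>\delta$), then
$$\Bigl|\Pr\Bigl(\sum_{i=0}^nX_i\le a\Bigr)-\Phi(a)\Bigr|\le\epsilon$$
for every $a\in\mathbb{R}$.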
proof (a) Let $h:\mathbb{R}\to[0,1]$ be a three-times-differentiable function, with continuous third derivative, such
that $\chi\,]{-\infty},-\epsilon]\le h\le\chi\,]{-\infty},\epsilon]$, as in 274E. Set
$$M_2=\sup_{x\in\mathbb{R}}|h''(x)|=\sup_{|x|\le\epsilon}|h''(x)|,$$
$$M_3=\sup_{x\in\mathbb{R}}|h'''(x)|=\sup_{|x|\le\epsilon}|h'''(x)|;$$
because $h'''$ is continuous, both are finite. Write $\epsilon'=\epsilon\bigl(1-\frac2{\sqrt{2\pi}}\bigr)>0$, and let $\eta>0$ be such that
$$\Bigl(\frac13M_3+2M_2\Bigr)\eta\le\epsilon'.$$
Note that $\lim_{m\to\infty}\psi_m(x)=0$ for every $x$, so if $X$ is a random variable of finite variance we must
have $\lim_{m\to\infty}E(\psi_m(X))=0$, by Lebesgue's Dominated Convergence Theorem; let $m\ge 1$ be such that
$E(\psi_m(Z))\le\eta$, where $Z$ is some (or any) standard normal random variable. Finally, take $\delta>0$ such that
$$\delta\le\eta,\quad\delta+\delta^2\le(\eta/m)^2.$$
(I hope that you have seen enough $\epsilon$-$\delta$ arguments not to be troubled by any expectation of understanding
the reasons for each particular formula here before reading the rest of the argument. But the formula
$\frac13M_3+2M_2$, in association with $\psi$, should recall 274D.)
(b) Let $X_0,\ldots,X_n$ be independent random variables with zero expectation such that $\sum_{i=0}^n\operatorname{Var}(X_i)=1$
and $\sum_{i=0}^nE(\psi_\delta(X_i))\le\delta$. We need an auxiliary sequence $Z_0,\ldots,Z_n$ of standard normal random variables
to match against the $X_i$. To create this, I use the following device. Suppose that the probability space
underlying $X_0,\ldots,X_n$ is $(\Omega,\Sigma,\mu)$. Set $\Omega'=\Omega\times\mathbb{R}^{n+1}$, and let $\mu'$ be the product measure on $\Omega'$, where $\Omega$
is given the measure $\mu$ and each factor $\mathbb{R}$ of $\mathbb{R}^{n+1}$ is given the measure $\mu_G$. Set $X_i'(\omega,z)=X_i(\omega)$
and $Z_i(\omega,z)=\zeta_i$ for $\omega\in\operatorname{dom}X_i$, $z=(\zeta_0,\ldots,\zeta_n)\in\mathbb{R}^{n+1}$, $i\le n$. Then $X_0',\ldots,X_n',Z_0,\ldots,Z_n$ are
independent, and each $X_i'$ has the same distribution as $X_i$ (272Mb). Consequently $S'=X_0'+\ldots+X_n'$ has
the same distribution as $S=X_0+\ldots+X_n$ (using 272S, or otherwise); so that $E(g(S'))=E(g(S))$ for any
bounded Borel measurable function $g$ (using 271E). Also each $Z_i$ has distribution $\mu_G$, so is standard normal.
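(c) Write $\sigma_i=\sqrt{\operatorname{Var}(X_i)}$ for each $i$, and set $K=\{i:i\le n,\ \sigma_i>0\}$. Observe that $\eta/\sigma_i\ge m$ for each
$i\in K$. PP We know that
$$\sigma_i^2=\operatorname{Var}(X_i)=E(X_i^2)\le E(\delta^2+\psi_\delta(X_i))=\delta^2+E(\psi_\delta(X_i))\le\delta^2+\delta,$$
so
$$\eta/\sigma_i\ge\eta/\sqrt{\delta+\delta^2}\ge m$$
by the choice of $\delta$. QQ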
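(d) Consider the independent normal random variables $\sigma_iZ_i$. We have $E(\sigma_iZ_i)=E(X_i')=0$ and
$\operatorname{Var}(\sigma_iZ_i)=\operatorname{Var}(X_i')=\sigma_i^2$ for each $i$, so that $Z=\sigma_0Z_0+\ldots+\sigma_nZ_n$ has expectation 0 and variance $\sum_{i=0}^n\sigma_i^2=1$;
moreover, by 274B, $Z$ is normal, so in fact it is standard normal. Now we have
$$\sum_{i=0}^nE(\psi_\eta(\sigma_iZ_i))=\sum_{i\in K}E(\psi_\eta(\sigma_iZ_i))=\sum_{i\in K}\sigma_i^2E(\psi_{\eta/\sigma_i}(Z_i))$$
(because $\sigma^2\psi_{\eta/\sigma}(x)=\psi_\eta(\sigma x)$ whenever $x\in\mathbb{R}$, $\sigma>0$)
$$=\sum_{i\in K}\sigma_i^2E(\psi_{\eta/\sigma_i}(Z))\le\sum_{i\in K}\sigma_i^2E(\psi_m(Z))$$
(because, by (c), $\eta/\sigma_i\ge m$ for every $i\in K$, so $\psi_{\eta/\sigma_i}(t)\le\psi_m(t)$ for every $t$)
$$\le\sum_{i\in K}\sigma_i^2\eta$$
(by the choice of $m$)
$$=\eta.$$
On the other hand, we surely have
$$\sum_{i=0}^nE(\psi_\eta(X_i'))=\sum_{i=0}^nE(\psi_\eta(X_i))\le\sum_{i=0}^nE(\psi_\delta(X_i))\le\delta\le\eta.$$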

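(e) For any real number $t$, set
$$h_t(x)=h(x-t)$$
for each $x\in\mathbb{R}$. Then $h_t$ is three-times-differentiable, with $\sup_{x\in\mathbb{R}}|h_t''(x)|=M_2$ and $\sup_{x\in\mathbb{R}}|h_t'''(x)|=M_3$.
Consequently
$$|E(h_t(S))-E(h_t(Z))|\le\epsilon'.$$
PP By 274Db,
$$|E(h_t(S))-E(h_t(Z))|=|E(h_t(S'))-E(h_t(Z))|$$
$$=\Bigl|E\Bigl(h_t\bigl(\sum_{i=0}^nX_i'\bigr)\Bigr)-E\Bigl(h_t\bigl(\sum_{i=0}^n\sigma_iZ_i\bigr)\Bigr)\Bigr|$$
$$\le\frac13\eta M_3\sum_{i=0}^n\sigma_i^2+M_2\sum_{i=0}^nE(\psi_\eta(X_i'))+M_2\sum_{i=0}^nE(\psi_\eta(\sigma_iZ_i))$$
$$\le\frac13\eta M_3+M_2\eta+M_2\eta\le\epsilon',$$
by the choice of $\eta$. QQ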
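(f) Now take any $a\in\mathbb{R}$. We have
$$\chi\,]{-\infty},a-2\epsilon]\le h_{a-\epsilon}\le\chi\,]{-\infty},a]\le h_{a+\epsilon}\le\chi\,]{-\infty},a+2\epsilon].$$
Note also that, for any $b$,
$$\Phi(b+2\epsilon)=\Phi(b)+\frac1{\sqrt{2\pi}}\int_b^{b+2\epsilon}e^{-x^2/2}\,dx\le\Phi(b)+\frac{2\epsilon}{\sqrt{2\pi}}=\Phi(b)+\epsilon-\epsilon'.$$
Consequently
$$\Phi(a)-\epsilon\le\Phi(a-2\epsilon)-\epsilon'=\Pr(Z\le a-2\epsilon)-\epsilon'$$
$$\le E(h_{a-\epsilon}(Z))-\epsilon'\le E(h_{a-\epsilon}(S))\le\Pr(S\le a)$$
$$\le E(h_{a+\epsilon}(S))\le E(h_{a+\epsilon}(Z))+\epsilon'\le\Pr(Z\le a+2\epsilon)+\epsilon'$$
$$=\Phi(a+2\epsilon)+\epsilon'\le\Phi(a)+\epsilon.$$
But this means just that
$$\Bigl|\Pr\Bigl(\sum_{i=0}^nX_i\le a\Bigr)-\Phi(a)\Bigr|\le\epsilon,$$
as claimed.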

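274G Central Limit Theorem Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of random variables, all
with zero expectation and finite variance; write $s_n=\sqrt{\sum_{i=0}^n\operatorname{Var}(X_i)}$ for each $n$. Suppose that
$$\lim_{n\to\infty}\frac1{s_n^2}\sum_{i=0}^nE(\psi_{\epsilon s_n}(X_i))=0\text{ for every }\epsilon>0,$$
writing $\psi_\delta(x)=0$ if $|x|\le\delta$, $x^2$ if $|x|>\delta$. Set
$$S_n=\frac1{s_n}(X_0+\ldots+X_n)$$
for each $n\in\mathbb{N}$ such that $s_n>0$. Then
$$\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$$
uniformly for $a\in\mathbb{R}$.
proof Given $\epsilon>0$, take $\delta>0$ as in Lindeberg's theorem (274F). Then for all $n$ large enough,
$$\frac1{s_n^2}\sum_{i=0}^nE(\psi_{\delta s_n}(X_i))\le\delta.$$
Fix on any such $n$. Of course we have $s_n>0$. Set
$$X_i'=\frac1{s_n}X_i\text{ for }i\le n;$$
then $X_0',\ldots,X_n'$ are independent, with zero expectation,
$$\sum_{i=0}^n\operatorname{Var}(X_i')=\sum_{i=0}^n\frac1{s_n^2}\operatorname{Var}(X_i)=1,$$
$$\sum_{i=0}^nE(\psi_\delta(X_i'))=\sum_{i=0}^n\frac1{s_n^2}E(\psi_{\delta s_n}(X_i))\le\delta.$$
By 274F,
$$|\Pr(S_n\le a)-\Phi(a)|=\Bigl|\Pr\Bigl(\sum_{i=0}^nX_i'\le a\Bigr)-\Phi(a)\Bigr|\le\epsilon$$
for every $a\in\mathbb{R}$. Since this is true for all $n$ large enough, we have the result.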
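
Illustration (not part of the original text). For random variables with finitely many values the quantity $\frac1{s_n^2}\sum_{i\le n}E(\psi_{\epsilon s_n}(X_i))$ appearing in the hypothesis can be computed directly. The sketch below represents each distribution as a list of (value, probability) pairs with mean zero; this representation and the function names are assumptions made for the example.

```python
from math import sqrt


def variance(dist):
    """Variance of a mean-zero distribution given as (value, probability) pairs."""
    return sum(p * v * v for v, p in dist)


def psi_expectation(dist, threshold):
    """E(psi_t(X)) where psi_t(x) = x^2 if |x| > t and 0 otherwise."""
    return sum(p * v * v for v, p in dist if abs(v) > threshold)


def lindeberg_ratio(dists, eps):
    """(1/s_n^2) * sum_i E(psi_{eps*s_n}(X_i)) for the distributions of X_0, ..., X_n."""
    s2 = sum(variance(d) for d in dists)
    s = sqrt(s2)
    return sum(psi_expectation(d, eps * s) for d in dists) / s2
```

For 100 independent variables taking the values $\pm1$ with probability $\frac12$ each, $s_n=10$, so the ratio is $0$ as soon as $\epsilon s_n\ge 1$ and is $1$ for smaller $\epsilon$; for a fixed $\epsilon$ it therefore vanishes for all large $n$, as the condition requires.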

274H Remarks (a) The condition
$$\lim_{n\to\infty}\frac1{s_n^2}\sum_{i=0}^nE(\psi_{\epsilon s_n}(X_i))=0\text{ for every }\epsilon>0$$
is called Lindeberg's condition, following Lindeberg 22.

(b) Lindeberg's condition is necessary as well as sufficient, in the following sense. Suppose that $\langle X_n\rangle_{n\in\mathbb{N}}$
is an independent sequence of real-valued random variables with zero expectation and finite variance; write
$\sigma_n=\sqrt{\operatorname{Var}(X_n)}$, $s_n=\sqrt{\sum_{i=0}^n\operatorname{Var}(X_i)}$ for each $n$. Suppose that $\lim_{n\to\infty}s_n=\infty$, $\lim_{n\to\infty}\frac{\sigma_n}{s_n}=0$ and that
$\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ for each $a\in\mathbb{R}$, where $S_n=\frac1{s_n}(X_0+\ldots+X_n)$. Then
$$\lim_{n\to\infty}\frac1{s_n^2}\sum_{i=0}^nE(\psi_{\epsilon s_n}(X_i))=0$$
for every $\epsilon>0$. (Feller 66, §XV.6, Theorem 3; Loève 77, §21.2.)

(c) The proof of 274F-274G here is adapted from Feller 66, §VIII.4. It has the virtue of being elementary,
in that it does not involve 'characteristic functions'. Of course this has to be paid for by a number
of detailed estimations; and, what is much more serious, it leaves us without one of the most powerful
techniques for describing distributions. The proof does offer a method of bounding
$$|\Pr(S_n\le a)-\Phi(a)|;$$
but it should be said that the bounds obtained are not useful ones, being grossly over-pessimistic, at least
in the readily analysable cases. (For instance, a better bound, in many cases, is given by the Berry-Esséen
theorem: if $\langle X_n\rangle_{n\in\mathbb{N}}$ is independent and identically distributed, with zero expectation, and the common
values of $\sqrt{E(X_n^2)}$, $E(|X_n|^3)$ are $\sigma$, $\rho<\infty$, then
$$|\Pr(S_n\le a)-\Phi(a)|\le\frac{33\rho}{4\sigma^3\sqrt{n+1}};$$
see Feller 66, §XVI.5, Loève 77, §21.3, or Hall 82.) Furthermore, when $|a|$ is large, $\Phi(a)$ is exceedingly
close to either 0 or 1, so that any uniform bound for $|\Pr(S\le a)-\Phi(a)|$ gives very little information; a great
deal of work has been done on estimating the tails of such distributions more precisely, subject to special
conditions. For instance, if $X_0,\ldots,X_n$ are independent random variables, of zero expectation, uniformly
bounded with $|X_i|\le K$ almost everywhere for each $i$, $Y=X_0+\ldots+X_n$, $s=\sqrt{\operatorname{Var}(Y)}>0$, $S=\frac1sY$, then
for any $\alpha\in[0,s/K]$
$$\Pr(|S|\ge\alpha)\le 2\exp\Bigl(\frac{-\alpha^2}{2(1+\frac{\alpha K}{2s})^2}\Bigr)\approx 2e^{-\alpha^2/2}$$
if $s\gg K$ (Rényi 70, §VII.4, Theorem 1).
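
Illustration (not part of the original text). For independent variables taking the values $\pm1$ with probability $\frac12$ each (so that $\sigma=\rho=1$), the distribution of the normalized sum is binomial, and $\sup_a|\Pr(S\le a)-\Phi(a)|$ can be computed exactly and compared with the Berry-Esséen bound quoted above. The sketch below writes $m=n+1$ for the number of summands; the function names are hypothetical.

```python
from math import comb, erf, sqrt


def Phi(a):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))


def sup_gap_rademacher(m):
    """sup_a |Pr(S <= a) - Phi(a)| for S = (sum of m independent +-1 variables)/sqrt(m)."""
    total = 2 ** m
    worst, below = 0.0, 0  # `below` counts outcomes whose sum is below the current atom
    for k in range(m + 1):
        atom = (2 * k - m) / sqrt(m)
        at_or_below = below + comb(m, k)
        phi = Phi(atom)
        # The distribution function of S jumps from below/total to at_or_below/total here;
        # between atoms it is constant while Phi increases, so these points suffice.
        worst = max(worst, abs(below / total - phi), abs(at_or_below / total - phi))
        below = at_or_below
    return worst


def berry_esseen_bound(m, sigma=1.0, rho=1.0):
    """The bound 33*rho/(4*sigma^3*sqrt(m)) with m = n + 1 summands."""
    return 33.0 * rho / (4.0 * sigma ** 3 * sqrt(m))
```

In this example the true gap is dominated by the jump of the binomial distribution function at its centre, and it sits well inside the Berry-Esséen bound.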


I now list some of the standard cases in which Lindeberg's conditions are satisfied, so that we may apply
the theorem.

274I Corollary Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables, all with the
same distribution, and suppose that their common expectation is 0 and their common variance is finite and
not zero. Write $\sigma$ for the common value of $\sqrt{\operatorname{Var}(X_n)}$, and set
$$S_n=\frac1{\sigma\sqrt{n+1}}(X_0+\ldots+X_n)$$
for each $n\in\mathbb{N}$. Then
$$\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$$
uniformly for $a\in\mathbb{R}$.

proof In the language of 274H, we have $\sigma_n=\sigma$, $s_n=\sigma\sqrt{n+1}$, so the first two conditions are surely satisfied;
moreover, if $\nu$ is the common distribution of the $X_n$, then
$$E(\psi_{\epsilon s_n}(X_n))=\int_{\{x:|x|>\epsilon\sigma\sqrt{n+1}\}}x^2\,\nu(dx)\to 0$$
by Lebesgue's Dominated Convergence Theorem; so that
$$\frac1{s_n^2}\sum_{i=0}^nE(\psi_{\epsilon s_n}(X_i))\to 0$$
by 273Ca. Thus Lindeberg's conditions are satisfied and 274G gives the result.

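274J Corollary Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables with zero
expectation, and suppose that $\{X_n^2:n\in\mathbb{N}\}$ is uniformly integrable (246A) and that
$$\liminf_{n\to\infty}\frac1{n+1}\sum_{i=0}^n\operatorname{Var}(X_i)>0.$$
Set
$$s_n=\sqrt{\sum_{i=0}^n\operatorname{Var}(X_i)},\quad S_n=\frac1{s_n}(X_0+\ldots+X_n)$$
for large $n\in\mathbb{N}$. Then
$$\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$$
uniformly for $a\in\mathbb{R}$.
proof The condition
$$\liminf_{n\to\infty}\frac1{n+1}\sum_{i=0}^n\operatorname{Var}(X_i)>0$$
means that there are $c>0$, $n_0\in\mathbb{N}$ such that $s_n\ge c\sqrt{n+1}$ for every $n\ge n_0$. Let the underlying space be
$(\Omega,\Sigma,\mu)$, and take $\epsilon$, $\delta>0$. Writing $\psi_\delta(x)=0$ for $|x|\le\delta$, $x^2$ for $|x|>\delta$, as in 274F-274G, we have
$$E(\psi_{\epsilon s_n}(X_i))\le E(\psi_{c\epsilon\sqrt{n+1}}(X_i))=\int_{F(i,c\epsilon\sqrt{n+1})}X_i^2\,d\mu$$
for $n\ge n_0$, $i\le n$, where $F(i,\gamma)=\{\omega:\omega\in\operatorname{dom}X_i,\ |X_i(\omega)|>\gamma\}$. Because $\{X_i^2:i\in\mathbb{N}\}$ is uniformly
integrable, there is a $\gamma\ge 0$ such that $\int_{F(i,\gamma)}X_i^2\,d\mu\le\delta c^2$ for every $i\in\mathbb{N}$ (246I). Let $n_1\ge n_0$ be such that
$c\epsilon\sqrt{n_1+1}\ge\gamma$; then for any $n\ge n_1$
$$\frac1{s_n^2}\sum_{i=0}^nE(\psi_{\epsilon s_n}(X_i))\le\frac1{c^2(n+1)}\sum_{i=0}^n\delta c^2=\delta.$$
As $\epsilon$, $\delta$ are arbitrary, the conditions of 274G are satisfied and the result follows.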

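274K Corollary Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables with zero
expectation, and suppose that
(i) there is some $\delta>0$ such that $\sup_{n\in\mathbb{N}}E(|X_n|^{2+\delta})<\infty$,
(ii) $\liminf_{n\to\infty}\frac1{n+1}\sum_{i=0}^n\operatorname{Var}(X_i)>0$.
Set $s_n=\sqrt{\sum_{i=0}^n\operatorname{Var}(X_i)}$ and
$$S_n=\frac1{s_n}(X_0+\ldots+X_n)$$
for large $n\in\mathbb{N}$. Then
$$\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$$
uniformly for $a\in\mathbb{R}$.
proof The point is that $\{X_n^2:n\in\mathbb{N}\}$ is uniformly integrable. PP Set $K=1+\sup_{n\in\mathbb{N}}E(|X_n|^{2+\delta})$. Given
$\epsilon>0$, set $M=(K/\epsilon)^{1/\delta}$. Then $(X_n^2-M^2)^+\le M^{-\delta}|X_n|^{2+\delta}$ (both sides being zero or non-negative where $|X_n|\le M$), so
$$E\bigl((X_n^2-M^2)^+\bigr)\le KM^{-\delta}=\epsilon$$
for every $n\in\mathbb{N}$. As $\epsilon$ is arbitrary, $\{X_n^2:n\in\mathbb{N}\}$ is uniformly integrable. QQ
Accordingly the conditions of 274J are satisfied and we have the result.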

274L Remarks (a) All the theorems of this section are devoted to finding conditions under which a
random variable $S$ is 'nearly' standard normal, in the sense that $\Pr(S\le a)\approx\Pr(Z\le a)$ uniformly for
$a\in\mathbb{R}$, where $Z$ is some (or any) standard normal random variable. In all cases the random variable $S$ is
normalized to have expectation 0 and variance 1, and is a sum of a large number of independent random
variables. (In 274G and 274I-274K it is explicit that there must be many $X_i$, since they refer to a limit as
$n\to\infty$. This is not said in so many words in the formulation I give of Lindeberg's theorem, but the proof
makes it evident that $(n+1)(\delta+\delta^2)\ge 1$, so surely $n$ will have to be large there also.)

(b) I cannot leave this section without remarking that the form of the definition of 'nearly standard
normal' may lead your intuition astray if you try to apply it to other distributions. If we take $F$ to be the
distribution function of $S$, so that $F(a)=\Pr(S\le a)$, I am saying that $S$ is 'nearly standard normal' if
$\sup_{a\in\mathbb{R}}|F(a)-\Phi(a)|$ is small. It is natural to think of this as approximation in a metric, writing
$$\rho(\nu,\nu')=\sup_{a\in\mathbb{R}}|F_\nu(a)-F_{\nu'}(a)|$$
for distributions $\nu$, $\nu'$ on $\mathbb{R}$, where $F_\nu(a)=\nu\,]{-\infty},a]$. In this form, the theorems above can be read as
finding conditions under which $\lim_{n\to\infty}\rho(\nu_{S_n},\mu_G)=0$. But the point is that $\rho$ is not really the right metric
to use. It works here because $\mu_G$ is atomless. But suppose, for instance, that $\nu$ is the distribution which gives
mass 1 to the point 0 (I mean, that $\nu E=1$ if $0\in E\subseteq\mathbb{R}$, 0 if $0\notin E\subseteq\mathbb{R}$), and that $\nu_n$ is the distribution
of a normal random variable with expectation 0 and variance $\frac1n$, for each $n\ge 1$. Then $F_\nu(0)=1$ and
$F_{\nu_n}(0)=\frac12$, so $\rho(\nu_n,\nu)=\frac12$ for each $n\ge 1$. However, for most purposes one would regard the difference
between $\nu_n$ and $\nu$ as small, and surely $\nu$ is the only distribution which one could reasonably call a limit of
the $\nu_n$.
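
Illustration (not part of the original text). The example in (b) can be checked numerically: the distribution function of a normal variable with expectation 0 and variance $\frac1n$ is $a\mapsto\Phi(a\sqrt n)$, so its value at 0 stays at $\frac12$ while the mass near 0 tends to 1. The sketch below uses hypothetical function names.

```python
from math import erf, sqrt


def Phi(a):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))


def F_n(a, n):
    """Distribution function of a normal variable with mean 0 and variance 1/n."""
    return Phi(a * sqrt(n))


def F_point_mass(a):
    """Distribution function of the unit point mass at 0."""
    return 1.0 if a >= 0 else 0.0


def gap_at_zero(n):
    """|F_{nu_n}(0) - F_nu(0)|, a lower bound for rho(nu_n, nu)."""
    return abs(F_n(0.0, n) - F_point_mass(0.0))


def mass_near_zero(n, r):
    """nu_n [-r, r], the probability that the n-th variable lies within r of 0."""
    return F_n(r, n) - F_n(-r, n)
```

However large $n$ is, `gap_at_zero(n)` remains $\frac12$, although `mass_near_zero(n, r)` approaches 1 for every fixed $r>0$.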

(c) The difficulties here present themselves in more than one form. A statistician would be unhappy
with the idea that the $\nu_n$ of the last paragraph were far from $\nu$ (and from each other), on the grounds that
any measurement involving random variables with these distributions must be subject to error, and small
errors of measurement will render them indistinguishable. A pure mathematician, looking forward to the
possibility of generalizing these results, will be unhappy with the emphasis given to the values of $\nu\,]{-\infty},a]$,
for which it may be difficult to find suitable equivalents in more abstract spaces.

(d) These considerations join together to lead us to a rather different definition for a topology on the
space $P$ of probability distributions on $\mathbb{R}$. For any bounded continuous function $h:\mathbb{R}\to\mathbb{R}$ we have a
pseudometric $\rho_h:P\times P\to[0,\infty[$ defined by writing
$$\rho_h(\nu,\nu')=\Bigl|\int h\,d\nu-\int h\,d\nu'\Bigr|$$
for all $\nu$, $\nu'\in P$. The vague topology on $P$ is that generated by the pseudometrics $\rho_h$ (2A3F). I will not
go into its properties in detail here (some are sketched in 274Ya-274Yd below; see also 285K-285L, 285S and
437J-437O in Volume 4). But I maintain that the right way to look at the results of this chapter is to say
that (i) the distributions $\nu_S$ are close to $\mu_G$ for the vague topology (ii) the sets $\{\nu:\rho(\nu,\mu_G)<\epsilon\}$ are open
for that topology, and that is why $\rho(\nu_S,\mu_G)$ is small.

*274M I include a simple pair of inequalities which are frequently useful when studying normal random
variables.
Lemma (a) $\int_x^\infty e^{-t^2/2}\,dt\le\frac1xe^{-x^2/2}$ for every $x>0$.
(b) $\int_x^\infty e^{-t^2/2}\,dt\ge\frac1{2x}e^{-x^2/2}$ for every $x\ge 1$.

proof (a)
$$\int_x^\infty e^{-t^2/2}\,dt=\int_0^\infty e^{-(x+s)^2/2}\,ds\le e^{-x^2/2}\int_0^\infty e^{-xs}\,ds=\frac1xe^{-x^2/2}.$$

(b) Set
$$f(t)=e^{-t^2/2}-(1-x(t-x))e^{-x^2/2}.$$
Then $f(x)=f'(x)=0$ and $f''(t)=(t^2-1)e^{-t^2/2}$ is positive for $t\ge x$ (because $x\ge 1$). Accordingly $f(t)\ge 0$
for every $t\ge x$, and $\int_x^{x+1/x}f(t)\,dt\ge 0$. But this means just that
$$\int_x^\infty e^{-t^2/2}\,dt\ge\int_x^{x+\frac1x}e^{-t^2/2}\,dt\ge\int_x^{x+\frac1x}(1-x(t-x))e^{-x^2/2}\,dt=\frac1{2x}e^{-x^2/2},$$
as required.
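
Illustration (not part of the original text). The two inequalities can be checked numerically using the standard identity $\int_x^\infty e^{-t^2/2}\,dt=\sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt2)$. The sketch below computes the tail integral and both bounds; the function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def upper_tail(x):
    """The integral of exp(-t^2/2) over [x, infinity), via the complementary error function."""
    return sqrt(pi / 2.0) * erfc(x / sqrt(2.0))


def bounds(x):
    """Return (lower, upper): the bounds of parts (b) and (a), valid for x >= 1 and x > 0."""
    e = exp(-x * x / 2.0)
    return e / (2.0 * x), e / x
```

For large $x$ the upper bound becomes sharp in the sense that the ratio of the tail to $\frac1xe^{-x^2/2}$ approaches 1.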

274X Basic exercises > (a) Use 272T to give an alternative proof of 274B.

(b) Prove 274D when $h''$ is $M_3$-Lipschitz but not necessarily differentiable.

(c) Let $\langle m_k\rangle_{k\in\mathbb{N}}$ be a strictly increasing sequence in $\mathbb{N}$ such that $m_0=0$ and $\lim_{k\to\infty}m_k/m_{k+1}=0$. Let
$\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of random variables such that $\Pr(X_n=\sqrt{m_k})=\Pr(X_n=-\sqrt{m_k})=
1/2m_k$, $\Pr(X_n=0)=1-1/m_k$ whenever $m_{k-1}\le n<m_k$. Show that the Central Limit Theorem is not
valid for $\langle X_n\rangle_{n\in\mathbb{N}}$. (Hint: setting $W_k=(X_0+\ldots+X_{m_k-1})/\sqrt{m_k}$, show that $\Pr(W_k\in G)\to 1$ for every
open set $G$ including $\mathbb{Z}$.)

(d) Let hXn inN be any independent sequence of random variables all with the same distribution; suppose
1
that they all have finite variance 2 > 0, and that their common expectation is c. Set Sn = (X0 +
n+1
. . . + Xn ) for each n, and let Y be a normal random variable with expectation c and variance 2 . Show that
limn Pr(Sn a) = Pr(Y a) uniformly for a R.

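> (e) Show that for any $a\in\mathbb{R}$,
$$\lim_{n\to\infty}\frac1{2^n}\sum_{r=0}^{\lfloor\frac n2+a\frac{\sqrt n}2\rfloor}\frac{n!}{r!(n-r)!}=\lim_{n\to\infty}\frac1{2^n}\#\Bigl(\Bigl\{I:I\subseteq n,\ \#(I)\le\frac n2+a\frac{\sqrt n}2\Bigr\}\Bigr)=\Phi(a).$$

(f) Show that 274I is a special case of 274J.

(g) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables with zero expectation. Set
$s_n=\sqrt{\sum_{i=0}^n\operatorname{Var}(X_i)}$ and
$$S_n=\frac1{s_n}(X_0+\ldots+X_n)$$
for each $n\in\mathbb{N}$. Suppose that there is some $\delta>0$ such that
$$\lim_{n\to\infty}\frac1{s_n^{2+\delta}}\sum_{i=0}^nE(|X_i|^{2+\delta})=0.$$
Show that $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ uniformly for $a\in\mathbb{R}$. (This is a form of Liapounoff's central limit
theorem; see Liapounoff 1901.)

(h) Let $P$ be the set of Radon probability measures on $\mathbb{R}$. Let $\nu_0\in P$, $a\in\mathbb{R}$. Show that the map
$\nu\mapsto\nu\,]{-\infty},a]:P\to[0,1]$ is continuous at $\nu_0$ for the vague topology on $P$ iff $\nu_0\{a\}=0$.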
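
Illustration (not part of the original text). The limit in exercise 274Xe can be observed numerically, since the left-hand side is an exact binomial sum. The sketch below computes $2^{-n}\#(\{I:I\subseteq n,\ \#(I)\le\frac n2+a\frac{\sqrt n}2\})$ with integer arithmetic and compares it with $\Phi(a)$; the function names are hypothetical.

```python
from math import erf, floor, sqrt


def Phi(a):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))


def binomial_fraction(n, a):
    """2^{-n} times the number of subsets I of an n-element set with #(I) <= n/2 + a*sqrt(n)/2."""
    top = floor(n / 2.0 + a * sqrt(n) / 2.0)
    if top < 0:
        return 0.0
    top = min(top, n)
    term, total = 1, 0  # term runs through the binomial coefficients C(n, r)
    for r in range(top + 1):
        total += term
        term = term * (n - r) // (r + 1)  # exact: C(n, r+1) = C(n, r)(n - r)/(r + 1)
    return total / 2 ** n
```

The agreement improves slowly with $n$, at a rate of order $1/\sqrt n$, which is consistent with the Berry-Esséen bound mentioned in 274Hc.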

274Y Further exercises (a) Write $P$ for the set of Radon probability measures on $\mathbb{R}$. For $\nu$, $\nu'\in P$
set
$$\tilde\rho(\nu,\nu')=\inf\{\epsilon:\epsilon\ge 0,\ \nu\,]{-\infty},a-\epsilon]-\epsilon\le\nu'\,]{-\infty},a]\le\nu\,]{-\infty},a+\epsilon]+\epsilon\text{ for every }a\in\mathbb{R}\}.$$
Show that $\tilde\rho$ is a metric on $P$ and that it defines the vague topology on $P$. ($\tilde\rho$ is called Lévy's metric.)

(b) Write $P$ for the set of Radon probability measures on $\mathbb{R}$, and let $\rho$ be the metric on $P$ defined in
274Lb. Show that if $\nu\in P$ is atomless and $\epsilon>0$, then $\{\nu':\nu'\in P,\ \rho(\nu',\nu)<\epsilon\}$ is open for the vague
topology on $P$.

(c) Let $\langle S_n\rangle_{n\in\mathbb{N}}$ be a sequence of real-valued random variables, and $Z$ a standard normal random variable.
Show that the following are equiveridical:
(i) $\mu_G=\lim_{n\to\infty}\nu_{S_n}$ for the vague topology, writing $\nu_{S_n}$ for the distribution of $S_n$;
(ii) $E(h(Z))=\lim_{n\to\infty}E(h(S_n))$ for every bounded continuous function $h:\mathbb{R}\to\mathbb{R}$;
(iii) $E(h(Z))=\lim_{n\to\infty}E(h(S_n))$ for every bounded function $h:\mathbb{R}\to\mathbb{R}$ such that ($\alpha$) $h$ has continuous
derivatives of all orders ($\beta$) $\{x:h(x)\ne 0\}$ is bounded;
(iv) $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ for every $a\in\mathbb{R}$;
(v) $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ uniformly for $a\in\mathbb{R}$;
(vi) $\{a:\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)\}$ is dense in $\mathbb{R}$.
(See also 285L.)

(d) Let $(\Omega,\Sigma,\mu)$ be a probability space, and $P$ the set of Radon probability measures on $\mathbb{R}$. Show that
$X\mapsto\nu_X:\mathcal{L}^0(\mu)\to P$ is continuous for the topology of convergence in measure on $\mathcal{L}^0(\mu)$ and the vague
topology on $P$.

274 Notes and comments For more than two hundred years the Central Limit Theorem has been one
of the glories of mathematics, and no branch of mathematics or science would be the same without it. I
suppose it is the most important single theorem of probability theory; and I observe that the proof hardly
uses measure theory. To be sure, I have clothed the arguments above in the language of measure and
integration. But if you look at their essence, the vital elements of the proof are
(i) a linear combination of independent normal random variables is normal (274Ae, 274B);
275B Martingales 381

(ii) if $U$, $V$, $W$ are independent random variables, and $h$ is a bounded continuous function,
then $|E(h(U,V,W))| \le \sup_{t\in\mathbb{R}} |E(h(U,V,t))|$ (274C);
(iii) if $(X_0,\ldots,X_n)$ are independent random variables, then we can find independent random
variables $(X'_0,\ldots,X'_n,Z_0,\ldots,Z_n)$ such that $Z_j$ is standard normal and $X'_j$ has the same
distribution as $X_j$, for each $j$ (274F).
The rest of the argument consists of elementary calculus, careful estimations and a few of the most
fundamental properties of expectations and independence. Now (ii) and (iii) are justified above by appeals
to Fubini's theorem, but surely they belong to the list of probabilistic intuitions which take priority over
the identification of probabilities with countably additive functionals. If they had given any insuperable
difficulty it would have been a telling argument against the model of probability we are using, but would
not have affected the Central Limit Theorem. In fact (i) seems to be the place where we really need a
mathematical model of the concept of distribution, and all the relevant calculations can be done in terms
of the Riemann integral on the plane, with no mention of countable additivity. So while I am happy and
proud to have written out a version of these beautiful ideas, I have to admit that they are in no essential
way dependent on the rest of this treatise.
In §285 I will describe a quite different approach to the theorem, using much more sophisticated machinery;
but it will again be the case, perhaps more thoroughly hidden, that the relevance of measure theory will
not be to the theorem itself, but to our imagination of what an arbitrary distribution is. For here I do
have a claim to make for my subject. The characterization of distribution functions as arbitrary monotonic
functions, continuous on the right, and with the right limits at $\pm\infty$ (271Xb), together with the analysis of
monotonic functions in §226, gives us a chance of forming a mental picture of the proper class of objects to
which such results as the Central Limit Theorem can be applied.
Theorem 274F is a trifling modification of Theorem 3 of Lindeberg 22. Like the original, it emphasizes
what I believe to be vital to all the limit theorems of this chapter: they are best founded on a proper
understanding of finite sequences of random variables. Lindeberg's condition was the culmination of a long
search for the most general conditions under which the Central Limit Theorem would be valid. I offer
a version of Laplace's theorem (274Xe) as the starting place, and Liapounoff's condition (274Xg) as an
example of one of the intermediate stages. Naturally the corollaries 274I, 274J, 274K and 274Xd are those
one seeks to apply by choice. There is an intriguing, but as far as I know purely coincidental, parallel between
273H/274K and 273I/274Xd. As an example of an independent sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of random variables, all
with expectation zero and variance 1, to which the Central Limit Theorem does not apply, I offer 274Xc.

275 Martingales
This chapter so far has been dominated by independent sequences of random variables. I now turn to
another of the remarkable concepts to which probabilistic intuitions have led us. Here we study evolving
systems, in which we gain progressively more information as time progresses. I give the basic theorems
on pointwise convergence of martingales (275F-275H, 275K) and a very brief account of stopping times
(275L-275P).

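275A Definition Let $(\Omega,\Sigma,\mu)$ be a probability space with completion $(\Omega,\hat\Sigma,\hat\mu)$, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a
non-decreasing sequence of $\sigma$-subalgebras of $\hat\Sigma$. A martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ is a sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of
integrable real-valued random variables on $\Omega$ such that (i) $\operatorname{dom} X_n \in \Sigma_n$ and $X_n$ is $\Sigma_n$-measurable for each
$n \in \mathbb{N}$ (ii) whenever $m \le n \in \mathbb{N}$ and $E \in \Sigma_m$ then $\int_E X_n = \int_E X_m$.
Note that it is enough if $\int_E X_{n+1} = \int_E X_n$ whenever $n \in \mathbb{N}$, $E \in \Sigma_n$.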

275B Examples We have seen many contexts in which such sequences appear naturally; here are a
few.
(a) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$.
Let $X$ be any real-valued random variable on $\Omega$ with finite expectation, and for each $n \in \mathbb{N}$ let $X_n$ be a
conditional expectation of $X$ on $\Sigma_n$, as in §233. Subject to the conditions that $\operatorname{dom} X_n \in \Sigma_n$ and $X_n$ is
actually $\Sigma_n$-measurable for each $n$ (a purely technical point; see 232He), $\langle X_n\rangle_{n\in\mathbb{N}}$ will be a martingale
adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$, because $\int_E X_{n+1} = \int_E X = \int_E X_n$ whenever $E \in \Sigma_n$.

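(b) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle X_n\rangle_{n\in\mathbb{N}}$ an independent sequence of random variables all
with zero expectation. For each $n \in \mathbb{N}$ let $\tilde\Sigma_n$ be the $\sigma$-algebra generated by $\bigcup_{i\le n} \Sigma_{X_i}$, writing $\Sigma_{X_i}$ for the
$\sigma$-algebra defined by $X_i$ (272C), and set $S_n = X_0 + \ldots + X_n$. Then $\langle S_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\tilde\Sigma_n\rangle_{n\in\mathbb{N}}$.
(Use 272K to see that $\Sigma_{X_{n+1}}$ is independent of $\tilde\Sigma_n$, so that $\int_E X_{n+1} = \int X_{n+1} \times \chi E = 0$ for every $E \in \tilde\Sigma_n$,
by 272Q.)

(c) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle X_n\rangle_{n\in\mathbb{N}}$ an independent sequence of random variables all
with expectation 1. For each $n \in \mathbb{N}$ let $\tilde\Sigma_n$ be the $\sigma$-algebra generated by $\bigcup_{i\le n} \Sigma_{X_i}$, writing $\Sigma_{X_i}$ for the
$\sigma$-algebra defined by $X_i$, and set $W_n = X_0 \times \ldots \times X_n$. Then $\langle W_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\tilde\Sigma_n\rangle_{n\in\mathbb{N}}$.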
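
Examples (b) and (c) can be checked by exact arithmetic in a toy case. The following is only a sketch under stated assumptions: each $X_i$ is taken to be uniformly distributed on a small finite set of values (an illustrative choice, not required by the text), and the name check_martingale is hypothetical. It verifies that the average of the next partial sum or partial product, over the next independent step, equals the current one along every prefix.

```python
from fractions import Fraction
from itertools import product


def check_martingale(values, combine, start, depth=4):
    """Return True if E[M_{n+1} | X_0..X_n] == M_n along every prefix.

    M is built by folding `combine` over i.i.d. steps that are uniform on
    `values` (an illustrative assumption); all arithmetic is exact.
    """
    for n in range(depth):
        for prefix in product(values, repeat=n + 1):
            m = start
            for x in prefix:
                m = combine(m, x)
            # Average of M_{n+1} over the next independent, uniform step.
            nxt = sum(combine(m, x) for x in values) / len(values)
            if nxt != m:
                return False
    return True


# 275Bb: partial sums of zero-mean steps.
sums_ok = check_martingale([Fraction(-1), Fraction(1)],
                           lambda m, x: m + x, Fraction(0))
# 275Bc: partial products of mean-one steps.
prods_ok = check_martingale([Fraction(1, 2), Fraction(3, 2)],
                            lambda m, x: m * x, Fraction(1))
# Steps with non-zero mean do not give a martingale of partial sums.
biased_ok = check_martingale([Fraction(0), Fraction(1)],
                             lambda m, x: m + x, Fraction(0))
```

Here conditioning on a prefix plays the role of integrating over an atom of $\tilde\Sigma_n$, which is all that the definition in 275A asks for in a finite setting.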

275C Remarks (a) It seems appropriate to the concept of a random variable $X$ being 'adapted' to a
$\sigma$-algebra $\Sigma$ to require that $\operatorname{dom} X \in \Sigma$ and that $X$ should be $\Sigma$-measurable, even though this may mean
that other random variables, equal almost everywhere to $X$, may fail to be 'adapted' to $\Sigma$.

(b) Technical problems of this kind evaporate, of course, if all $\mu$-negligible subsets of $\Omega$ belong to $\Sigma_0$.
But examples such as 275Bb make it seem unreasonable to insist on such a simplification as a general rule.

(c) The concept of 'martingale' can readily be extended to other index sets than $\mathbb{N}$; indeed, if $I$ is any
partially ordered set, we can say that $\langle X_i\rangle_{i\in I}$ is a martingale on $(\Omega,\Sigma,\mu)$ adapted to $\langle\Sigma_i\rangle_{i\in I}$ if (i) each $\Sigma_i$
is a $\sigma$-subalgebra of $\hat\Sigma$ (ii) each $X_i$ is an integrable real-valued $\Sigma_i$-measurable random variable such that
$\operatorname{dom} X_i \in \Sigma_i$ (iii) whenever $i \le j$ in $I$, then $\Sigma_i \subseteq \Sigma_j$ and $\int_E X_i = \int_E X_j$ for every $E \in \Sigma_i$. The principal
case, after $I = \mathbb{N}$, is $I = [0,\infty[$; $I = \mathbb{Z}$ is also interesting, and I think it is fair to say that the most important
ideas can already be expressed in theorems about martingales indexed by finite sets $I$. But in this volume
I will generally take martingales to be indexed by $\mathbb{N}$.

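(d) Given just a sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of integrable real-valued random variables on a probability space
$(\Omega,\Sigma,\mu)$, we can say simply that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale on $(\Omega,\Sigma,\mu)$ if there is some non-decreasing
sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ of $\sigma$-subalgebras of $\hat\Sigma$ (the completion of $\Sigma$) such that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted
to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$. If we write $\tilde\Sigma_n$ for the $\sigma$-algebra generated by $\bigcup_{i\le n} \Sigma_{X_i}$, where $\Sigma_{X_i}$ is the $\sigma$-algebra defined by
$X_i$, as in 275Bb, then it is easy to see that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale iff it is a martingale adapted to $\langle\tilde\Sigma_n\rangle_{n\in\mathbb{N}}$.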

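(e) Continuing from (d), it is also easy to see that if $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale on $(\Omega,\Sigma,\mu)$, and $X'_n =_{\text{a.e.}} X_n$
for every $n$, then $\langle X'_n\rangle_{n\in\mathbb{N}}$ is a martingale on $(\Omega,\Sigma,\mu)$. (The point is that if $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$,
then both $\langle X_n\rangle_{n\in\mathbb{N}}$ and $\langle X'_n\rangle_{n\in\mathbb{N}}$ are adapted to $\langle\hat\Sigma_n\rangle_{n\in\mathbb{N}}$, where
$$\hat\Sigma_n = \{E \triangle F : E \in \Sigma_n,\ F \text{ is negligible}\}.)$$
Consequently we have a concept of 'martingale' as a sequence in $L^1(\mu)$, saying that a sequence $\langle X_n^{\bullet}\rangle_{n\in\mathbb{N}}$ in
$L^1(\mu)$ is a martingale iff $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale.
Nevertheless, I think that the concept of 'martingale adapted to a sequence of $\sigma$-algebras' is the primary
one, since in all the principal applications the $\sigma$-algebras reflect some essential aspect of the problem, which
may not be fully encompassed by the random variables alone.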

(f) The word 'martingale' originally (in English; the history in French is more complex) referred to a strap
used to prevent a horse from throwing its head back. Later it was used as the name of a gambling system
in which the gambler doubles his stake each time he loses, and (in French) as a general term for gambling
systems. These may be regarded as a class of 'stopped-time martingales', as described in 275L-275P below.

275D A large part of the theory of martingales consists of inequalities of various kinds. I give two of
the most important, both due to J.L.Doob. (See also 276Xa-276Xb.)
Lemma Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle X_n\rangle_{n\in\mathbb{N}}$ a martingale on $\Omega$. Fix $n \in \mathbb{N}$ and set $X^* =
\max(X_0,\ldots,X_n)$. Then for any $\epsilon > 0$,
$$\Pr(X^* \ge \epsilon) \le \frac{1}{\epsilon} E(X_n^+),$$
writing $X_n^+ = \max(0, X_n)$.


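proof Write $\hat\mu$ for the completion of $\mu$, and $\hat\Sigma$ for its domain. Let $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of
$\sigma$-subalgebras of $\hat\Sigma$ to which $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted. For each $i \le n$ set
$$E_i = \{\omega : \omega \in \operatorname{dom} X_i,\ X_i(\omega) \ge \epsilon\}, \qquad F_i = E_i \setminus \bigcup_{j<i} E_j.$$
Then $F_0,\ldots,F_n$ is disjoint and $F = \bigcup_{i\le n} F_i = \bigcup_{i\le n} E_i$; moreover, writing $H$ for the conegligible set
$\bigcap_{i\le n} \operatorname{dom} X_i$,
$$\{\omega : X^*(\omega) \ge \epsilon\} = F \cap H,$$
so that
$$\Pr(X^* \ge \epsilon) = \hat\mu\{\omega : X^*(\omega) \ge \epsilon\} = \hat\mu F = \sum_{i=0}^n \hat\mu F_i.$$
On the other hand, $E_i$ and $F_i$ belong to $\Sigma_i$ for each $i \le n$, so
$$\int_{F_i} X_n = \int_{F_i} X_i \ge \epsilon\hat\mu F_i$$
for every $i$, and
$$\epsilon\hat\mu F = \epsilon\sum_{i=0}^n \hat\mu F_i \le \sum_{i=0}^n \int_{F_i} X_n = \int_F X_n \le \int_F X_n^+ \le E(X_n^+),$$
as required.
Remark Note that in fact we have $\epsilon\hat\mu F \le \int_F X_n$, where $F = \{\omega : X^*(\omega) \ge \epsilon\}$; this is of great importance
in many applications.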
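
The inequality of 275D can be checked by exact enumeration in a simple case. The sketch below assumes, purely for illustration, that the martingale is the fair random walk $X_0 = 0$, $X_k = X_{k-1} \pm 1$ with equal probabilities; the function name maximal_inequality_sides is hypothetical. It returns $\Pr(X^* \ge \epsilon)$ and $\frac{1}{\epsilon}E(X_n^+)$ as exact fractions.

```python
from fractions import Fraction
from itertools import product


def maximal_inequality_sides(n, eps):
    """Exact Pr(X* >= eps) and E(X_n^+)/eps for the fair +-1 walk X_0, ..., X_n.

    The walk starts at X_0 = 0; every one of the 2**n step sequences has
    the same probability.
    """
    paths = list(product((-1, 1), repeat=n))
    weight = Fraction(1, len(paths))
    lhs = Fraction(0)
    rhs = Fraction(0)
    for steps in paths:
        walk = [0]
        for s in steps:
            walk.append(walk[-1] + s)
        if max(walk) >= eps:          # the event {X* >= eps}
            lhs += weight
        rhs += weight * max(0, walk[-1])   # contributes to E(X_n^+)
    return lhs, rhs / eps


lhs, rhs = maximal_inequality_sides(6, 2)
```

For this walk with $n = 6$ and $\epsilon = 2$ the two sides are close, which suggests that the inequality is not wasteful even in very simple cases.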

275E Up-crossings The next lemma depends on the notion of 'up-crossing'. Let $x_0,\ldots,x_n$ be any list
of real numbers, and $a < b$ in $\mathbb{R}$. The number of up-crossings from $a$ to $b$ in the list $x_0,\ldots,x_n$ is the
number of pairs $(j,k)$ such that $0 \le j < k \le n$, $x_j \le a$, $x_k \ge b$ and $a < x_i < b$ for $j < i < k$. Note that this
is also the largest $m$ such that $s_m < \infty$, if we write
$$r_1 = \inf\{i : i \le n,\ x_i \le a\},$$
$$s_1 = \inf\{i : r_1 < i \le n,\ x_i \ge b\},$$
$$r_2 = \inf\{i : s_1 < i \le n,\ x_i \le a\},$$
$$s_2 = \inf\{i : r_2 < i \le n,\ x_i \ge b\}$$
and so on, taking $\inf\emptyset = \infty$.
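
The two descriptions of the number of up-crossings can be compared directly. The following sketch implements both (the function names are hypothetical): one counts the pairs $(j,k)$, the other runs through $r_1, s_1, r_2, s_2, \ldots$ with $\inf\emptyset = \infty$.

```python
import math


def upcrossings_by_pairs(xs, a, b):
    """Count pairs (j, k), j < k, with x_j <= a, x_k >= b and
    a < x_i < b for every i strictly between j and k."""
    n = len(xs)
    return sum(
        1
        for j in range(n)
        for k in range(j + 1, n)
        if xs[j] <= a and xs[k] >= b
        and all(a < xs[i] < b for i in range(j + 1, k))
    )


def upcrossings_by_times(xs, a, b):
    """Largest m with s_m finite, following r_1, s_1, r_2, s_2, ..."""
    def first(start, pred):
        # inf of an empty set of indices is taken to be infinity
        return next((i for i in range(start, len(xs)) if pred(xs[i])), math.inf)

    m, s = 0, -1                      # s_0 = -1, so the search for r_1 starts at 0
    while True:
        r = first(s + 1, lambda x: x <= a)
        if r == math.inf:
            return m
        s = first(r + 1, lambda x: x >= b)
        if s == math.inf:
            return m
        m += 1


example = [0, 3, 1, 2, 0, 5, 4, -1, 3]
by_pairs = upcrossings_by_pairs(example, 1, 3)
by_times = upcrossings_by_times(example, 1, 3)
```

In the example list the completed up-crossings from 1 to 3 end at positions 1, 5 and 8, so both functions should agree on the count.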

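275F Lemma Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle X_n\rangle_{n\in\mathbb{N}}$ a martingale on $\Omega$. Suppose that $n \in \mathbb{N}$
and that $a < b$ in $\mathbb{R}$. For each $\omega \in \bigcap_{i\le n} \operatorname{dom} X_i$, let $U(\omega)$ be the number of up-crossings from $a$ to $b$ in the
list $X_0(\omega),\ldots,X_n(\omega)$. Then
$$E(U) \le \frac{1}{b-a} E((X_n - X_0)^+),$$
writing $(X_n - X_0)^+(\omega) = \max(0, X_n(\omega) - X_0(\omega))$ for $\omega \in \operatorname{dom} X_n \cap \operatorname{dom} X_0$.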


proof Each individual step in the proof is 'elementary', but the structure as a whole is non-trivial.
(a) The following fact will be useful. Suppose that $x_0,\ldots,x_n$ are real numbers; let $u$ be the number of
up-crossings from $a$ to $b$ in the list $x_0,\ldots,x_n$. Set $y_i = \max(x_i, a)$ for each $i$; then $u$ is also the number of
up-crossings from $a$ to $b$ in the list $y_0,\ldots,y_n$. For each $k \le n$, set $c_k = 1$ if there is a $j \le k$ such that $x_j \le a$
and $x_i < b$ for $j \le i \le k$, $0$ otherwise. Then
$$(b-a)u \le \sum_{k=0}^{n-1} c_k (y_{k+1} - y_k).$$
P I induce on $m$ to show that (defining $r_m$, $s_m$ as in 275E)
$$(b-a)m \le \sum_{k=0}^{s_m - 1} c_k (y_{k+1} - y_k)$$
whenever $m \le u$. For $m = 0$ (taking $s_0 = -1$) we have $0 = 0$. For the inductive step to $m \ge 1$, we have
$s_{m-1} < r_m < s_m \le n$ (because I am supposing that $m \le u$), and $c_k = 0$ if $s_{m-1} \le k < r_m$, $c_k = 1$ if
$r_m \le k < s_m$. So
$$\begin{aligned}
\sum_{k=0}^{s_m - 1} c_k (y_{k+1} - y_k) &= \sum_{k=0}^{s_{m-1} - 1} c_k (y_{k+1} - y_k) + \sum_{k=r_m}^{s_m - 1} (y_{k+1} - y_k) \\
&\ge (b-a)(m-1) + y_{s_m} - y_{r_m} \quad \text{(by the inductive hypothesis)} \\
&\ge (b-a)m
\end{aligned}$$
(because $y_{s_m} \ge b$, $y_{r_m} = a$), and the induction proceeds.
Accordingly
$$\sum_{k=0}^{s_u - 1} c_k (y_{k+1} - y_k) \ge (b-a)u.$$
As for the sum $\sum_{k=s_u}^{n-1} c_k (y_{k+1} - y_k)$, we have $c_k = 0$ for $s_u \le k < r_{u+1}$, $c_k = 1$ for $r_{u+1} \le k < s_{u+1}$, while
$s_{u+1} > n$, so if $n \le r_{u+1}$ we have
$$\sum_{k=0}^{n-1} c_k (y_{k+1} - y_k) = \sum_{k=0}^{s_u - 1} c_k (y_{k+1} - y_k) \ge (b-a)u,$$
while if $n > r_{u+1}$ we have
$$\begin{aligned}
\sum_{k=0}^{n-1} c_k (y_{k+1} - y_k) &= \sum_{k=0}^{s_u - 1} c_k (y_{k+1} - y_k) + \sum_{k=r_{u+1}}^{n-1} (y_{k+1} - y_k) \\
&\ge (b-a)u + y_n - y_{r_{u+1}} \\
&\ge (b-a)u
\end{aligned}$$
because $y_n \ge a = y_{r_{u+1}}$. Thus in both cases we have the required result. Q
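(b)(i) Now define
$$Y_k(\omega) = \max(a, X_k(\omega)) \text{ for } \omega \in \operatorname{dom} X_k,$$
$$F_k = \{\omega : \omega \in \bigcap_{i\le k} \operatorname{dom} X_i,\ \exists\, j \le k,\ X_j(\omega) \le a,\ X_i(\omega) < b \text{ if } j \le i \le k\}$$
for each $k \in \mathbb{N}$. If $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of $\sigma$-algebras to which $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted, then
$F_k \in \Sigma_k$ (because if $j \le k$ all the sets $\operatorname{dom} X_j$, $\{\omega : X_j(\omega) \le a\}$, $\{\omega : X_j(\omega) < b\}$ belong to $\Sigma_j \subseteq \Sigma_k$).
(ii) We find that $\int_F Y_k \le \int_F Y_{k+1}$ if $F \in \Sigma_k$. P Set $G = \{\omega : X_k(\omega) > a\} \in \Sigma_k$. Then
$$\begin{aligned}
\int_F Y_k &= \int_{F\cap G} X_k + a\hat\mu(F \setminus G) \\
&= \int_{F\cap G} X_{k+1} + a\hat\mu(F \setminus G) \\
&\le \int_{F\cap G} Y_{k+1} + \int_{F\setminus G} Y_{k+1} = \int_F Y_{k+1}. \text{ Q}
\end{aligned}$$
(iii) Consequently $\int_F (Y_{k+1} - Y_k) \le \int (Y_{k+1} - Y_k)$ for every $F \in \Sigma_k$.
P $\int (Y_{k+1} - Y_k) - \int_F (Y_{k+1} - Y_k) = \int_{\Omega\setminus F} Y_{k+1} - \int_{\Omega\setminus F} Y_k \ge 0$. Q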
T
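(c) Let $H$ be the conegligible set $\operatorname{dom} U = \bigcap_{i\le n} \operatorname{dom} X_i \in \Sigma_n$. We ought to check at some point that
$U$ is $\Sigma_n$-measurable; but this is clearly true, because all the relevant sets $\{\omega : X_i(\omega) \le a\}$, $\{\omega : X_i(\omega) \ge b\}$
belong to $\Sigma_n$. For each $\omega \in H$, apply (a) to the list $X_0(\omega),\ldots,X_n(\omega)$ to see that
$$(b-a)U(\omega) \le \sum_{k=0}^{n-1} \chi F_k(\omega)\,(Y_{k+1}(\omega) - Y_k(\omega)).$$
Because $H$ is conegligible, it follows that
$$\begin{aligned}
(b-a)E(U) &\le \sum_{k=0}^{n-1} \int_{F_k} (Y_{k+1} - Y_k) \le \sum_{k=0}^{n-1} \int (Y_{k+1} - Y_k) \quad \text{(using (b-iii))} \\
&= E(Y_n - Y_0) \le E((X_n - X_0)^+)
\end{aligned}$$
because $Y_n - Y_0 \le (X_n - X_0)^+$ everywhere on $\operatorname{dom} X_n \cap \operatorname{dom} X_0$. This completes the proof.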
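
As with 275D, the inequality of 275F can be checked by exact enumeration. The sketch below again assumes the fair $\pm 1$ random walk started at 0 as the martingale (an illustrative choice only), and the function names are hypothetical. It computes $(b-a)E(U)$ and $E((X_n - X_0)^+)$ as exact fractions.

```python
from fractions import Fraction
from itertools import product


def count_upcrossings(xs, a, b):
    """Number of completed passages from a value <= a to a later value >= b."""
    count, below = 0, False
    for x in xs:
        if not below and x <= a:
            below = True                      # an r_m has been found
        elif below and x >= b:
            count, below = count + 1, False   # the matching s_m has been found
    return count


def upcrossing_inequality_sides(n, a, b):
    """Exact (b - a) E(U) and E((X_n - X_0)^+) for the fair +-1 walk from 0."""
    paths = list(product((-1, 1), repeat=n))
    weight = Fraction(1, len(paths))
    lhs = Fraction(0)
    rhs = Fraction(0)
    for steps in paths:
        walk = [0]
        for s in steps:
            walk.append(walk[-1] + s)
        lhs += weight * (b - a) * count_upcrossings(walk, a, b)
        rhs += weight * max(0, walk[-1] - walk[0])
    return lhs, rhs


lhs, rhs = upcrossing_inequality_sides(8, -1, 1)
```

The single-pass counter used here is the description by $r_1, s_1, r_2, \ldots$ of 275E written as a two-state loop.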

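275G We are now ready for the principal theorems of this section.
Doob's Martingale Convergence Theorem Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale on a probability space $(\Omega,\Sigma,\mu)$,
and suppose that $\sup_{n\in\mathbb{N}} E(|X_n|) < \infty$. Then $\lim_{n\to\infty} X_n(\omega)$ is defined in $\mathbb{R}$ for almost every $\omega$ in $\Omega$.
proof (a) Set $H = \bigcap_{n\in\mathbb{N}} \operatorname{dom} X_n$, and for $\omega \in H$ set
$$Y(\omega) = \liminf_{n\to\infty} X_n(\omega), \qquad Z(\omega) = \limsup_{n\to\infty} X_n(\omega),$$
allowing $\pm\infty$ in both cases. But note that $Y \le \liminf_{n\to\infty} |X_n|$, so by Fatou's Lemma $Y(\omega) < \infty$ for almost
every $\omega$; similarly $Z(\omega) > -\infty$ for almost every $\omega$. It will therefore be enough if I can show that $Y =_{\text{a.e.}} Z$,
for then $Y(\omega) = Z(\omega) \in \mathbb{R}$ for almost every $\omega$, and $\langle X_n(\omega)\rangle_{n\in\mathbb{N}}$ will be convergent for almost every $\omega$.
(b) ? So suppose, if possible, that $Y$ and $Z$ are not equal almost everywhere. Of course both are
$\hat\Sigma$-measurable, where $(\Omega,\hat\Sigma,\hat\mu)$ is the completion of $(\Omega,\Sigma,\mu)$, so we must have
$$\hat\mu\{\omega : \omega \in H,\ Y(\omega) < Z(\omega)\} > 0.$$
Accordingly there are rational numbers $q$, $q'$ such that $q < q'$ and $\hat\mu G > 0$, where
$$G = \{\omega : \omega \in H,\ Y(\omega) < q < q' < Z(\omega)\}.$$
Now, for each $\omega \in H$, $n \in \mathbb{N}$, let $U_n(\omega)$ be the number of up-crossings from $q$ to $q'$ in the list $X_0(\omega),\ldots,
X_n(\omega)$. Then 275F tells us that
$$E(U_n) \le \frac{1}{q'-q} E((X_n - X_0)^+) \le \frac{1}{q'-q} E(|X_n| + |X_0|) \le \frac{2M}{q'-q},$$
if we write $M = \sup_{i\in\mathbb{N}} E(|X_i|)$. By B.Levi's theorem, $U(\omega) = \sup_{n\in\mathbb{N}} U_n(\omega) < \infty$ for almost every $\omega$. On
the other hand, if $\omega \in G$, then there are arbitrarily large $j$, $k$ such that $X_j(\omega) < q$ and $X_k(\omega) > q'$, so
$U(\omega) = \infty$. This means that $\hat\mu G$ must be 0, contrary to the choice of $q$, $q'$. X
(c) Thus we must in fact have $Y =_{\text{a.e.}} Z$, and $\langle X_n(\omega)\rangle_{n\in\mathbb{N}}$ is convergent for almost every $\omega$, as claimed.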

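275H Theorem Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of
$\sigma$-subalgebras of $\Sigma$. Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$. Then the following are equiveridical:
(i) there is a random variable $X$, of finite expectation, such that $X_n$ is a conditional expectation of $X$
on $\Sigma_n$ for every $n$;
(ii) $\{X_n : n \in \mathbb{N}\}$ is uniformly integrable;
(iii) $X_\infty(\omega) = \lim_{n\to\infty} X_n(\omega)$ is defined in $\mathbb{R}$ for almost every $\omega$, and $E(|X_\infty|) = \lim_{n\to\infty} E(|X_n|) < \infty$.
proof (i)$\Rightarrow$(ii) By 246D, the set of all conditional expectations of $X$ is uniformly integrable, so $\{X_n : n \in \mathbb{N}\}$
is surely uniformly integrable.
(ii)$\Rightarrow$(iii) If $\{X_n : n \in \mathbb{N}\}$ is uniformly integrable, we surely have $\sup_{n\in\mathbb{N}} E(|X_n|) < \infty$, so 275G tells
us that $X_\infty$ is defined almost everywhere. By 246Ja, $X_\infty$ is integrable and $\lim_{n\to\infty} E(|X_n - X_\infty|) = 0$.
Consequently $E(|X_\infty|) = \lim_{n\to\infty} E(|X_n|) < \infty$.
(iii)$\Rightarrow$(i) Because $E(|X_\infty|) = \lim_{n\to\infty} E(|X_n|)$, $\lim_{n\to\infty} E(|X_n - X_\infty|) = 0$ (245H(a-ii)). Now let $n \in \mathbb{N}$,
$E \in \Sigma_n$. Then
$$\int_E X_n = \lim_{m\to\infty} \int_E X_m = \int_E X_\infty.$$
As $E$ is arbitrary, $X_n$ is a conditional expectation of $X_\infty$ on $\Sigma_n$.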

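275I Theorem Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of
$\sigma$-subalgebras of $\Sigma$; write $\Sigma_\infty$ for the $\sigma$-algebra generated by $\bigcup_{n\in\mathbb{N}} \Sigma_n$. Let $X$ be any real-valued random
variable on $\Omega$ with finite expectation, and for each $n \in \mathbb{N}$ let $X_n$ be a conditional expectation of $X$ on
$\Sigma_n$. Then $X_\infty(\omega) = \lim_{n\to\infty} X_n(\omega)$ is defined almost everywhere; $\lim_{n\to\infty} E(|X_\infty - X_n|) = 0$, and $X_\infty$ is a
conditional expectation of $X$ on $\Sigma_\infty$.
proof (a) By 275G-275H, we know that $X_\infty$ is defined almost everywhere, and, as remarked in 275H,
$\lim_{n\to\infty} E(|X_\infty - X_n|) = 0$. To see that $X_\infty$ is a conditional expectation of $X$ on $\Sigma_\infty$, set
$$\mathcal{A} = \{E : E \in \Sigma_\infty,\ \int_E X_\infty = \int_E X\}, \qquad \mathcal{I} = \bigcup_{n\in\mathbb{N}} \Sigma_n.$$
Now $\mathcal{I}$ and $\mathcal{A}$ satisfy the conditions of the Monotone Class Theorem (136B). P ($\alpha$) Of course $\Omega \in \mathcal{I}$ and $\mathcal{I}$
is closed under finite intersections, because $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of $\sigma$-algebras; in fact $\mathcal{I}$ is
a subalgebra of $\mathcal{P}\Omega$, and is closed under finite unions and complements. ($\beta$) If $E \in \mathcal{I}$, say $E \in \Sigma_n$; then
$$\int_E X_\infty = \lim_{m\to\infty} \int_E X_m = \int_E X,$$
as in (iii)$\Rightarrow$(i) of 275H, so $E \in \mathcal{A}$. Thus $\mathcal{I} \subseteq \mathcal{A}$. ($\gamma$) If $E$, $F \in \mathcal{A}$ and $E \subseteq F$, then
$$\int_{F\setminus E} X_\infty = \int_F X_\infty - \int_E X_\infty = \int_F X - \int_E X = \int_{F\setminus E} X,$$
so $F \setminus E \in \mathcal{A}$. ($\delta$) If $\langle E_k\rangle_{k\in\mathbb{N}}$ is a non-decreasing sequence in $\mathcal{A}$ with union $E$, then
$$\int_E X_\infty = \lim_{k\to\infty} \int_{E_k} X_\infty = \lim_{k\to\infty} \int_{E_k} X = \int_E X,$$
so $E \in \mathcal{A}$. Thus $\mathcal{A}$ is a Dynkin class. Q
Consequently, by 136B, $\mathcal{A}$ includes $\Sigma_\infty$; that is, $X_\infty$ is a conditional expectation of $X$ on $\Sigma_\infty$.
Remark I have written '$\lim_{n\to\infty} E(|X_n - X_\infty|) = 0$'; but you may prefer to say '$X_\infty^{\bullet} = \lim_{n\to\infty} X_n^{\bullet}$ in
$L^1(\mu)$', as in Chapter 24.
The importance of this theorem is such that you may be interested in a proof based on 275D rather than
275E-275G; see 275Xd.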

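*275J As a corollary of this theorem I give an important result, a kind of density theorem for product
measures.
Proposition Let $\langle(\Omega_n,\Sigma_n,\mu_n)\rangle_{n\in\mathbb{N}}$ be a sequence of probability spaces with product $(\Omega,\Sigma,\mu)$. Let $X$ be a
real-valued random variable on $\Omega$ with finite expectation. For each $n \in \mathbb{N}$ define $X_n$ by setting
$$X_n(\boldsymbol\omega) = \int X(\omega_0,\ldots,\omega_n,\xi_{n+1},\ldots)\,d(\xi_{n+1},\ldots)$$
wherever this is defined, where I write '$\int \ldots d(\xi_{n+1},\ldots)$' to mean integration with respect to the product
measure $\lambda'_n$ on $\prod_{i\ge n+1} \Omega_i$. Then $X(\boldsymbol\omega) = \lim_{n\to\infty} X_n(\boldsymbol\omega)$ for almost every $\boldsymbol\omega = (\omega_0,\omega_1,\ldots)$ in $\Omega$, and
$\lim_{n\to\infty} E(|X - X_n|) = 0$.
proof For each $n$, we can identify $\mu$ with the product of $\lambda_n$ and $\lambda'_n$, where $\lambda_n$ is the product measure
on $\Omega_0 \times \ldots \times \Omega_n$ (254N). So 253H tells us that $X_n$ is a conditional expectation of $X$ on the $\sigma$-algebra
$\Lambda_n = \{E \times \prod_{i>n} \Omega_i : E \in \operatorname{dom}\lambda_n\}$. Since (by 254N again) we can think of $\lambda_{n+1}$ as the product of $\lambda_n$ and
$\mu_{n+1}$, $\Lambda_n \subseteq \Lambda_{n+1}$ for each $n$. So 275I tells us that $\langle X_n\rangle_{n\in\mathbb{N}}$ converges almost everywhere to a conditional
expectation $X_\infty$ of $X$ on the $\sigma$-algebra $\Lambda_\infty$ generated by $\bigcup_{n\in\mathbb{N}} \Lambda_n$. Now $\Lambda_\infty \subseteq \Sigma$ and also $\widehat{\bigotimes}_{n\in\mathbb{N}} \Sigma_n \subseteq \Lambda_\infty$,
so every member of $\Sigma$ is sandwiched between two members of $\Lambda_\infty$ of the same measure (254Ff), and $X$
must be equal to $X_\infty$ almost everywhere. Moreover, 275I also tells us that
$$\lim_{n\to\infty} E(|X - X_n|) = \lim_{n\to\infty} E(|X_\infty - X_n|) = 0,$$
as required.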

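275K Reverse martingales We have a result corresponding to 275I for decreasing sequences of
$\sigma$-algebras. While this is used less often than 275G-275I, it does have very important applications.
Theorem Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-increasing sequence of $\sigma$-subalgebras
of $\Sigma$, with intersection $\Sigma_\infty$. Let $X$ be any real-valued random variable with finite expectation, and for
each $n \in \mathbb{N}$ let $X_n$ be a conditional expectation of $X$ on $\Sigma_n$. Then $X_\infty = \lim_{n\to\infty} X_n$ is defined almost
everywhere and is a conditional expectation of $X$ on $\Sigma_\infty$.
proof (a) Set $H = \bigcap_{n\in\mathbb{N}} \operatorname{dom} X_n$, so that $H$ is conegligible. For $n \in \mathbb{N}$, $a < b$ in $\mathbb{R}$, and $\omega \in H$, write
$U_{abn}(\omega)$ for the number of up-crossings from $a$ to $b$ in the list $X_n(\omega), X_{n-1}(\omega),\ldots,X_0(\omega)$ (275E). Then
$$\begin{aligned}
E(U_{abn}) &\le \frac{1}{b-a} E((X_0 - X_n)^+) \quad \text{(275F)} \\
&\le \frac{1}{b-a} E(|X_0| + |X_n|) \le \frac{2}{b-a} E(|X_0|) < \infty.
\end{aligned}$$
So $\lim_{n\to\infty} U_{abn}(\omega)$ is finite for almost every $\omega$. But this means that
$$\{\omega : \liminf_{n\to\infty} X_n(\omega) < a,\ \limsup_{n\to\infty} X_n(\omega) > b\}$$
is negligible. As $a$ and $b$ are arbitrary, $\langle X_n\rangle_{n\in\mathbb{N}}$ is convergent a.e., just as in 275G. Set $X_\infty(\omega) =
\lim_{n\to\infty} X_n(\omega)$ whenever this is defined in $\mathbb{R}$.
(b) By 246D, $\{X_n : n \in \mathbb{N}\}$ is uniformly integrable, so $E(|X_n - X_\infty|) \to 0$ as $n \to \infty$ (246Ja), and
$$\int_E X_\infty = \lim_{n\to\infty} \int_E X_n = \int_E X_0$$
for every $E \in \Sigma_\infty$.
(c) Now there is a conegligible set $G \in \Sigma_\infty$ such that $G \subseteq \operatorname{dom} X_\infty$ and $X_\infty \restriction G$ is $\Sigma_\infty$-measurable. P
For each $n \in \mathbb{N}$, there is a conegligible set $G_n \in \Sigma_n$ such that $G_n \subseteq \operatorname{dom} X_n$ and $X_n \restriction G_n$ is $\Sigma_n$-measurable.
Set $G' = \bigcup_{n\in\mathbb{N}} \bigcap_{m\ge n} G_m$; then, for any $r \in \mathbb{N}$, $G' = \bigcup_{n\ge r} \bigcap_{m\ge n} G_m$ belongs to $\Sigma_r$, so $G' \in \Sigma_\infty$, while of
course $G'$ is conegligible. For $n \in \mathbb{N}$, set $X'_n(\omega) = X_n(\omega)$ for $\omega \in G_n$, $0$ for $\omega \in \Omega \setminus G_n$; then for $\omega \in G'$,
$\lim_{n\to\infty} X'_n(\omega) = \lim_{n\to\infty} X_n(\omega)$ if either is defined in $\mathbb{R}$. Writing $X'_\infty = \lim_{n\to\infty} X'_n$ whenever this is
defined in $\mathbb{R}$, 121F and 121H tell us that $X'_\infty$ is $\Sigma_r$-measurable and $\operatorname{dom} X'_\infty \in \Sigma_r$ for every $r \in \mathbb{N}$, so that
$G'' = \operatorname{dom} X'_\infty$ belongs to $\Sigma_\infty$ and $X'_\infty$ is $\Sigma_\infty$-measurable. We also know, from (b), that $G''$ is conegligible.
So setting $G = G' \cap G''$ we have the result. Q
Thus $X_\infty$ is a conditional expectation of $X$ on $\Sigma_\infty$.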

275L Stopping times In a sense, the main work of this section is over; I have no room for any more
theorems of importance comparable to 275G-275I. However, it would be wrong to leave this chapter without
briefly describing one of the most fruitful ideas of the subject.
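Definition Let $(\Omega,\Sigma,\mu)$ be a probability space, with completion $(\Omega,\hat\Sigma,\hat\mu)$, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing
sequence of $\sigma$-subalgebras of $\hat\Sigma$. A stopping time adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ (also called 'optional time',
'Markov time') is a function $\tau$ from $\Omega$ to $\mathbb{N} \cup \{\infty\}$ such that $\{\omega : \tau(\omega) \le n\} \in \Sigma_n$ for every $n \in \mathbb{N}$.
Remark Of course the condition
$$\{\omega : \tau(\omega) \le n\} \in \Sigma_n \text{ for every } n \in \mathbb{N}$$
can be replaced by the equivalent condition
$$\{\omega : \tau(\omega) = n\} \in \Sigma_n \text{ for every } n \in \mathbb{N}.$$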
I give priority to the former expression because it is more appropriate to other index sets (see 275Cc).

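275M Examples (a) If $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to a sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ of $\sigma$-algebras, and $H_n$
is a Borel subset of $\mathbb{R}^{n+1}$ for each $n$, then we have a stopping time $\tau$ adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ defined by the
formula
$$\tau(\omega) = \inf\{n : \omega \in \bigcap_{i\le n} \operatorname{dom} X_i,\ (X_0(\omega),\ldots,X_n(\omega)) \in H_n\},$$
setting $\inf\emptyset = \infty$ as usual. (For by 121K the set $E_n = \{\omega : (X_0(\omega),\ldots,X_n(\omega)) \in H_n\}$ belongs to $\Sigma_n$ for
each $n$, and $\{\omega : \tau(\omega) \le n\} = \bigcup_{i\le n} E_i$.) In particular, for instance, the formulae
$$\inf\{n : X_n(\omega) \ge a\}, \qquad \inf\{n : |X_n(\omega)| > a\}$$
define stopping times.

(b) Any constant function $\tau : \Omega \to \mathbb{N} \cup \{\infty\}$ is a stopping time. If $\tau$, $\tau'$ are two stopping times adapted
to the same sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ of $\sigma$-algebras, then $\tau \wedge \tau'$ is a stopping time adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$, setting
$(\tau \wedge \tau')(\omega) = \min(\tau(\omega), \tau'(\omega))$ for $\omega \in \Omega$.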
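
The defining property of a stopping time is that the question 'has $\tau$ occurred by time $n$?' can be answered from $X_0,\ldots,X_n$ alone. The sketch below illustrates this for the hitting time $\inf\{n : X_n \ge a\}$ on a single finite path; the function names are hypothetical, and a finite list stands in for an infinite sequence.

```python
import math


def hitting_time(path, level):
    """tau = inf{n : path[n] >= level}, with inf(empty set) = infinity."""
    return next((n for n, x in enumerate(path) if x >= level), math.inf)


def stopped_by(path, level, n):
    """Decide whether tau <= n by looking only at path[0..n]."""
    return any(x >= level for x in path[: n + 1])


def meet(tau1, tau2):
    """The stopping time tau1 ^ tau2 of 275Mb, evaluated pointwise."""
    return min(tau1, tau2)


path = [0, 1, 0, 1, 2, 3]
tau = hitting_time(path, 2)
never = hitting_time(path, 5)
# The truncated test agrees with the event {tau <= n} for every n.
consistent = all(
    stopped_by(path, 2, n) == (hitting_time(path, 2) <= n)
    for n in range(len(path))
)
```

By contrast, a rule such as 'the time at which the path attains its maximum' needs knowledge of the whole path, and so is not in general a stopping time.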

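275N Lemma Let $(\Omega,\Sigma,\mu)$ be a complete probability space, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of
$\sigma$-subalgebras of $\Sigma$. Suppose that $\tau$ and $\tau'$ are stopping times on $\Omega$, and $\langle X_n\rangle_{n\in\mathbb{N}}$ a martingale, all adapted
to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$.
(a) The family
$$\tilde\Sigma_\tau = \{E : E \in \Sigma,\ E \cap \{\omega : \tau(\omega) \le n\} \in \Sigma_n \text{ for every } n \in \mathbb{N}\}$$
is a $\sigma$-subalgebra of $\Sigma$.
(b) If $\tau(\omega) \le \tau'(\omega)$ for every $\omega$, then $\tilde\Sigma_\tau \subseteq \tilde\Sigma_{\tau'}$.
(c) Now suppose that $\tau$ is finite almost everywhere. Set
$$\tilde X_\tau(\omega) = X_{\tau(\omega)}(\omega)$$
whenever $\tau(\omega) < \infty$ and $\omega \in \operatorname{dom} X_{\tau(\omega)}$. Then $\operatorname{dom}\tilde X_\tau \in \tilde\Sigma_\tau$ and $\tilde X_\tau$ is $\tilde\Sigma_\tau$-measurable.
(d) If $\tau$ is essentially bounded, that is, there is some $m \in \mathbb{N}$ such that $\tau \le m$ almost everywhere, then
$E(\tilde X_\tau)$ exists and is equal to $E(X_0)$.
(e) If $\tau \le \tau'$ almost everywhere, and $\tau'$ is essentially bounded, then $\tilde X_\tau$ is a conditional expectation of
$\tilde X_{\tau'}$ on $\tilde\Sigma_\tau$.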
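proof (a) This is elementary. Write $H_n = \{\omega : \tau(\omega) \le n\}$ for each $n \in \mathbb{N}$. The empty set belongs to $\tilde\Sigma_\tau$
because it belongs to $\Sigma_n$ for every $n$. If $E \in \tilde\Sigma_\tau$, then
$$(\Omega \setminus E) \cap H_n = H_n \setminus (E \cap H_n) \in \Sigma_n$$
because $H_n \in \Sigma_n$; this is true for every $n$, so $\Omega \setminus E \in \tilde\Sigma_\tau$. If $\langle E_k\rangle_{k\in\mathbb{N}}$ is any sequence in $\tilde\Sigma_\tau$ then
$$\Bigl(\bigcup_{k\in\mathbb{N}} E_k\Bigr) \cap H_n = \bigcup_{k\in\mathbb{N}} (E_k \cap H_n) \in \Sigma_n$$
for every $n$, so $\bigcup_{k\in\mathbb{N}} E_k \in \tilde\Sigma_\tau$.
(b) If $E \in \tilde\Sigma_\tau$ then of course $E \in \Sigma$, and if $n \in \mathbb{N}$ then $\{\omega : \tau'(\omega) \le n\} \subseteq \{\omega : \tau(\omega) \le n\}$, so that
$$E \cap \{\omega : \tau'(\omega) \le n\} = E \cap \{\omega : \tau(\omega) \le n\} \cap \{\omega : \tau'(\omega) \le n\}$$
belongs to $\Sigma_n$; as $n$ is arbitrary, $E \in \tilde\Sigma_{\tau'}$.
(c) Set $H_n = \{\omega : \tau(\omega) \le n\}$ for each $n \in \mathbb{N}$. For any $a \in \mathbb{R}$,
$$H_n \cap \{\omega : \omega \in \operatorname{dom}\tilde X_\tau,\ \tilde X_\tau(\omega) \le a\} = \bigcup_{k\le n} \{\omega : \tau(\omega) = k,\ \omega \in \operatorname{dom} X_k,\ X_k(\omega) \le a\} \in \Sigma_n.$$
As $n$ is arbitrary,
$$G_a = \{\omega : \omega \in \operatorname{dom}\tilde X_\tau,\ \tilde X_\tau(\omega) \le a\} \in \tilde\Sigma_\tau.$$
As $a$ is arbitrary, $\operatorname{dom}\tilde X_\tau = \bigcup_{m\in\mathbb{N}} G_m \in \tilde\Sigma_\tau$ and $\tilde X_\tau$ is $\tilde\Sigma_\tau$-measurable.
(d) Set $H_k = \{\omega : \tau(\omega) = k\}$ for $k \le m$. Then $\bigcup_{k\le m} H_k$ is conegligible, so
$$E(\tilde X_\tau) = \sum_{k=0}^m \int_{H_k} X_k = \sum_{k=0}^m \int_{H_k} X_m = \int X_m = \int X_0.$$
(e) Suppose $\tau' \le n$ almost everywhere. Set $H_k = \{\omega : \tau(\omega) = k\}$, $H'_k = \{\omega : \tau'(\omega) = k\}$ for each $k$;
then both $\langle H_k\rangle_{k\le n}$ and $\langle H'_k\rangle_{k\le n}$ are disjoint covers of conegligible subsets of $\Omega$. Now suppose that $E \in \tilde\Sigma_\tau$.
Then
$$\int_E \tilde X_\tau = \sum_{k=0}^n \int_{E\cap H_k} \tilde X_\tau = \sum_{k=0}^n \int_{E\cap H_k} X_k = \sum_{k=0}^n \int_{E\cap H_k} X_n = \int_E X_n$$
because $E \cap H_k \in \Sigma_k$ for every $k$. By (b), $E \in \tilde\Sigma_{\tau'}$, so we also have $\int_E \tilde X_{\tau'} = \int_E X_n$. Thus $\int_E \tilde X_\tau = \int_E \tilde X_{\tau'}$
for every $E \in \tilde\Sigma_\tau$, as claimed.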
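
Part (d) of the lemma can be checked by exact enumeration. The sketch below assumes, for illustration only, the fair $\pm 1$ random walk from 0 as the martingale and the bounded stopping time $\tau = \min(m, \inf\{n : |X_n| \ge \text{level}\})$; the function name is hypothetical. It computes $E(\tilde X_\tau)$ as an exact fraction, which by 275Nd should equal $E(X_0) = 0$.

```python
from fractions import Fraction
from itertools import product


def expected_stopped_value(m, level):
    """Exact E(X_tau) for the fair +-1 walk from 0, where
    tau = min(m, first n with |X_n| >= level)."""
    paths = list(product((-1, 1), repeat=m))
    weight = Fraction(1, len(paths))
    total = Fraction(0)
    for steps in paths:
        x = 0
        for s in steps:
            if abs(x) >= level:   # tau has already occurred: freeze the value
                break
            x += s
        total += weight * x
    return total


stopped_mean = expected_stopped_value(6, 2)
```

The bound $\tau \le m$ matters here: 275Xj records that for a stopping time which is merely finite almost everywhere $E(\tilde X_\tau)$ need not equal $E(X_0)$.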

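275O Proposition Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale and $\tau$ a stopping time, both adapted to the same
sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ of $\sigma$-algebras. For each $n$, set $(\tau \wedge n)(\omega) = \min(\tau(\omega), n)$ for $\omega \in \Omega$; then $\tau \wedge n$ is a stopping
time, and $\langle\tilde X_{\tau\wedge n}\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\tilde\Sigma_{\tau\wedge n}\rangle_{n\in\mathbb{N}}$, defining $\tilde X_{\tau\wedge n}$ and $\tilde\Sigma_{\tau\wedge n}$ as in 275N.

proof As remarked in 275Mb, each $\tau \wedge n$ is a stopping time. If $m \le n$, then $\tilde\Sigma_{\tau\wedge m} \subseteq \tilde\Sigma_{\tau\wedge n}$ by 275Nb.
Each $\tilde X_{\tau\wedge m}$ is $\tilde\Sigma_{\tau\wedge m}$-measurable, with domain belonging to $\tilde\Sigma_{\tau\wedge m}$, by 275Nc, and has finite expectation, by
275Nd; finally, if $m \le n$, then $\tilde X_{\tau\wedge m}$ is a conditional expectation of $\tilde X_{\tau\wedge n}$ on $\tilde\Sigma_{\tau\wedge m}$, by 275Ne.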

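275P Corollary Suppose that $(\Omega,\Sigma,\mu)$ is a probability space and $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale on $\Omega$ such
that $W = \sup_{n\in\mathbb{N}} |X_{n+1} - X_n|$ is finite almost everywhere and has finite expectation. Then for almost every
$\omega$, either $\lim_{n\to\infty} X_n(\omega)$ exists in $\mathbb{R}$ or $\sup_{n\in\mathbb{N}} X_n(\omega) = \infty$ and $\inf_{n\in\mathbb{N}} X_n(\omega) = -\infty$.
proof Let $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of $\sigma$-algebras to which $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted. Let $H$ be the
conegligible set $\bigcap_{n\in\mathbb{N}} \operatorname{dom} X_n \cap \{\omega : W(\omega) < \infty\}$. For each $m \in \mathbb{N}$, set
$$\tau_m(\omega) = \inf\{n : \omega \in \operatorname{dom} X_n,\ X_n(\omega) > m\}.$$
As in 275Ma, $\tau_m$ is a stopping time adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$. Set
$$Y_{mn} = \tilde X_{\tau_m \wedge n},$$
defined as in 275Nc, so that $\langle Y_{mn}\rangle_{n\in\mathbb{N}}$ is a martingale, by 275O. If $\omega \in H$, then either $\tau_m(\omega) > n$ and
$$Y_{mn}(\omega) = X_n(\omega) \le m,$$
or $0 < \tau_m(\omega) \le n$ and
$$Y_{mn}(\omega) = X_{\tau_m(\omega)}(\omega) \le W(\omega) + X_{\tau_m(\omega)-1}(\omega) \le W(\omega) + m,$$
or $\tau_m(\omega) = 0$ and
$$Y_{mn}(\omega) = X_0(\omega).$$
Thus
$$Y_{mn}(\omega) \le |X_0(\omega)| + W(\omega) + m$$
for every $\omega \in H$, and
$$|Y_{mn}(\omega)| = 2\max(0, Y_{mn}(\omega)) - Y_{mn}(\omega) \le 2(|X_0(\omega)| + W(\omega) + m) - Y_{mn}(\omega),$$
$$E(|Y_{mn}|) \le 2E(|X_0|) + 2E(W) + 2m - E(Y_{mn}) = 2E(|X_0|) + 2E(W) + 2m - E(X_0)$$
by 275Nd. As this is true for every $n \in \mathbb{N}$, $\sup_{n\in\mathbb{N}} E(|Y_{mn}|) < \infty$, and $\lim_{n\to\infty} Y_{mn}$ is defined in $\mathbb{R}$ almost
everywhere, by Doob's Martingale Convergence Theorem (275G). Let $F_m$ be the conegligible set on which
$\langle Y_{mn}\rangle_{n\in\mathbb{N}}$ converges. Set $H^* = H \cap \bigcap_{m\in\mathbb{N}} F_m$, so that $H^*$ is conegligible.
Now consider
$$E = \{\omega : \omega \in H^*,\ \sup_{n\in\mathbb{N}} X_n(\omega) < \infty\}.$$
For any $\omega \in E$, there must be an $m \in \mathbb{N}$ such that $\sup_{n\in\mathbb{N}} X_n(\omega) \le m$. Now this means that $Y_{mn}(\omega) = X_n(\omega)$
for every $n$, and as $\omega \in F_m$ we have
$$\lim_{n\to\infty} X_n(\omega) = \lim_{n\to\infty} Y_{mn}(\omega) \in \mathbb{R}.$$
This means that $\langle X_n(\omega)\rangle_{n\in\mathbb{N}}$ is convergent for almost every $\omega$ such that $\{X_n(\omega) : n \in \mathbb{N}\}$ is bounded above.
Similarly, $\langle X_n(\omega)\rangle_{n\in\mathbb{N}}$ is convergent for almost every $\omega$ such that $\{X_n(\omega) : n \in \mathbb{N}\}$ is bounded below,
which completes the proof.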

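275X Basic exercises > (a) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of random variables with zero
expectation and finite variance. Set $s_n = (\sum_{i=0}^n \operatorname{Var}(X_i))^{1/2}$, $Y_n = (X_0 + \ldots + X_n)^2 - s_n^2$ for each $n$. Show
that $\langle Y_n\rangle_{n\in\mathbb{N}}$ is a martingale.

> (b) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale. Show that for any $\epsilon > 0$, $\Pr(\sup_{n\in\mathbb{N}} |X_n| \ge \epsilon) \le \frac{1}{\epsilon} \sup_{n\in\mathbb{N}} E(|X_n|)$.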

(c) Pólya's urn scheme Imagine a box containing red and white balls. At each move, a ball is drawn
at random from the box and replaced together with another of the same colour. (i) Writing $R_n$, $W_n$ for
the numbers of red and white balls after the $n$th move and $X_n = R_n/(R_n + W_n)$, show that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a
martingale. (ii) Starting from $R_0 = W_0 = 1$, find the distribution of $(R_n, W_n)$ for each $n$. (iii) Show that
$X = \lim_{n\to\infty} X_n$ is defined almost everywhere, and find its distribution when $R_0 = W_0 = 1$. (See Feller
66 for a discussion of other starting values.)
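
Before attempting part (i) it may help to see the one-step transition of the urn written out. The following sketch (function names hypothetical) computes $E(X_{n+1} \mid R_n, W_n)$ exactly from the transition probabilities for a given state $(R_n, W_n)$, so that it can be compared with $X_n$; it is a numerical check on small states, not a proof.

```python
from fractions import Fraction


def urn_step(state):
    """Return [(probability, next_state)] for one move from state = (red, white)."""
    r, w = state
    total = r + w
    return [
        (Fraction(r, total), (r + 1, w)),   # a red ball is drawn and duplicated
        (Fraction(w, total), (r, w + 1)),   # a white ball is drawn and duplicated
    ]


def fraction_red(state):
    """X_n = R_n / (R_n + W_n)."""
    r, w = state
    return Fraction(r, r + w)


def one_step_mean(state):
    """E(X_{n+1} | R_n, W_n), computed exactly."""
    return sum(p * fraction_red(nxt) for p, nxt in urn_step(state))


agrees = all(
    one_step_mean((r, w)) == fraction_red((r, w))
    for r in range(1, 6)
    for w in range(1, 6)
)
```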

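> (d) Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of
$\Sigma$; for each $n \in \mathbb{N}$ let $P_n : L^1 \to L^1$ be the conditional expectation operator corresponding to $\Sigma_n$, where
$L^1 = L^1(\mu)$ (242J). (i) Show that $V = \{u : u \in L^1,\ \lim_{n\to\infty} \|P_n u - u\|_1 = 0\}$ is a $\|\ \|_1$-closed linear subspace
of $L^1$. (ii) Show that $\{E : E \in \Sigma,\ \chi E^{\bullet} \in V\}$ is a Dynkin class including $\bigcup_{n\in\mathbb{N}} \Sigma_n$, so includes the $\sigma$-algebra $\Sigma_\infty$
generated by $\bigcup_{n\in\mathbb{N}} \Sigma_n$. (iii) Show that if $u \in L^1$ then $v = \sup_{n\in\mathbb{N}} P_n|u|$ is defined in $L^0(\mu)$ and is of the
form $W^{\bullet}$ where $\Pr(W \ge \epsilon) \le \frac{1}{\epsilon}\|u\|_1$ for every $\epsilon > 0$. (Hint: 275D.) (iv) Show that if $X$ is a $\Sigma_\infty$-measurable
random variable with finite expectation, and for each $n \in \mathbb{N}$ $X_n$ is a conditional expectation of $X$ on $\Sigma_n$,
then $X^{\bullet} \in V$ and $X =_{\text{a.e.}} \lim_{n\to\infty} X_n$. (Hint: apply (iii) to $u = (X - X_m)^{\bullet}$ for large $m$.)

(e) Let $(\Omega,\Sigma,\mu)$ be a probability space, $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$, and
$\Sigma_\infty$ the $\sigma$-algebra generated by $\bigcup_{n\in\mathbb{N}} \Sigma_n$. For each $n \in \mathbb{N} \cup \{\infty\}$ let $P_n : L^1 \to L^1$ be the conditional
expectation operator corresponding to $\Sigma_n$, where $L^1 = L^1(\mu)$. Show that $\lim_{n\to\infty} \|P_n u - P_\infty u\|_p = 0$ whenever
$p \in [1,\infty[$ and $u \in L^p(\mu)$. (Hint: 275Xd, 233J/242K, 246Xg.)

(f) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale, and suppose that $p \in \,]1,\infty[$ is such that $\sup_{n\in\mathbb{N}} \|X_n\|_p < \infty$. Show
that $X = \lim_{n\to\infty} X_n$ is defined almost everywhere and that $\lim_{n\to\infty} \|X_n - X\|_p = 0$.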

> (g) Let $(\Omega,\Sigma,\mu)$ be $[0,1]$ with Lebesgue measure. For each $n \in \mathbb{N}$ let $\Sigma_n$ be the finite subalgebra
of $\Sigma$ generated by intervals of the type $[0, 2^{-n}r]$ for $r \le 2^n$. Use 275I to show that for any integrable
$X : [0,1] \to \mathbb{R}$ we must have $X(t) = \lim_{n\to\infty} 2^n \int_{I_n(t)} X$ for almost every $t \in [0,1[$, where $I_n(t)$ is the
interval of the form $[2^{-n}r, 2^{-n}(r+1)[$ containing $t$. Compare this result with 223A and 261Yd.
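
The averages $2^n \int_{I_n(t)} X$ are easy to compute when an antiderivative of $X$ is known. The sketch below is an illustration under that assumption, taking $X(t) = t^2$ with antiderivative $t^3/3$ (both illustrative choices; the function names are hypothetical), and uses exact rational arithmetic.

```python
from fractions import Fraction


def dyadic_average(antiderivative, t, n):
    """2^n times the integral of X over I_n(t) = [r/2^n, (r+1)/2^n[,
    given an antiderivative of X; t is assumed to lie in [0, 1[."""
    scale = 2 ** n
    r = int(t * scale)                       # index of the dyadic interval containing t
    left, right = Fraction(r, scale), Fraction(r + 1, scale)
    return scale * (antiderivative(right) - antiderivative(left))


def cube_third(s):
    """Antiderivative of the illustrative integrand X(t) = t^2."""
    return s ** 3 / 3


t = Fraction(3, 10)
coarse = dyadic_average(cube_third, t, 2)    # average over [1/4, 1/2[
fine = dyadic_average(cube_third, t, 20)     # average over an interval of length 2^-20
```

For a continuous integrand the convergence is immediate; the content of the exercise is that it persists almost everywhere for every integrable $X$.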

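(h) In 275K, show that $\lim_{n\to\infty} \|X_n - X_\infty\|_p = 0$ for any $p \in [1,\infty[$ such that $\|X_0\|_p$ is finite. (Compare
275Xe.)

(i) Let $(\Omega,\Sigma,\mu)$ be a probability space, with completion $(\Omega,\hat\Sigma,\hat\mu)$, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence
of $\sigma$-subalgebras of $\hat\Sigma$. Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a uniformly integrable martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$, and set $X_\infty =
\lim_{n\to\infty} X_n$. Let $\tau$ be a stopping time adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$, and set $\tilde X_\tau(\omega) = X_{\tau(\omega)}(\omega)$ whenever
$\omega \in \operatorname{dom} X_{\tau(\omega)}$, allowing $\infty$ as a value of $\tau(\omega)$. Show that $\tilde X_\tau$ is a conditional expectation of $X_\infty$ on $\tilde\Sigma_\tau$, as
defined in 275N.

(j) Let $(\Omega,\Sigma,\mu)$ be a probability space, with completion $(\Omega,\hat\Sigma,\hat\mu)$, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence
of $\sigma$-subalgebras of $\hat\Sigma$. Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale and $\tau$ a stopping time, both adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$.
Suppose that $\sup_{n\in\mathbb{N}} E(|X_n|) < \infty$ and that $\tau$ is finite almost everywhere. Show that $\tilde X_\tau$, as defined in
275Nc, has finite expectation, but that $E(\tilde X_\tau)$ need not be equal to $E(X_0)$.

(k) (i) Find a martingale $\langle X_n\rangle_{n\in\mathbb{N}}$ such that $\langle X_{2n}\rangle_{n\in\mathbb{N}} \to 0$ a.e. but $|X_{2n+1}| \ge_{\text{a.e.}} 1$ for every $n \in \mathbb{N}$. (ii)
Find a martingale which converges in measure but is not convergent a.e.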

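275Y Further exercises (a) Let $(\Omega,\Sigma,\mu)$ be a complete probability space, $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing
sequence of $\sigma$-subalgebras of $\Sigma$ all containing every negligible set, and $\langle X_n\rangle_{n\in\mathbb{N}}$ a martingale adapted to
$\langle\Sigma_n\rangle_{n\in\mathbb{N}}$. Let $\nu$ be another probability measure with domain $\Sigma$ which is absolutely continuous with respect
to $\mu$, with Radon-Nikodým derivative $Z$. For each $n \in \mathbb{N}$ let $Z_n$ be a conditional expectation of $Z$ on $\Sigma_n$
(with respect to the measure $\mu$). (i) Show that $Z_n$ is a Radon-Nikodým derivative of $\nu \restriction \Sigma_n$ with respect
to $\mu \restriction \Sigma_n$. (ii) Set $W_n(\omega) = Z_n(\omega)/Z_{n-1}(\omega)$ if this is defined in $\mathbb{R}$, otherwise $0$. For $n \ge 1$, let $V_n$ be a
conditional expectation of $W_n \times (X_n - X_{n-1})$ on $\Sigma_{n-1}$ (with respect to the measure $\mu$). Set $Y_0 = X_0$,
$Y_n = X_n - \sum_{k=1}^n V_k$ for $n \ge 1$. Show that $\langle Y_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ with respect to the
measure $\nu$.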

(b) Combine the ideas of 275Cc with those of 275Cd-275Ce to describe a notion of martingale indexed
by I, where I is an arbitrary partially ordered set.

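(c) Let $\langle X_k\rangle_{k\in\mathbb{N}}$ be a martingale on a complete probability space $(\Omega,\Sigma,\mu)$, and fix $n \in \mathbb{N}$. Set $X^* =
\max(|X_0|,\ldots,|X_n|)$. Let $p \in \,]1,\infty[$. Show that $\|X^*\|_p \le \frac{p}{p-1}\|X_n\|_p$. (Hint: set $F_t = \{\omega : X^*(\omega) \ge t\}$.
Show that $t\mu F_t \le \int_{F_t} |X_n|$. Using Fubini's theorem on $\Omega \times [0,\infty[$ and on $\Omega \times [0,\infty[ \times [0,\infty[$, show that
$$E((X^*)^p) = p\int_0^\infty t^{p-1}\mu F_t\,dt,$$
$$\int_0^\infty t^{p-2} \int_{F_t} |X_n|\,dt = \frac{1}{p-1} E(|X_n| \times (X^*)^{p-1}),$$
$$E(|X_n| \times (X^*)^{p-1}) \le \|X_n\|_p \|X^*\|_p^{p-1}.$$
Compare 286A.)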

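(d) Let $\langle X_k\rangle_{k\in\mathbb{N}}$ be a martingale on a complete probability space $(\Omega,\Sigma,\mu)$, and fix $n \in \mathbb{N}$. Set $X^* =
\max(|X_0|,\ldots,|X_n|)$, $F_t = \{\omega : X^*(\omega) \ge t\}$, $G_t = \{\omega : |X_n(\omega)| \ge \frac12 t\}$ for $t \ge 0$. (i) Show that $t\mu F_t \le
2\int_{G_t} |X_n|$ for every $t \ge 0$. (ii) Show that $E(X^*) \le 1 + 2\ln 2\,E(|X_n|) + 2E(|X_n| \times \ln^+|X_n|)$, where $\ln^+ t = \ln t$
for $t \ge 1$, $0$ for $t \in [0,1]$.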

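(e) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle\Sigma_i\rangle_{i\in I}$ a countable family of $\sigma$-subalgebras of $\Sigma$ such that for
any $i$, $j \in I$ either $\Sigma_i \subseteq \Sigma_j$ or $\Sigma_j \subseteq \Sigma_i$. Let $X$ be a real-valued random variable on $\Omega$ such that $\|X\|_p < \infty$,
where $1 < p < \infty$, and suppose that $X_i$ is a conditional expectation of $X$ on $\Sigma_i$ for each $i \in I$. Show that
$\|\sup_{i\in I} |X_i|\|_p \le \frac{p}{p-1}\|X\|_p$.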

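(f) Let $(\Omega,\Sigma,\mu)$ be a probability space, with completion $(\Omega,\hat\Sigma,\hat\mu)$, and let $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing
sequence of $\sigma$-subalgebras of $\hat\Sigma$. Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a sequence of $\mu$-integrable real-valued functions such that
$\operatorname{dom} X_n \in \Sigma_n$ and $X_n$ is $\Sigma_n$-measurable for each $n \in \mathbb{N}$. We say that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a submartingale adapted
to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ (also called 'semi-martingale') if $\int_E X_{n+1} \ge \int_E X_n$ for every $n \in \mathbb{N}$ and every $E \in \Sigma_n$. Prove versions of 275D,
275F, 275G, 275Xf for submartingales.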

(g) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale, and $\phi : \mathbb{R} \to \mathbb{R}$ a convex function. Show that $\langle\phi(X_n)\rangle_{n\in\mathbb{N}}$ is a
submartingale. (Hint: 233J.) Re-examine part (b-ii) of the proof of 275F in the light of this fact.

(h) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of non-negative random variables all with expectation 1.
Set $W_n = X_0 \times \ldots \times X_n$ for every $n$. (i) Show that $W = \lim_{n\to\infty} W_n$ is defined a.e. (ii) Show that $E(W)$ is
either 0 or 1. (Hint: suppose $E(W) > 0$. Set $Z_n = \lim_{m\to\infty} X_n \times \ldots \times X_m$. Show that $\lim_{n\to\infty} Z_n = 1$ when
$0 < W < \infty$, therefore a.e., by the zero-one law, while $E(Z_n) \le 1$, by Fatou's lemma, so $\lim_{n\to\infty} E(Z_n) = 1$,
while $E(W) = E(W_n)E(Z_{n+1})$ for every $n$.) (iii) Set $\gamma = \prod_{n=0}^\infty E(\sqrt{X_n})$. Show that $\gamma > 0$ iff $E(W) = 1$.
(Hint: $\Pr(W_n \ge \frac14\gamma^2) \ge \frac14\gamma^2$ for every $n$, so if $\gamma > 0$ then $W$ cannot be zero a.e.; while $E(\sqrt{W}) \le \gamma$.)

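(i) Let $\langle(\Omega_n,\Sigma_n,\mu_n)\rangle_{n\in\mathbb N}$ be a sequence of probability spaces with product $(\Omega,\Sigma,\mu)$. Suppose that for each $n\in\mathbb N$ we have a probability measure $\nu_n$, with domain $\Sigma_n$, which is absolutely continuous with respect to $\mu_n$, with Radon-Nikodým derivative $f_n$, and suppose that $\prod_{n\in\mathbb N}\int\sqrt{f_n}\,d\mu_n>0$. Let $\nu$ be the product of $\langle\nu_n\rangle_{n\in\mathbb N}$. Show that $\nu$ is an indefinite-integral measure over $\mu$, with Radon-Nikodým derivative $f$, where $f(\omega)=\prod_{n\in\mathbb N}f_n(\omega_n)$ for $\mu$-almost every $\omega=\langle\omega_n\rangle_{n\in\mathbb N}$ in $\Omega$. (Hint: use 275Yh to show that $\int f\,d\mu=1$.)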

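(j) Let $\langle p_n\rangle_{n\in\mathbb N}$ be a sequence in $[0,1]$. Let $\mu$ be the usual measure on $\{0,1\}^{\mathbb N}$ (254J) and $\nu$ the product of $\langle\nu_n\rangle_{n\in\mathbb N}$, where $\nu_n$ is the probability measure on $\{0,1\}$ defined by setting $\nu_n\{1\}=p_n$. Show that $\nu$ is an indefinite-integral measure over $\mu$ iff $\sum_{n=0}^\infty|p_n-\frac12|^2<\infty$.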

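(k) Let $(\Omega,\Sigma,\mu)$ be a probability space, $\langle\Sigma_n\rangle_{n\in\mathbb N}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$ and $\langle X_n\rangle_{n\in\mathbb N}$ a sequence of random variables on $\Omega$ such that $E(\sup_{n\in\mathbb N}|X_n|)$ is finite and $X=\lim_{n\to\infty}X_n$ is defined almost everywhere. For each $n$, let $Y_n$ be a conditional expectation of $X_n$ on $\Sigma_n$. Show that $\langle Y_n\rangle_{n\in\mathbb N}$ converges almost everywhere to a conditional expectation of $X$ on the $\sigma$-algebra generated by $\bigcup_{n\in\mathbb N}\Sigma_n$.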

(l) Show that 275Yk can fail if $\langle X_n\rangle_{n\in\mathbb N}$ is merely uniformly integrable, rather than dominated by an integrable function.

(m) Let $(\Omega,\Sigma,\mu)$ be a probability space, $\langle\Sigma_n\rangle_{n\in\mathbb N}$ an independent sequence of $\sigma$-subalgebras of $\Sigma$, and $X$ a random variable on $\Omega$ with finite variance. Let $X_n$ be a conditional expectation of $X$ on $\Sigma_n$ for each $n$. Show that $\lim_{n\to\infty}X_n=E(X)$ almost everywhere. (Hint: consider $\sum_{n=0}^\infty\operatorname{Var}(X_n)$.)

(n) Let $(\Omega,\Sigma,\mu)$ be a complete probability space, and $\langle X_n\rangle_{n\in\mathbb N}$ an independent sequence of random variables on $\Omega$, all with the same distribution, and of finite expectation. For each $n$, set $S_n=\frac{1}{n+1}(X_0+\ldots+X_n)$; let $\Sigma_n$ be the $\sigma$-algebra defined by $S_n$ and $\Sigma^*_n$ the $\sigma$-algebra generated by $\bigcup_{m\ge n}\Sigma_m$. Show that $S_n$ is a conditional expectation of $X_0$ on $\Sigma^*_n$. (Hint: assume every $X_i$ defined everywhere on $\Omega$. Set $\phi(\omega)=\langle X_i(\omega)\rangle_{i\in\mathbb N}$. Show that $\phi:\Omega\to\mathbb R^{\mathbb N}$ is inverse-measure-preserving for a suitable product measure on $\mathbb R^{\mathbb N}$, and that every set in $\Sigma^*_n$ is of the form $\phi^{-1}[H]$ where $H\subseteq\mathbb R^{\mathbb N}$ is a Borel set invariant under permutations of coordinates in the set $\{0,\ldots,n\}$, so that $\int_EX_i=\int_EX_j$ whenever $i\le j\le n$ and $E\in\Sigma^*_n$.) Hence show that $\langle S_n\rangle_{n\in\mathbb N}$ converges almost everywhere. (Compare 273I.)

(o) Formulate and prove versions of the results of this chapter for martingales consisting of functions
taking values in $\mathbb C$ or $\mathbb R^r$ rather than $\mathbb R$.

(p) Find a martingale $\langle X_n\rangle_{n\in\mathbb N}$ such that the sequence $\nu_{X_n}$ of distributions (271C) is convergent for the vague topology (274Ld), but $\langle X_n\rangle_{n\in\mathbb N}$ is not convergent in measure.

275 Notes and comments I hope that the sketch above, though distressingly abbreviated, has suggested
some of the richness of the concepts involved, and will provide a foundation for further study. All the
theorems of this section have far-reaching implications, but the one which is simply indispensable in advanced
measure theory is 275I, Lévy's martingale convergence theorem, which I will use in the proof of the Lifting
Theorem in Chapter 34 of the next volume.
As for stopping times, I mention them partly in an attempt to cast further light on what martingales are
for (see 276Ed below), and partly because the ideas of 275N-275O are so important in modern probability
theory that, just as a matter of general knowledge, you should be aware that there is something there. I
add 275P as one of the most accessible of the standard results which may be obtained by this method.

276 Martingale difference sequences


Hand in hand with the concept of martingale is that of martingale difference sequence (276A), a direct
generalization of the notion of independent sequence. In this section I collect results which can be naturally
expressed in terms of difference sequences, including yet another strong law of large numbers (276C). I end
the section with a proof of Komlós' theorem (276H).

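276A Martingale difference sequences (a) If $\langle X_n\rangle_{n\in\mathbb N}$ is a martingale adapted to a sequence $\langle\Sigma_n\rangle_{n\in\mathbb N}$ of $\sigma$-algebras, then we have
$$\int_EX_{n+1}-X_n=0$$
whenever $E\in\Sigma_n$. Let us say that if $(\Omega,\Sigma,\mu)$ is a probability space, with completion $(\Omega,\hat\Sigma,\hat\mu)$, and $\langle\Sigma_n\rangle_{n\in\mathbb N}$ is a non-decreasing sequence of $\sigma$-subalgebras of $\hat\Sigma$, then a martingale difference sequence adapted to $\langle\Sigma_n\rangle_{n\in\mathbb N}$ is a sequence $\langle X_n\rangle_{n\in\mathbb N}$ of real-valued random variables on $\Omega$, all with finite expectation, such that (i) $\operatorname{dom}X_n\in\Sigma_n$ and $X_n$ is $\Sigma_n$-measurable, for each $n\in\mathbb N$ (ii) $\int_EX_{n+1}=0$ whenever $n\in\mathbb N$, $E\in\Sigma_n$.

(b) Evidently $\langle X_n\rangle_{n\in\mathbb N}$ is a martingale difference sequence adapted to $\langle\Sigma_n\rangle_{n\in\mathbb N}$ iff $\langle\sum_{i=0}^nX_i\rangle_{n\in\mathbb N}$ is a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb N}$.

(c) Just as in 275Cd, we can say that a sequence $\langle X_n\rangle_{n\in\mathbb N}$ is in itself a martingale difference sequence if $\langle\sum_{i=0}^nX_i\rangle_{n\in\mathbb N}$ is a martingale, that is, if $\langle X_n\rangle_{n\in\mathbb N}$ is a martingale difference sequence adapted to $\langle\Sigma_n\rangle_{n\in\mathbb N}$, where $\Sigma_n$ is the $\sigma$-algebra generated by $\bigcup_{i\le n}\Sigma_{X_i}$.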

(d) If $\langle X_n\rangle_{n\in\mathbb N}$ is a martingale difference sequence then $\langle a_nX_n\rangle_{n\in\mathbb N}$ is a martingale difference sequence for any real $a_n$.
(e) If $\langle X_n\rangle_{n\in\mathbb N}$ is a martingale difference sequence and $X'_n=_{\text{a.e.}}X_n$ for every $n$, then $\langle X'_n\rangle_{n\in\mathbb N}$ is a martingale difference sequence. (Compare 275Ce.)
(f) Of course the most important example of martingale difference sequence is that of 275Bb: any independent sequence of random variables with zero expectation is a martingale difference sequence. It turns out that some of the theorems of §273 concerning such independent sequences may be generalized to martingale difference sequences.
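276B Proposition Let $\langle X_n\rangle_{n\in\mathbb N}$ be a martingale difference sequence such that $\sum_{n=0}^\infty E(X_n^2)<\infty$. Then $\sum_{n=0}^\infty X_n$ is defined, and finite, almost everywhere.
proof (a) Let $(\Omega,\Sigma,\mu)$ be the underlying probability space, $(\Omega,\hat\Sigma,\hat\mu)$ its completion, and $\langle\Sigma_n\rangle_{n\in\mathbb N}$ a non-decreasing sequence of $\sigma$-subalgebras of $\hat\Sigma$ such that $\langle X_n\rangle_{n\in\mathbb N}$ is adapted to $\langle\Sigma_n\rangle_{n\in\mathbb N}$. Set $Y_n=\sum_{i=0}^nX_i$ for each $n\in\mathbb N$. Then $\langle Y_n\rangle_{n\in\mathbb N}$ is a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb N}$.
(b) $E(Y_n\times X_{n+1})=0$ for each $n$. $\mathbf P$ $Y_n$ is a sum of random variables with finite variance, so $E(Y_n^2)<\infty$, by 244Ba; it follows that $Y_n\times X_{n+1}$ has finite expectation, by 244Eb. Because the constant function $\mathbf 0$ is a conditional expectation of $X_{n+1}$ on $\Sigma_n$,
$$E(Y_n\times X_{n+1})=E(Y_n\times\mathbf 0)=0,$$
by 242L. $\mathbf Q$
(c) It follows that $E(Y_n^2)=\sum_{i=0}^nE(X_i^2)$ for every $n$. $\mathbf P$ Induce on $n$. For the inductive step, we have
$$E(Y_{n+1}^2)=E(Y_n^2+2Y_n\times X_{n+1}+X_{n+1}^2)=E(Y_n^2)+E(X_{n+1}^2)$$
because, by (b), $E(Y_n\times X_{n+1})=0$. $\mathbf Q$
(d) Of course
$$E(|Y_n|)=\int|Y_n|\times\chi\Omega\le\|Y_n\|_2\|\chi\Omega\|_2=\sqrt{E(Y_n^2)},$$
so
$$\sup_{n\in\mathbb N}E(|Y_n|)\le\sup_{n\in\mathbb N}\sqrt{E(Y_n^2)}=\sqrt{\sum_{i=0}^\infty E(X_i^2)}<\infty.$$
By 275G, $\lim_{n\to\infty}Y_n$ is defined and finite almost everywhere, that is, $\sum_{i=0}^\infty X_i$ is defined and finite almost everywhere.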
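
The orthogonality identity in part (c) of the proof of 276B can be checked exactly on a small finite example. The following Python sketch is only an illustration under stated assumptions: the fair signs $W_i=\pm1$ and the stake rule `stake` are hypothetical choices, not taken from the text, and $X_i$ is formed as a stake depending on $W_0,\ldots,W_{i-1}$ multiplied by $W_i$, which gives a martingale difference sequence in the sense of 276A.

```python
from fractions import Fraction
from itertools import product


def stake(history):
    # Hypothetical predictable stake: 1 plus the number of +1 outcomes so far.
    return 1 + sum(1 for w in history if w == 1)


def second_moments(n):
    """Return (E(Y_n^2), sum_{i<=n} E(X_i^2)) for X_i = stake(W_0..W_{i-1}) * W_i."""
    prob = Fraction(1, 2 ** (n + 1))  # every sign pattern is equally likely
    e_y2 = Fraction(0)
    e_x2 = [Fraction(0)] * (n + 1)
    for ws in product((-1, 1), repeat=n + 1):
        xs = [stake(ws[:i]) * ws[i] for i in range(n + 1)]
        e_y2 += prob * sum(xs) ** 2          # Y_n = X_0 + ... + X_n
        for i, x in enumerate(xs):
            e_x2[i] += prob * x * x
    return e_y2, sum(e_x2)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to rounding.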

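276C The strong law of large numbers: fourth form Let $\langle X_n\rangle_{n\in\mathbb N}$ be a martingale difference sequence, and suppose that $\langle b_n\rangle_{n\in\mathbb N}$ is a non-decreasing sequence in $]0,\infty[$, diverging to $\infty$, such that $\sum_{n=0}^\infty\frac{1}{b_n^2}\operatorname{Var}(X_n)<\infty$. Then
$$\lim_{n\to\infty}\frac{1}{b_n}\sum_{i=0}^nX_i=0$$
almost everywhere.
proof (Compare 273D.) As usual, write $(\Omega,\Sigma,\mu)$ for the underlying probability space. Set
$$\tilde X_n=\frac{1}{b_n}X_n$$
for each $n$; then $\langle\tilde X_n\rangle_{n\in\mathbb N}$ is also a martingale difference sequence, and
$$\sum_{n=1}^\infty E(\tilde X_n^2)=\sum_{n=1}^\infty\frac{1}{b_n^2}\operatorname{Var}(X_n)<\infty.$$
By 276B, $\langle\tilde X_n(\omega)\rangle_{n\in\mathbb N}$ is summable for almost every $\omega\in\Omega$. But by 273Cb,
$$\lim_{n\to\infty}\frac{1}{b_n}\sum_{i=0}^nX_i(\omega)=\lim_{n\to\infty}\frac{1}{b_n}\sum_{i=0}^nb_i\tilde X_i(\omega)=0$$
for all such $\omega$. So we have the result.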

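276D Corollary Let $\langle X_n\rangle_{n\in\mathbb N}$ be a martingale such that $b_n=E(X_n^2)$ is finite for each $n$.
(a) If $\sup_{n\in\mathbb N}b_n$ is infinite, then $\lim_{n\to\infty}\frac{1}{b_n}X_n=_{\text{a.e.}}0$.
(b) If $\sup_{n\ge1}\frac1nb_n<\infty$, then $\lim_{n\to\infty}\frac1nX_n=_{\text{a.e.}}0$.
proof Consider the martingale difference sequence $\langle Y_n\rangle_{n\in\mathbb N}=\langle X_{n+1}-X_n\rangle_{n\in\mathbb N}$. Then $E(Y_n\times X_n)=0$, so $E(Y_n^2)+E(X_n^2)=E(X_{n+1}^2)$ for each $n$. In particular, $\langle b_n\rangle_{n\in\mathbb N}$ must be non-decreasing.
(a) If $\lim_{n\to\infty}b_n=\infty$, take $m$ such that $b_m>0$; then
$$\sum_{n=m}^\infty\frac{1}{b_{n+1}^2}\operatorname{Var}(Y_n)=\sum_{n=m}^\infty\frac{1}{b_{n+1}^2}(b_{n+1}-b_n)\le\int_{b_m}^\infty\frac{1}{t^2}\,dt<\infty.$$
By 276C (modifying $b_i$ for $i<m$, if necessary),
$$\lim_{n\to\infty}\frac{1}{b_n}X_n=\lim_{n\to\infty}\frac{1}{b_{n+1}}\Bigl(X_0+\sum_{i=0}^nY_i\Bigr)=\lim_{n\to\infty}\frac{1}{b_{n+1}}\sum_{i=0}^nY_i=0$$
almost everywhere.
(b) If $\gamma=\sup_{n\ge1}\frac1nb_n<\infty$, then $\frac{1}{(n+1)^2}\le\min(1,\gamma^2/t^2)$ for $b_n<t\le b_{n+1}$, so
$$\sum_{n=0}^\infty\frac{1}{(n+1)^2}(b_{n+1}-b_n)\le\gamma+\gamma^2\int_\gamma^\infty\frac{1}{t^2}\,dt<\infty,$$
and, by the same argument as before, $\lim_{n\to\infty}\frac1nX_n=_{\text{a.e.}}0$.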

276E 'Impossibility of systems' (a) I return to the word 'martingale' and the idea of a gambling system. Consider a gambler who takes a sequence of 'fair' bets, that is, bets which have payoff expectations of zero, but who chooses which bets to take on the basis of past experience. The appropriate model for such a sequence of random events is a martingale in the sense of 275A, taking $\Sigma_n$ to be the algebra of all events which are observable up to and including the outcome of the $n$th bet, and $X_n$ to be the gambler's net gain at that time. (In this model it is natural to take $\Sigma_0=\{\emptyset,\Omega\}$ and $X_0=0$.) Certain paradoxes can arise if we try to imagine this model with atomless $\Sigma_n$; to begin with it is perhaps easier to work with the discrete case, in which each $\Sigma_n$ is finite, or is the set of unions of some countable family of atomic events.
Now suppose that the bets involved are just two-way bets, with two equally likely outcomes, but that the gambler chooses his stake each time. In this case we can think of the outcomes as corresponding to an independent sequence $\langle W_n\rangle_{n\in\mathbb N}$ of random variables, each taking the values $\pm1$ with equal probability. The gambler's system must be of the form
$$X_{n+1}=X_n+Z_{n+1}\times W_{n+1},$$
where $Z_{n+1}$ is his stake on the $(n+1)$-st bet, and must be constant on each atom of the $\sigma$-algebra $\Sigma_n$ generated by $W_1,\ldots,W_n$. The point is that because $\int_EW_{n+1}=0$ for each $E\in\Sigma_n$, $E(Z_{n+1}\times W_{n+1})=0$, so $E(X_{n+1})=E(X_n)$.

(b) The general result, of which this is a special case, is the following. If $\langle W_n\rangle_{n\in\mathbb N}$ is a martingale difference sequence adapted to $\langle\Sigma_n\rangle_{n\in\mathbb N}$, and $\langle Z_n\rangle_{n\ge1}$ is a sequence of random variables such that (i) $Z_n$ is $\Sigma_{n-1}$-measurable (ii) $Z_n\times W_n$ has finite expectation for each $n\ge1$, then $W_0$, $Z_1\times W_1$, $Z_2\times W_2,\ldots$ is a martingale difference sequence adapted to $\langle\Sigma_n\rangle_{n\in\mathbb N}$; the proof that $\int_EZ_{n+1}\times W_{n+1}=0$ for every $E\in\Sigma_n$ is exactly the argument of (b) of the proof of 276B.

(c) I invited you to restrict your ideas to the discrete case for a moment; but if you feel that you understand what it means to say that a 'system' or predictable sequence $\langle Z_n\rangle_{n\ge1}$ must be adapted to $\langle\Sigma_n\rangle_{n\in\mathbb N}$, in the sense that every $Z_n$ is $\Sigma_{n-1}$-measurable, then any further difficulty lies in the measure theory needed to show that the integrals $\int_EZ_{n+1}\times W_{n+1}$ are zero, which is what this book is about.

(d) Consider the gambling system mentioned in 275Cf. Here the idea is that $W_n=\pm1$, as in (a), and $Z_{n+1}=2^na$ if $X_n\le0$, $0$ if $X_n>0$; that is, the gambler doubles his stake each time until he wins, and then quits. Of course he is almost sure to win eventually, so we have $\lim_{n\to\infty}X_n=_{\text{a.e.}}a$, even though $E(X_n)=0$ for every $n$. We can compute the distribution of $X_n$: for $n\ge1$ we have $\Pr(X_n=a)=1-2^{-n}$, $\Pr(X_n=-(2^n-1)a)=2^{-n}$. Thus $E(|X_n|)=(2-2^{-n+1})a$ and the almost-everywhere convergence of the $X_n$ is an example of Doob's Martingale Convergence Theorem.
In the language of stopping times (275N), $X_n=Y_{\tau\wedge n}$, where $Y_n=\sum_{k=0}^n2^kaW_k$ and $\tau=\min\{n:Y_n>0\}$.
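
The distribution claimed in 276E(d) can be confirmed by brute-force enumeration. The sketch below is an illustration only; it assumes $X_0=0$ and the stake rule $Z_{k+1}=2^ka$ while $X_k\le0$ from (d), and the function name `distribution` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def distribution(n, a=Fraction(1)):
    """Exact law of X_n for the doubling system: X_0 = 0, and the stake on
    bet k+1 is 2^k * a while X_k <= 0, and 0 once X_k > 0."""
    law = {}
    for ws in product((-1, 1), repeat=n):      # outcomes W_1, ..., W_n
        x = Fraction(0)
        for k, w in enumerate(ws):
            z = 2 ** k * a if x <= 0 else 0    # Z_{k+1}
            x += z * w
        law[x] = law.get(x, Fraction(0)) + Fraction(1, 2 ** n)
    return law
```

For example, `distribution(4)` puts mass $15/16$ on $a=1$ and mass $1/16$ on $-(2^4-1)a=-15$, so the mean is $0$ while $E(|X_4|)=15/8=(2-2^{-3})a$.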

*276F I come now to Komlós' theorem. The first step is a trifling refinement of 276C.
Lemma Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle\Sigma_n\rangle_{n\in\mathbb N}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$. Suppose that $\langle X_n\rangle_{n\in\mathbb N}$ is a sequence of random variables on $\Omega$ such that (i) $X_n$ is $\Sigma_n$-measurable for each $n$ (ii) $\sum_{n=0}^\infty\frac{1}{(n+1)^2}E(X_n^2)$ is finite (iii) $\lim_{n\to\infty}X'_n=_{\text{a.e.}}0$, where $X'_n$ is a conditional expectation of $X_n$ on $\Sigma_{n-1}$ for each $n\ge1$. Then $\lim_{n\to\infty}\frac{1}{n+1}\sum_{k=0}^nX_k=_{\text{a.e.}}0$.

proof Making suitable adjustments on a negligible set if necessary, we may suppose that $X'_n$ is $\Sigma_{n-1}$-measurable for $n\ge1$ and that every $X_n$ and $X'_n$ is defined on the whole of $\Omega$. Set $X'_0=X_0$ and $Y_n=X_n-X'_n$ for $n\in\mathbb N$. Then $\langle Y_n\rangle_{n\in\mathbb N}$ is a martingale difference sequence adapted to $\langle\Sigma_n\rangle_{n\in\mathbb N}$. Also $E(Y_n^2)\le E(X_n^2)$ for every $n$. $\mathbf P$ If $n\ge1$, $X'_n$ is square-integrable (244M), and $E(Y_n\times X'_n)=0$, as in part (b) of the proof of 276B. Now
$$E(X_n^2)=E((Y_n+X'_n)^2)=E(Y_n^2)+2E(Y_n\times X'_n)+E((X'_n)^2)\ge E(Y_n^2).\ \mathbf Q$$
This means that $\sum_{n=0}^\infty\frac{1}{(n+1)^2}E(Y_n^2)$ must be finite. By 276C, $\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^nY_i=_{\text{a.e.}}0$. But by 273Ca we also have $\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^nX'_i=0$ whenever $\lim_{n\to\infty}X'_n=0$, which is almost everywhere. So $\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^nX_i=_{\text{a.e.}}0$.

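*276G Lemma Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle X_n\rangle_{n\in\mathbb N}$ a sequence of random variables on $\Omega$ such that $\sup_{n\in\mathbb N}E(|X_n|)$ is finite. For $k\in\mathbb N$ and $x\in\mathbb R$ set $F_k(x)=x$ if $|x|\le k$, $0$ otherwise. Let $\mathcal F$ be an ultrafilter on $\mathbb N$.
(a) For each $k\in\mathbb N$ there is a measurable function $Y_k:\Omega\to[-k,k]$ such that $\lim_{n\to\mathcal F}\int_EF_k(X_n)=\int_EY_k$ for every $E\in\Sigma$.
(b) $\lim_{n\to\mathcal F}E((F_k(X_n)-Y_k)^2)\le\lim_{n\to\mathcal F}E(F_k(X_n)^2)$ for each $k$.
(c) $Y=\lim_{k\to\infty}Y_k$ is defined a.e. and $\lim_{k\to\infty}E(|Y-Y_k|)=0$.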
proof (a) For each $k$, $|F_k(X_n)|\le_{\text{a.e.}}k$ for every $n$, so that $\{F_k(X_n):n\in\mathbb N\}$ is uniformly integrable, and $\{F_k(X_n)^\bullet:n\in\mathbb N\}$ is relatively weakly compact in $L^1=L^1(\mu)$ (247C). Accordingly $v_k=\lim_{n\to\mathcal F}F_k(X_n)^\bullet$ is defined in $L^1$ (2A3Se); take $Y_k:\Omega\to\mathbb R$ to be a measurable function such that $Y_k^\bullet=v_k$. For any $E\in\Sigma$,
$$\int_EY_k=\int v_k\times(\chi E)^\bullet=\lim_{n\to\mathcal F}\int_EF_k(X_n).$$
In particular,
$$\Bigl|\int_EY_k\Bigr|\le\sup_{n\in\mathbb N}\Bigl|\int_EF_k(X_n)\Bigr|\le k\mu E$$
for every $E$, so that $\{\omega:Y_k(\omega)>k\}$ and $\{\omega:Y_k(\omega)<-k\}$ are both negligible; changing $Y_k$ on a negligible set if necessary, we may suppose that $|Y_k(\omega)|\le k$ for every $\omega\in\Omega$.
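(b) Because $Y_k$ is bounded, $Y_k^\bullet\in L^\infty(\mu)$, and
$$\lim_{n\to\mathcal F}\int F_k(X_n)\times Y_k=\lim_{n\to\mathcal F}\int F_k(X_n)^\bullet\times Y_k^\bullet=\int v_k\times Y_k^\bullet=\int Y_k^2.$$
Accordingly
$$\lim_{n\to\mathcal F}\int(F_k(X_n)-Y_k)^2=\lim_{n\to\mathcal F}\int F_k(X_n)^2-2\lim_{n\to\mathcal F}\int F_k(X_n)\times Y_k+\int Y_k^2$$
$$=\lim_{n\to\mathcal F}\int F_k(X_n)^2-\int Y_k^2\le\lim_{n\to\mathcal F}\int F_k(X_n)^2.$$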

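(c) Set $W_0=Y_0=0$, $W_k=Y_k-Y_{k-1}$ for $k\ge1$. Then $E(|W_k|)\le\lim_{n\to\mathcal F}E(|F_k(X_n)-F_{k-1}(X_n)|)$ for every $k\ge1$. $\mathbf P$ Set $E=\{\omega:W_k(\omega)\ge0\}$. Then
$$\int_EW_k=\int_EY_k-\int_EY_{k-1}=\lim_{n\to\mathcal F}\int_EF_k(X_n)-\lim_{n\to\mathcal F}\int_EF_{k-1}(X_n)$$
$$=\lim_{n\to\mathcal F}\int_EF_k(X_n)-F_{k-1}(X_n)\le\lim_{n\to\mathcal F}\int_E|F_k(X_n)-F_{k-1}(X_n)|.$$
Similarly,
$$\Bigl|\int_{\Omega\setminus E}W_k\Bigr|\le\lim_{n\to\mathcal F}\int_{\Omega\setminus E}|F_k(X_n)-F_{k-1}(X_n)|.$$
So
$$E(|W_k|)=\int_EW_k-\int_{\Omega\setminus E}W_k\le\lim_{n\to\mathcal F}\int|F_k(X_n)-F_{k-1}(X_n)|.\ \mathbf Q$$
It follows that $\sum_{k=0}^\infty E(|W_k|)$ is finite. $\mathbf P$ For any $m\ge1$,
$$\sum_{k=0}^mE(|W_k|)\le\sum_{k=1}^m\lim_{n\to\mathcal F}E(|F_k(X_n)-F_{k-1}(X_n)|)=\lim_{n\to\mathcal F}E\Bigl(\sum_{k=1}^m|F_k(X_n)-F_{k-1}(X_n)|\Bigr)$$
$$=\lim_{n\to\mathcal F}E(|F_m(X_n)|)\le\sup_{n\in\mathbb N}E(|X_n|).$$
So $\sum_{k=0}^\infty E(|W_k|)\le\sup_{n\in\mathbb N}E(|X_n|)$ is finite. $\mathbf Q$
By B.Levi's theorem (123A), $\lim_{m\to\infty}\sum_{k=0}^m|W_k|$ is finite a.e., so that
$$Y=\lim_{m\to\infty}Y_m=\sum_{k=0}^\infty W_k$$
is defined a.e.; and moreover
$$E(|Y-Y_k|)\le\lim_{m\to\infty}E\Bigl(\sum_{j=k+1}^m|W_j|\Bigr)\to0$$
as $k\to\infty$.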
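
The step $\sum_{k=1}^m|F_k(X_n)-F_{k-1}(X_n)|=|F_m(X_n)|$ used in part (c) of the proof of 276G is a pointwise identity for the truncations $F_k$, since at most one of the differences is non-zero at any point. A minimal Python sketch of that identity (the function names are hypothetical):

```python
def truncate(k, x):
    """F_k(x) = x if |x| <= k, 0 otherwise, as in 276G."""
    return x if abs(x) <= k else 0.0


def telescoped(m, x):
    """Sum over k = 1..m of |F_k(x) - F_{k-1}(x)|."""
    return sum(abs(truncate(k, x) - truncate(k - 1, x)) for k in range(1, m + 1))
```

For a real number $x$ with $j-1<|x|\le j$, only the term $k=j$ contributes, and it contributes $|x|$ exactly when $j\le m$.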

*276H Komlós' theorem (Komlós 67) Let $(\Omega,\Sigma,\mu)$ be any measure space, and $\langle X_n\rangle_{n\in\mathbb N}$ a sequence of integrable real-valued functions on $\Omega$ such that $\sup_{n\in\mathbb N}\int|X_n|$ is finite. Then there are a subsequence $\langle X'_n\rangle_{n\in\mathbb N}$ of $\langle X_n\rangle_{n\in\mathbb N}$ and an integrable function $Y$ such that $Y=_{\text{a.e.}}\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^nX''_i$ whenever $\langle X''_n\rangle_{n\in\mathbb N}$ is a subsequence of $\langle X'_n\rangle_{n\in\mathbb N}$.
proof Since neither the hypothesis nor the conclusion is affected by changing the $X_n$ on a negligible set, we may suppose throughout that every $X_n$ is measurable and defined on the whole of $\Omega$. In addition, to begin with (down to the end of (e) below), let us suppose that $\mu\Omega=1$. As in 276G, set $F_k(x)=x$ for $|x|\le k$, $0$ for $|x|>k$.
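(a) Let $\mathcal F$ be any non-principal ultrafilter on $\mathbb N$ (2A1O). For $j\in\mathbb N$ set $p_j=\lim_{n\to\mathcal F}\Pr(|X_n|>j)$. Then $\sum_{j=0}^\infty p_j$ is finite. $\mathbf P$ For any $k\in\mathbb N$,
$$\sum_{j=0}^kp_j=\sum_{j=0}^k\lim_{n\to\mathcal F}\Pr(|X_n|>j)=\lim_{n\to\mathcal F}\sum_{j=0}^k\Pr(|X_n|>j)$$
$$\le\lim_{n\to\mathcal F}\int(1+|X_n|)\le1+\sup_{n\in\mathbb N}\int|X_n|.$$
So $\sum_{j=0}^\infty p_j\le1+\sup_{n\in\mathbb N}\int|X_n|$ is finite. $\mathbf Q$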

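Setting
$$p'_j=p_j-p_{j+1}=\lim_{n\to\mathcal F}\Pr(j<|X_n|\le j+1)$$
for each $j$, we have
$$\sum_{j=0}^\infty(j+1)p'_j=\lim_{m\to\infty}\Bigl(\sum_{j=0}^m(j+1)p_j-\sum_{j=0}^m(j+1)p_{j+1}\Bigr)$$
$$=\lim_{m\to\infty}\Bigl(\sum_{j=0}^mp_j-(m+1)p_{m+1}\Bigr)\le\sum_{j=0}^\infty p_j<\infty.$$
Next,
$$\lim_{n\to\mathcal F}\int F_k(X_n)^2\le\sum_{j=0}^k(j+1)^2p'_j$$
for each $k$. $\mathbf P$ Setting $E_{jn}=\{\omega:j<|X_n(\omega)|\le j+1\}$ for $j$, $n\in\mathbb N$, $F_k(X_n)^2\le\sum_{j=0}^k(j+1)^2\chi E_{jn}$, so
$$\lim_{n\to\mathcal F}\int F_k(X_n)^2\le\lim_{n\to\mathcal F}\sum_{j=0}^k(j+1)^2\mu E_{jn}=\sum_{j=0}^k(j+1)^2p'_j.\ \mathbf Q$$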
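
The rearrangement of $\sum_j(j+1)p'_j$ in part (a) of the proof of 276H is a summation by parts: for every $m$, $\sum_{j=0}^m(j+1)(p_j-p_{j+1})=\sum_{j=0}^mp_j-(m+1)p_{m+1}$. The following Python sketch checks this finite identity in exact arithmetic; the helper names and the sample sequence are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def abel_lhs(p, m):
    """sum_{j=0}^{m} (j+1) * (p_j - p_{j+1})."""
    return sum((j + 1) * (p[j] - p[j + 1]) for j in range(m + 1))


def abel_rhs(p, m):
    """sum_{j=0}^{m} p_j  -  (m+1) * p_{m+1}."""
    return sum(p[j] for j in range(m + 1)) - (m + 1) * p[m + 1]
```

Since $(m+1)p_{m+1}\ge0$, the right-hand side is at most $\sum_{j=0}^mp_j$, which is the bound used in the proof.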

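(b) Define $\langle Y_k\rangle_{k\in\mathbb N}$ and $Y=_{\text{a.e.}}\lim_{k\to\infty}Y_k$ from $\langle X_n\rangle_{n\in\mathbb N}$ and $\mathcal F$ as in Lemma 276G. Then
$$J_k=\Bigl\{n:n\in\mathbb N,\ \int(F_k(X_n)-Y_k)^2\le1+\sum_{j=0}^k(j+1)^2p'_j\Bigr\}$$
belongs to $\mathcal F$ for every $k\in\mathbb N$. $\mathbf P$ By (a) above and 276Gb,
$$\lim_{n\to\mathcal F}\int(F_k(X_n)-Y_k)^2\le\lim_{n\to\mathcal F}\int F_k(X_n)^2\le\sum_{j=0}^k(j+1)^2p'_j.\ \mathbf Q$$
Also, of course,
$$K_k=\{n:n\in\mathbb N,\ \Pr(F_j(X_n)\ne X_n)\le p_j+2^{-j}\text{ for every }j\le k\}$$
belongs to $\mathcal F$ for every $k$.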
(c) For $n$, $k\in\mathbb N$ let $Z_{kn}$ be a simple function such that $|Z_{kn}|\le|F_k(X_n)-Y_k|$ and $\int|F_k(X_n)-Y_k-Z_{kn}|\le2^{-k}$. For $m\in\mathbb N$ let $\Sigma_m$ be the algebra of subsets of $\Omega$ generated by sets of the form $\{\omega:Z_{kn}(\omega)=\alpha\}$ for $k$, $n\le m$ and $\alpha\in\mathbb R$. Because each $Z_{kn}$ takes only finitely many values, $\Sigma_m$ is finite (and is therefore a $\sigma$-subalgebra of $\Sigma$); and of course $\Sigma_m\subseteq\Sigma_{m+1}$ for every $m$.
We need to look at conditional expectations on the $\Sigma_m$, and because $\Sigma_m$ is always finite these have a particularly straightforward expression. Let $\mathcal A_m$ be the set of 'atoms', or minimal non-empty sets, in $\Sigma_m$; that is, the set of equivalence classes in $\Omega$ under the relation $\omega\sim\omega'$ if $Z_{kn}(\omega)=Z_{kn}(\omega')$ for all $k$, $n\le m$. For any integrable random variable $X$ on $\Omega$, define $E_m(X)$ by setting
$$E_m(X)(\omega)=\frac{1}{\mu A}\int_AX\text{ if }\omega\in A\in\mathcal A_m\text{ and }\mu A>0,$$
$$E_m(X)(\omega)=0\text{ if }\omega\in A\in\mathcal A_m\text{ and }\mu A=0.$$
Then $E_m(X)$ is a conditional expectation of $X$ on $\Sigma_m$.
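
The formula for $E_m(X)$ is just averaging over atoms. The following Python sketch illustrates it on a finite probability space, under the assumption (made only for this illustration) that $\Omega$ is a finite set of points with given weights and that the atoms are supplied as a partition; the function name and argument layout are hypothetical.

```python
from fractions import Fraction


def conditional_expectation(weights, atoms, x):
    """weights[w] = mu{w}; atoms = a partition of the points into atoms;
    x[w] = X(w).  Returns E_m(X) as a dict, constant on each atom."""
    result = {}
    for atom in atoms:
        mass = sum(weights[w] for w in atom)
        if mass > 0:
            value = sum(weights[w] * x[w] for w in atom) / mass  # (1/mu A) * int_A X
        else:
            value = Fraction(0)                                  # convention on null atoms
        for w in atom:
            result[w] = value
    return result
```

By construction $\int_AE_m(X)=\int_AX$ for every atom $A$, hence for every member of the finite algebra, which is the defining property of a conditional expectation.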
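Now
$$\lim_{n\to\mathcal F}\int|E_m(F_k(X_n)-Y_k)|=\lim_{n\to\mathcal F}\sum_{A\in\mathcal A_m}\int_A|E_m(F_k(X_n)-Y_k)|$$
$$=\lim_{n\to\mathcal F}\sum_{A\in\mathcal A_m}\Bigl|\int_AE_m(F_k(X_n)-Y_k)\Bigr|$$
(because $E_m(F_k(X_n)-Y_k)$ is constant on each $A\in\mathcal A_m$)
$$=\lim_{n\to\mathcal F}\sum_{A\in\mathcal A_m}\Bigl|\int_AF_k(X_n)-Y_k\Bigr|=\sum_{A\in\mathcal A_m}\lim_{n\to\mathcal F}\Bigl|\int_AF_k(X_n)-Y_k\Bigr|=0$$
by the choice of $Y_k$. So if we set
$$I_m=\Bigl\{n:n\in\mathbb N,\ \int|E_m(F_k(X_n)-Y_k)|\le2^{-k}\text{ for every }k\le m\Bigr\},$$
then $I_m\in\mathcal F$ for every $m$.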
(d) Suppose that $\langle r(n)\rangle_{n\in\mathbb N}$ is any strictly increasing sequence in $\mathbb N$ such that $r(0)>0$, $r(n)\in J_n\cap K_n$ for every $n$ and $r(n)\in I_{r(n-1)}$ for $n\ge1$. Then $\frac{1}{n+1}\sum_{i=0}^nX_{r(i)}\to Y$ a.e. as $n\to\infty$. $\mathbf P$ Express $X_{r(n)}$ as
$$(X_{r(n)}-F_n(X_{r(n)}))+(F_n(X_{r(n)})-Y_n-Z_{n,r(n)})+Y_n+Z_{n,r(n)}$$
for each $n$. Taking these pieces in turn:
(i)
$$\sum_{n=0}^\infty\Pr(X_{r(n)}\ne F_n(X_{r(n)}))\le\sum_{n=0}^\infty p_n+2^{-n}$$
(because $r(n)\in K_n$ for every $n$)
$$<\infty$$
by (a). But this means that $X_{r(n)}-F_n(X_{r(n)})\to0$ a.e., since the sequence is eventually zero at almost every point, and $\frac{1}{n+1}\sum_{i=0}^nX_{r(i)}-F_i(X_{r(i)})\to0$ a.e. by 273Ca.

(ii) By the choice of the $Z_{n,r(n)}$,
$$\sum_{n=0}^\infty\int|F_n(X_{r(n)})-Y_n-Z_{n,r(n)}|\le\sum_{n=0}^\infty2^{-n}$$
is finite, so $F_n(X_{r(n)})-Y_n-Z_{n,r(n)}\to0$ a.e. and $\frac{1}{n+1}\sum_{i=0}^nF_i(X_{r(i)})-Y_i-Z_{i,r(i)}\to0$ a.e.

(iii) By 276G, $Y_n\to Y$ a.e. and $\frac{1}{n+1}\sum_{i=0}^nY_i\to Y$ a.e.
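(iv) We know that, for each $n\ge1$, $r(n)\in I_{r(n-1)}$. So (because $r(n-1)\ge n$) $\int|E_{r(n-1)}(F_n(X_{r(n)})-Y_n)|\le2^{-n}$. But as also
$$\int|E_{r(n-1)}(F_n(X_{r(n)})-Y_n-Z_{n,r(n)})|\le\int|F_n(X_{r(n)})-Y_n-Z_{n,r(n)}|\le2^{-n}$$
by 244M and the choice of $Z_{n,r(n)}$,
$$\int|E_{r(n-1)}Z_{n,r(n)}|=\int|E_{r(n-1)}(F_n(X_{r(n)})-Y_n)-E_{r(n-1)}(F_n(X_{r(n)})-Y_n-Z_{n,r(n)})|\le2^{-n+1}$$
for every $n$. Accordingly $E_{r(n-1)}Z_{n,r(n)}\to0$ a.e.
On the other hand,
$$\sum_{n=0}^\infty\frac{1}{(n+1)^2}\int Z_{n,r(n)}^2\le\sum_{n=0}^\infty\frac{1}{(n+1)^2}\int(F_n(X_{r(n)})-Y_n)^2\le\sum_{n=0}^\infty\frac{1}{(n+1)^2}\Bigl(1+\sum_{j=0}^n(j+1)^2p'_j\Bigr)$$
(because $r(n)\in J_n$)
$$\le\sum_{n=0}^\infty\frac{1}{(n+1)^2}+\sum_{j=0}^\infty(j+1)^2p'_j\sum_{n=j}^\infty\frac{1}{(n+1)^2}\le\frac{\pi^2}{6}+2\sum_{j=0}^\infty(j+1)p'_j$$
is finite. (I am using the estimate
$$\sum_{n=j}^\infty\frac{1}{(n+1)^2}\le\sum_{n=j}^\infty\Bigl(\frac{2}{n+1}-\frac{2}{n+2}\Bigr)=\frac{2}{j+1}.)$$
By 276F, applied to $\langle\Sigma_{r(n)}\rangle_{n\in\mathbb N}$ and $\langle Z_{n,r(n)}\rangle_{n\in\mathbb N}$, $\frac{1}{n+1}\sum_{i=0}^nZ_{i,r(i)}\to0$ a.e.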
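
The estimate used in (iv) rests on the termwise inequality $\frac{1}{(n+1)^2}\le\frac{2}{n+1}-\frac{2}{n+2}$, valid for $n\ge0$, followed by telescoping. A small Python sketch, offered only as a numerical illustration with hypothetical helper names, compares the partial sums in exact arithmetic:

```python
from fractions import Fraction


def tail_partial(j, n_max):
    """Partial sum of 1/(n+1)^2 for n = j, ..., n_max."""
    return sum(Fraction(1, (n + 1) ** 2) for n in range(j, n_max + 1))


def telescoping_partial(j, n_max):
    """Partial sum of 2/(n+1) - 2/(n+2) for n = j, ..., n_max;
    it telescopes to 2/(j+1) - 2/(n_max+2)."""
    return sum(Fraction(2, n + 1) - Fraction(2, n + 2) for n in range(j, n_max + 1))
```

Every partial sum of the left-hand series is therefore strictly below $2/(j+1)$, which gives the stated bound in the limit.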

(v) Adding these four components, we see that $\frac{1}{n+1}\sum_{i=0}^nX_{r(i)}\to Y$ a.e., as claimed. $\mathbf Q$

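(e) Now fix any strictly increasing sequence $\langle s(n)\rangle_{n\in\mathbb N}$ in $\mathbb N$ such that $s(0)>0$, $s(n)\in J_n\cap K_n$ for every $n$ and $s(n)\in I_{s(n-1)}$ for $n\ge1$; such a sequence exists because $J_n\cap K_n\cap I_{s(n-1)}$ belongs to $\mathcal F$, so is infinite, for every $n\ge1$. Set $X'_n=X_{s(n)}$ for every $n$. If $\langle X''_n\rangle_{n\in\mathbb N}$ is a subsequence of $\langle X'_n\rangle_{n\in\mathbb N}$, then it is of the form $\langle X_{s(r(n))}\rangle_{n\in\mathbb N}$ for some strictly increasing sequence $\langle r(n)\rangle_{n\in\mathbb N}$. In this case
$$s(r(0))\ge s(0)>0,$$
$$s(r(n))\in J_{r(n)}\cap K_{r(n)}\subseteq J_n\cap K_n\text{ for every }n,$$
$$s(r(n))\in I_{s(r(n)-1)}\subseteq I_{s(r(n-1))}\text{ for every }n\ge1.$$
So (d) tells us that $\frac{1}{n+1}\sum_{i=0}^nX''_i\to Y$ a.e.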

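(f) Thus the theorem is proved in the case in which $(\Omega,\Sigma,\mu)$ is a probability space. Now suppose that $\mu$ is $\sigma$-finite and $\mu\Omega>0$. In this case there is a strictly positive measurable function $f:\Omega\to\mathbb R$ such that $\int f\,d\mu=1$ (215B(ix)). Let $\nu$ be the corresponding indefinite-integral measure (234B), so that $\nu$ is a probability measure on $\Omega$, and $\langle\frac1f\times X_n\rangle_{n\in\mathbb N}$ is a sequence of $\nu$-integrable functions such that $\sup_{n\in\mathbb N}\int|\frac1f\times X_n|\,d\nu$ is finite (235M). From (a)-(e) we see that there must be a $\nu$-integrable function $Y$ and a subsequence $\langle X'_n\rangle_{n\in\mathbb N}$ of $\langle X_n\rangle_{n\in\mathbb N}$ such that $\frac{1}{n+1}\sum_{i=0}^n\frac1f\times X''_i\to Y$ $\nu$-a.e. for every subsequence $\langle X''_n\rangle_{n\in\mathbb N}$ of $\langle X'_n\rangle_{n\in\mathbb N}$. But $\mu$ and $\nu$ have the same negligible sets (234Dc), so $\frac{1}{n+1}\sum_{i=0}^nX''_i\to f\times Y$ $\mu$-a.e. for every subsequence $\langle X''_n\rangle_{n\in\mathbb N}$ of $\langle X'_n\rangle_{n\in\mathbb N}$.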
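(g) Since the result is trivial if $\mu\Omega=0$, the theorem is true whenever $\mu$ is $\sigma$-finite. For the general case, set
$$\tilde\Omega=\bigcup_{n\in\mathbb N}\{\omega:X_n(\omega)\ne0\}=\bigcup_{m,n\in\mathbb N}\{\omega:|X_n(\omega)|\ge2^{-m}\},$$
so that the subspace measure $\mu_{\tilde\Omega}$ is $\sigma$-finite. Then there are a $\mu_{\tilde\Omega}$-integrable function $\tilde Y$ and a subsequence $\langle X'_n\rangle_{n\in\mathbb N}$ of $\langle X_n\rangle_{n\in\mathbb N}$ such that $\frac{1}{n+1}\sum_{i=0}^nX''_i{\restriction}\tilde\Omega\to\tilde Y$ $\mu_{\tilde\Omega}$-a.e. for every subsequence $\langle X''_n\rangle_{n\in\mathbb N}$ of $\langle X'_n\rangle_{n\in\mathbb N}$. Setting $Y(\omega)=\tilde Y(\omega)$ if $\omega\in\tilde\Omega$, $0$ for $\omega\in\Omega\setminus\tilde\Omega$, we see that $Y$ is $\mu$-integrable and that $\frac{1}{n+1}\sum_{i=0}^nX''_i\to Y$ $\mu$-a.e. whenever $\langle X''_n\rangle_{n\in\mathbb N}$ is a subsequence of $\langle X'_n\rangle_{n\in\mathbb N}$. This completes the proof.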

276X Basic exercises >(a) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a martingale adapted to a sequence $\langle\Sigma_n\rangle_{n\in\mathbb N}$ of $\sigma$-algebras. Show that $\int_EX_n^2\le\int_EX_{n+1}^2$ for every $n\in\mathbb N$, $E\in\Sigma_n$ (allowing $\infty$ as a value of an integral). (Hint: see the proof of 276B.)

>(b) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a martingale. Show that for any $\epsilon>0$,
$$\Pr(\sup_{n\in\mathbb N}|X_n|\ge\epsilon)\le\frac{1}{\epsilon^2}\sup_{n\in\mathbb N}E(X_n^2).$$
(Hint: put 276Xa together with the argument for 275D.)

(c) When does 276Xb give a sharper result than 275Xb?

(d) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a martingale difference sequence and set $Y_n=\frac{1}{n+1}(X_0+\ldots+X_n)$ for each $n\in\mathbb N$. Show that if $\langle X_n\rangle_{n\in\mathbb N}$ is uniformly integrable then $\lim_{n\to\infty}\|Y_n\|_1=0$. (Hint: use the argument of 273Na, with 276C in place of 273D, and setting $\tilde X_n=X'_n-Z_n$, where $Z_n$ is an appropriate conditional expectation of $X'_n$.)
(e) Strong law of large numbers: fifth form A sequence $\langle X_n\rangle_{n\in\mathbb N}$ of random variables is exchangeable if $(X_{n_0},\ldots,X_{n_k})$ has the same joint distribution as $(X_0,\ldots,X_k)$ whenever $n_0,\ldots,n_k$ are distinct. Show that if $\langle X_n\rangle_{n\in\mathbb N}$ is an exchangeable sequence of random variables with finite expectation, then $\langle\frac{1}{n+1}\sum_{i=0}^nX_i\rangle_{n\in\mathbb N}$ converges a.e. (Hint: 276H.)
n+1

276Y Further exercises (a) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a martingale difference sequence such that $\sup_{n\in\mathbb N}\|X_n\|_p$ is finite, where $p\in\,]1,\infty[$. Show that $\lim_{n\to\infty}\|\frac{1}{n+1}\sum_{i=0}^nX_i\|_p=0$. (Hint: 273Nb.)

(b) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a uniformly integrable martingale difference sequence and $Y$ a bounded random variable. Show that $\lim_{n\to\infty}E(X_n\times Y)=0$. (Compare 272Yd.)
(c) Use 275Yg to prove 276Xa.
(d) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a sequence of random variables such that, for some $\delta>0$, $\sup_{n\in\mathbb N}n^\delta E(|X_n|)$ is finite. Set $S_n=\frac{1}{n+1}(X_0+\ldots+X_n)$ for each $n$. Show that $\lim_{n\to\infty}S_n=_{\text{a.e.}}0$. (Hint: set $Z_k=2^{-k}(|X_0|+\ldots+|X_{2^k-1}|)$. Show that $\sum_{k=0}^\infty E(Z_k)<\infty$. Show that $S_n\le2Z_{k+1}$ if $2^k<n\le2^{k+1}$.)
(e) Strong law of large numbers: sixth form Let $\langle X_n\rangle_{n\in\mathbb N}$ be a martingale difference sequence such that, for some $\delta>0$, $\sup_{n\in\mathbb N}E(|X_n|^{1+\delta})$ is finite. Set $S_n=\frac{1}{n+1}(X_0+\ldots+X_n)$ for each $n$. Show that $\lim_{n\to\infty}S_n=_{\text{a.e.}}0$. (Hint: take a non-decreasing sequence $\langle\Sigma_n\rangle_{n\in\mathbb N}$ to which $\langle X_n\rangle_{n\in\mathbb N}$ is adapted. Set $Y_n=X_n$ when $|X_n|\le n$, $0$ otherwise. Let $U_n$ be a conditional expectation of $Y_n$ on $\Sigma_{n-1}$ and set $Z_n=Y_n-U_n$. Use ideas from 273H, 276C and 276Yd above to show that $\frac{1}{n+1}\sum_{i=0}^nV_i\to0$ a.e. for $V_i=Z_i$, $V_i=U_i$, $V_i=X_i-Y_i$.)
(f) Show that there is a martingale $\langle X_n\rangle_{n\in\mathbb N}$ which converges in measure but is not convergent a.e. (Compare 273Ba.) (Hint: arrange that $\{\omega:X_{n+1}(\omega)\ne0\}=E_n\subseteq\{\omega:|X_{n+1}(\omega)-X_n(\omega)|\ge1\}$, where $\langle E_n\rangle_{n\in\mathbb N}$ is an independent sequence of sets and $\mu E_n=\frac{1}{n+1}$ for each $n$.)

(g) Give an example of an identically distributed martingale difference sequence $\langle X_n\rangle_{n\in\mathbb N}$ such that $\langle\frac{1}{n+1}(X_0+\ldots+X_n)\rangle_{n\in\mathbb N}$ does not converge to 0 almost everywhere. (Hint: start by devising a uniformly bounded sequence $\langle U_n\rangle_{n\in\mathbb N}$ such that $\lim_{n\to\infty}E(|U_n|)=0$ but $\langle\frac{1}{n+1}(U_0+\ldots+U_n)\rangle_{n\in\mathbb N}$ does not converge to 0 almost everywhere. Now repeat your construction in such a context that the $U_n$ can be derived from an identically distributed martingale difference sequence by the formulae of 276Ye.)
(h) Construct a proof of Komlós' theorem which does not involve ultrafilters, or any other use of the full axiom of choice, but proceeds throughout by selecting appropriate sub-subsequences. Remember to check that you can prove any fact you use about weakly convergent sequences in $L^1$ on the same rules.

276 Notes and comments I include two more versions of the strong law of large numbers (276C, 276Ye) not because I have any applications in mind but because I think that if you know the strong law for $\|\ \|_{1+\delta}$-bounded independent sequences, and what a martingale difference sequence is, then there is something missing if you do not know the strong law for $\|\ \|_{1+\delta}$-bounded martingale difference sequences. And then, of course, I have to add 276Yf and 276Yg, lest you be tempted to think that the strong law is really about martingale difference sequences rather than about independent sequences.
Komlós' theorem is rather outside the scope of this volume; it is quite hard work and surely much less important, to most probabilists, than many results I have omitted. It does provide a quick proof of 276Xe. However it is relevant to questions arising in some topics treated in Volumes 3 and 4, and the proof fits naturally into this section.

Chapter 28
Fourier analysis
For the last chapter of this volume, I attempt a brief account of one of the most important topics in
analysis. This is a bold enterprise, and I cannot hope to satisfy the reasonable demands of anyone who
knows and loves the subject as it deserves. But I also cannot pass it by without being false to my own
subject, since problems contributed by the study of Fourier series and transforms have led measure theory
throughout its history. What I will try to do, therefore, is to give versions of those results which everyone
ought to know in language unifying them with the rest of this treatise, aiming to open up a channel for
the transfer of intuitions and techniques between the abstract general study of measure spaces, which is the
centre of our work, and this particular family of applications of the theory of integration.
I have divided the material of this chapter, conventionally enough, into three parts: Fourier series, Fourier
transforms and the characteristic functions of probability theory. While it will be obvious that many ideas
are common to all three, I do not think it useful, at this stage, to try to formulate an explicit generalization
to unify them; that belongs to a more general theory of harmonic analysis on groups, which must wait until
Volume 4. I begin however with a section on the Stone-Weierstrass theorem (§281), which is one of the
basic tools of functional analysis, as well as being useful for this chapter. The final section (§286), a proof
of Carleson's theorem, is at a rather different level from the rest.

281 The Stone-Weierstrass theorem


Before we begin work on the real subject of this chapter, it will be helpful to have a reasonably general
statement of a fundamental theorem on the approximation of continuous functions. In fact I give a variety
of forms (281A, 281E, 281F and 281G, together with 281Ya, 281Yd and 281Yg), all of which are sometimes
useful. I end the section with a version of Weyl's Equidistribution Theorem (281M-281N).

281A Stone-Weierstrass theorem: first form Let $X$ be a topological space and $K$ a compact subset of $X$. Write $C_b(X)$ for the space of all bounded continuous real-valued functions on $X$, so that $C_b(X)$ is a linear space over $\mathbb R$. Let $A\subseteq C_b(X)$ be such that
$A$ is a linear subspace of $C_b(X)$;
$|f|\in A$ for every $f\in A$;
$\chi X\in A$;
whenever $x$, $y$ are distinct points of $K$ there is an $f\in A$ such that $f(x)\ne f(y)$.
Then for every continuous $h:K\to\mathbb R$ and $\epsilon>0$ there is an $f\in A$ such that
$|f(x)-h(x)|\le\epsilon$ for every $x\in K$,
if $K\ne\emptyset$, $\inf_{x\in X}f(x)\ge\inf_{x\in K}h(x)$ and $\sup_{x\in X}f(x)\le\sup_{x\in K}h(x)$.
Remark I have stated this theorem in its natural context, that of general topological spaces. But if these are unfamiliar to you, you do not in fact need to know what they are. If you read 'let $X$ be a topological space' as 'let $X$ be a subset of $\mathbb R^r$' and '$K$ is a compact subset of $X$' as '$K$ is a subset of $X$ which is closed and bounded in $\mathbb R^r$', you will have enough for all the applications in this chapter. In order to follow the proof, of course, you will need to know a little about compactness in $\mathbb R^r$; I have written out the necessary facts in §2A2.
proof (a) If $K$ is empty, then we can take $f$ to be the constant function $\mathbf 0$. So henceforth let us suppose that $K\ne\emptyset$.
(b) The first point to note is that if $f$, $g\in A$ then $f\wedge g$ and $f\vee g$ belong to $A$, where
$$(f\wedge g)(x)=\min(f(x),g(x)),\quad(f\vee g)(x)=\max(f(x),g(x))$$
for every $x\in X$; this is because
$$f\wedge g=\frac12(f+g-|f-g|),\quad f\vee g=\frac12(f+g+|f-g|).$$
It follows by induction on $n$ that $f_0\wedge\ldots\wedge f_n$ and $f_0\vee\ldots\vee f_n$ belong to $A$ for all $f_0,\ldots,f_n\in A$.
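
The two lattice formulae in (b) are easy to test pointwise. The following Python sketch is only an illustration: it represents functions as Python callables and builds the minimum and maximum from sums, scalar multiples and absolute values alone, which are exactly the operations under which $A$ is assumed closed. The names `meet` and `join` are hypothetical.

```python
def meet(f, g):
    """Pointwise minimum, written as (f + g - |f - g|) / 2."""
    return lambda x: (f(x) + g(x) - abs(f(x) - g(x))) / 2


def join(f, g):
    """Pointwise maximum, written as (f + g + |f - g|) / 2."""
    return lambda x: (f(x) + g(x) + abs(f(x) - g(x))) / 2
```

Iterating `meet` or `join` over a finite list of functions corresponds to the induction on $n$ mentioned in the proof.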


(c) If $x$, $y$ are distinct points of $K$, and $a$, $b\in\mathbb R$, there is an $f\in A$ such that $f(x)=a$ and $f(y)=b$. $\mathbf P$ Start from $g\in A$ such that $g(x)\ne g(y)$; this is the point at which we use the last of the list of four hypotheses on $A$. Set
$$\alpha=\frac{a-b}{g(x)-g(y)},\quad\beta=\frac{bg(x)-ag(y)}{g(x)-g(y)},\quad f=\alpha g+\beta\chi X\in A.\ \mathbf Q$$
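
The coefficients in (c) solve the two linear equations $\alpha g(x)+\beta=a$, $\alpha g(y)+\beta=b$. A short Python sketch of this two-point interpolation, in exact arithmetic, is given below as an illustration; the function name `interpolate` and the sample $g$ in the usage note are hypothetical.

```python
from fractions import Fraction


def interpolate(g, x, y, a, b):
    """Return f = alpha*g + beta (beta times the constant function 1)
    with f(x) = a and f(y) = b; requires g(x) != g(y)."""
    gx, gy = Fraction(g(x)), Fraction(g(y))
    if gx == gy:
        raise ValueError("g does not separate x and y")
    alpha = (a - b) / (gx - gy)
    beta = (b * gx - a * gy) / (gx - gy)
    return lambda t: alpha * g(t) + beta
```

For instance, with $g(t)=t^2+1$, $x=1$, $y=2$, $a=5$ and $b=-3$ one gets $\alpha=-8/3$ and $\beta=31/3$, and the resulting $f$ takes the values $5$ at $1$ and $-3$ at $2$.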

(d) (The heart of the proof lies in the next two paragraphs.) Let $h:K\to[0,\infty[$ be a continuous function and $x$ any point of $K$. For any $\epsilon>0$, there is an $f\in A$ such that $f(x)=h(x)$ and $f(y)\le h(y)+\epsilon$ for every $y\in K$. $\mathbf P$ Let $\mathcal G_x$ be the family of those open sets $G\subseteq X$ for which there is some $f\in A$ such that $f(x)=h(x)$ and $f(w)\le h(w)+\epsilon$ for every $w\in K\cap G$. I claim that $K\subseteq\bigcup\mathcal G_x$. To see this, take any $y\in K$. By (c), there is an $f\in A$ such that $f(x)=h(x)$ and $f(y)=h(y)$. Now $h-f{\restriction}K:K\to\mathbb R$ is a continuous function, taking the value 0 at $y$, so there is an open subset $G$ of $X$, containing $y$, such that $(h-f{\restriction}K)(w)\ge-\epsilon$ for every $w\in G\cap K$, that is, $f(w)\le h(w)+\epsilon$ for every $w\in G\cap K$. Thus $G\in\mathcal G_x$ and $y\in\bigcup\mathcal G_x$, as required.
Because $K$ is compact, $\mathcal G_x$ has a finite subcover $G_0,\ldots,G_n$ say. For each $i\le n$, take $f_i\in A$ such that $f_i(x)=h(x)$ and $f_i(w)\le h(w)+\epsilon$ for every $w\in G_i\cap K$. Then
$$f=f_0\wedge f_1\wedge\ldots\wedge f_n\in A,$$
by (b), and evidently $f(x)=h(x)$, while if $y\in K$ there is some $i\le n$ such that $y\in G_i$, so that
$$f(y)\le f_i(y)\le h(y)+\epsilon.\ \mathbf Q$$

(e) If $h:K\to\mathbb R$ is any continuous function and $\epsilon>0$, there is an $f\in A$ such that $|f(y)-h(y)|\le\epsilon$ for every $y\in K$. $\mathbf P$ This time, let $\mathcal G$ be the set of those open subsets $G$ of $X$ for which there is some $f\in A$ such that $f(y)\le h(y)+\epsilon$ for every $y\in K$ and $f(x)\ge h(x)-\epsilon$ for every $x\in G\cap K$. Once again, $\mathcal G$ is an open cover of $K$. To see this, take any $x\in K$. By (d), there is an $f\in A$ such that $f(x)=h(x)$ and $f(y)\le h(y)+\epsilon$ for every $y\in K$. Now $h-f{\restriction}K:K\to\mathbb R$ is a continuous function which is zero at $x$, so there is an open subset $G$ of $X$, containing $x$, such that $(h-f{\restriction}K)(w)\le\epsilon$ for every $w\in G\cap K$, that is, $f(w)\ge h(w)-\epsilon$ for every $w\in G\cap K$. Thus $G\in\mathcal G$ and $x\in\bigcup\mathcal G$, as required.
Because $K$ is compact, $\mathcal G$ has a finite subcover $G_0,\ldots,G_m$ say. For each $j\le m$, take $f_j\in A$ such that $f_j(y)\le h(y)+\epsilon$ for every $y\in K$ and $f_j(w)\ge h(w)-\epsilon$ for every $w\in G_j\cap K$. Then
$$f=f_0\vee f_1\vee\ldots\vee f_m\in A,$$
by (b), and evidently $f(y)\le h(y)+\epsilon$ for every $y\in K$, while if $x\in K$ there is some $j\le m$ such that $x\in G_j$, so that
$$f(x)\ge f_j(x)\ge h(x)-\epsilon.$$
Thus $|f(x)-h(x)|\le\epsilon$ for every $x\in K$, as required. $\mathbf Q$
(f) Thus we have an $f$ satisfying the first of the two requirements of the theorem. But for the second, set $M_0=\inf_{x\in K}h(x)$ and $M_1=\sup_{x\in K}h(x)$, and
$$f_1=(f\vee M_0\chi X)\wedge(M_1\chi X);$$
$f_1$ satisfies the second condition as well as the first. (I am tacitly assuming here what is in fact the case, that $M_0$ and $M_1$ are finite; this is because $K$ is compact; see 2A2G or 2A3N.)

281B We need some simple tools, belonging to the basic theory of normed spaces; but I hope they will be accessible even if you have not encountered normed spaces before, if you keep a finger at the beginning of §2A4 as you read the next lemma.
Lemma Let $X$ be any set. Write $\ell^\infty(X)$ for the set of bounded functions from $X$ to $\mathbb R$. For $f\in\ell^\infty(X)$, set
$$\|f\|_\infty=\sup_{x\in X}|f(x)|,$$
counting the supremum as 0 if $X$ is empty. Then
(a) $\ell^\infty(X)$ is a normed space.
(b) Let $A\subseteq\ell^\infty(X)$ be a subset and $\overline A$ its closure (2A3D).
(i) If $A$ is a linear subspace of $\ell^\infty(X)$, so is $\overline A$.
(ii) If $f\times g\in A$ whenever $f$, $g\in A$, then $f\times g\in\overline A$ whenever $f$, $g\in\overline A$.
(iii) If $|f|\in A$ whenever $f\in A$, then $|f|\in\overline A$ whenever $f\in\overline A$.
proof (a) This is a routine verification. To confirm that $\ell^\infty(X)$ is a linear space over $\mathbb{R}$, we have to check
that $f + g$, $cf$ belong to $\ell^\infty(X)$ whenever $f$, $g \in \ell^\infty(X)$ and $c \in \mathbb{R}$; simultaneously we can confirm that $\|\;\|_\infty$
is a norm on $\ell^\infty(X)$ by observing that
$$|(f + g)(x)| \le |f(x)| + |g(x)| \le \|f\|_\infty + \|g\|_\infty,$$
$$|cf(x)| = |c||f(x)| \le |c|\|f\|_\infty$$
whenever $f$, $g \in \ell^\infty(X)$, $c \in \mathbb{R}$ and $x \in X$. It is worth noting at the same time that if $f$, $g \in \ell^\infty(X)$, then
$$|(f \times g)(x)| = |f(x)||g(x)| \le \|f\|_\infty \|g\|_\infty$$
for every $x \in X$, so that $\|f \times g\|_\infty \le \|f\|_\infty \|g\|_\infty$.
(Of course all these remarks are very elementary special cases of parts of §243; see 243Xl.)
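(b) Recall that
$$\overline{A} = \{f : f \in \ell^\infty(X),\ \forall\, \epsilon > 0\ \exists\, f_1 \in A,\ \|f - f_1\|_\infty \le \epsilon\}$$
(2A3Kb). Take $f$, $g \in \overline{A}$ and $c \in \mathbb{R}$, and let $\epsilon > 0$. Set
$$\eta = \min\Bigl(1, \frac{\epsilon}{2 + |c| + \|f\|_\infty + \|g\|_\infty}\Bigr) > 0.$$
Then there are $f_1$, $g_1 \in A$ such that $\|f - f_1\|_\infty \le \eta$, $\|g - g_1\|_\infty \le \eta$.
Now
$$\|(f + g) - (f_1 + g_1)\|_\infty \le \|f - f_1\|_\infty + \|g - g_1\|_\infty \le 2\eta \le \epsilon,$$
$$\|cf - cf_1\|_\infty = |c|\|f - f_1\|_\infty \le |c|\eta \le \epsilon,$$
$$\|(f \times g) - (f_1 \times g_1)\|_\infty = \|(f - f_1) \times g + f \times (g - g_1) - (f - f_1) \times (g - g_1)\|_\infty$$
$$\le \|(f - f_1) \times g\|_\infty + \|f \times (g - g_1)\|_\infty + \|(f - f_1) \times (g - g_1)\|_\infty$$
$$\le \|f - f_1\|_\infty \|g\|_\infty + \|f\|_\infty \|g - g_1\|_\infty + \|f - f_1\|_\infty \|g - g_1\|_\infty$$
$$\le \eta(\|g\|_\infty + \|f\|_\infty + \eta) \le \eta(\|g\|_\infty + \|f\|_\infty + 1) \le \epsilon,$$
$$\||f| - |f_1|\|_\infty \le \|f - f_1\|_\infty \le \eta \le \epsilon.$$

(i) If $A$ is a linear subspace, then $f_1 + g_1$ and $cf_1$ belong to $A$. As $\epsilon$ is arbitrary, $f + g$ and $cf$ belong
to $\overline{A}$. As $f$, $g$ and $c$ are arbitrary, $\overline{A}$ is a linear subspace of $\ell^\infty(X)$.
(ii) If $A$ is closed under multiplication, then $f_1 \times g_1 \in A$. As $\epsilon$ is arbitrary, $f \times g \in \overline{A}$.
(iii) If the absolute values of functions in $A$ belong to $A$, then $|f_1| \in A$. As $\epsilon$ is arbitrary, $|f| \in \overline{A}$.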

281C Lemma There is a sequence $\langle p_n \rangle_{n \in \mathbb{N}}$ of real polynomials such that $\lim_{n \to \infty} p_n(x) = |x|$ uniformly
for $x \in [-1, 1]$.
proof (a) By the Binomial Theorem we have
$$(1 - x)^{1/2} = 1 - \frac{1}{2}x - \frac{1}{4 \cdot 2!}x^2 - \frac{1 \cdot 3}{2^3 \cdot 3!}x^3 - \dots = -\sum_{n=0}^{\infty} \frac{(2n)!}{(2n-1)(2^n n!)^2} x^n$$
whenever $|x| < 1$, with the convergence being uniform on any interval $[-a, a]$ with $0 \le a < 1$. (For a proof of
this, see almost any book on real or complex analysis. If you have no favourite text to hand, you can try to
construct a proof from the following facts: (i) the radius of convergence of the series is 1, so on any interval
$[-a, a]$, with $0 \le a < 1$, it is uniformly absolutely summable; (ii) writing $f(x)$ for the sum of the series
for $|x| < 1$, use Lebesgue's Dominated Convergence Theorem to find expressions for the indefinite integrals
$\int_0^x f$, $\int_{-x}^0 f$ and show that these are $\frac{2}{3}(1 - (1 - x)f(x))$, $\frac{2}{3}((1 + x)f(-x) - 1)$ for $0 \le x < 1$; (iii) use the
Fundamental Theorem of Calculus to show that $f(x) + 2(1 - x)f'(x) = 0$; (iv) show that $\frac{d}{dx}\Bigl(\frac{f(x)^2}{1-x}\Bigr) = 0$ and
hence (v) that $f(x)^2 = 1 - x$ whenever $|x| < 1$. Finally, show that because $f$ is continuous and non-zero in
$]-1, 1[$, $f(x)$ must be the positive square root of $1 - x$ throughout.)
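We have a further fragment of information. If we set
$$q_0(x) = 1, \quad q_1(x) = 1 - \frac{1}{2}x, \quad q_n(x) = -\sum_{k=0}^{n} \frac{(2k)!}{(2k-1)(2^k k!)^2} x^k$$
for $n \ge 2$ and $x \in [0, 1]$, so that $q_n$ is the $n$th partial sum of the binomial series for $(1 - x)^{1/2}$, then we have
$\lim_{n \to \infty} q_n(x) = (1 - x)^{1/2}$ for every $x \in [0, 1[$. But also every $q_n$ is non-increasing on $[0, 1]$, and $\langle q_n(x) \rangle_{n \in \mathbb{N}}$
is also a non-increasing sequence for each $x \in [0, 1]$. So we must have
$$\sqrt{1 - x} \le q_n(x) \quad \forall\, n \in \mathbb{N},\ x \in [0, 1[,$$
and therefore, because all the $q_n$ are continuous,
$$\sqrt{1 - x} \le q_n(x) \quad \forall\, n \in \mathbb{N},\ x \in [0, 1].$$
Moreover, given $\epsilon > 0$, set $a = 1 - \frac{1}{4}\epsilon^2$, so that $\sqrt{1 - a} = \frac{1}{2}\epsilon$. Then there is an $n_0 \in \mathbb{N}$ such that
$q_n(x) - \sqrt{1 - x} \le \frac{1}{2}\epsilon$ for every $x \in [0, a]$ and $n \ge n_0$. In particular, $q_n(a) \le \epsilon$, so $q_n(x) \le \epsilon$ and $q_n(x) - \sqrt{1 - x} \le \epsilon$
for every $x \in [a, 1]$, $n \ge n_0$. This means that
$$0 \le q_n(x) - \sqrt{1 - x} \le \epsilon \quad \forall\, n \ge n_0,\ x \in [0, 1];$$
as $\epsilon$ is arbitrary, $\langle q_n(x) \rangle_{n \in \mathbb{N}} \to \sqrt{1 - x}$ uniformly on $[0, 1]$.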
(b) Now set $p_n(x) = q_n(1 - x^2)$ for $x \in \mathbb{R}$. Because each $q_n$ is a real polynomial of degree $n$, each $p_n$ is a
real polynomial of degree $2n$. Next,
$$\sup_{|x| \le 1} |p_n(x) - |x|| = \sup_{|x| \le 1} \bigl|q_n(1 - x^2) - \sqrt{1 - (1 - x^2)}\bigr| = \sup_{y \in [0,1]} |q_n(y) - \sqrt{1 - y}| \to 0$$
as $n \to \infty$, so $\lim_{n \to \infty} p_n(x) = |x|$ uniformly for $|x| \le 1$, as required.
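
The construction in 281C can be checked numerically. The following is only an illustrative sketch, not part of the proof: the function names (`series_coefficients`, `q`, `p`, `sup_error`) and the grid size are assumptions made for this example. It builds the partial sums $q_n$ of the binomial series, forms $p_n(x) = q_n(1 - x^2)$, and estimates $\sup_{|x| \le 1} |p_n(x) - |x||$ on a finite grid.

```python
import math

def series_coefficients(n):
    """Coefficients of x^0..x^n in q_n, the n-th partial sum of the
    binomial series for sqrt(1 - x): -(2k)! / ((2k-1) (2^k k!)^2)."""
    return [
        -math.factorial(2 * k) / ((2 * k - 1) * (2 ** k * math.factorial(k)) ** 2)
        for k in range(n + 1)
    ]

def q(coeffs, x):
    """Evaluate the polynomial with the given coefficients by Horner's rule."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * x + c
    return total

def p(coeffs, x):
    """p_n(x) = q_n(1 - x^2), a polynomial of degree 2n."""
    return q(coeffs, 1.0 - x * x)

def sup_error(n, points=2001):
    """Largest value of |p_n(x) - |x|| over an evenly spaced grid in [-1, 1]
    (a grid estimate of the supremum, not the supremum itself)."""
    coeffs = series_coefficients(n)
    grid = [-1.0 + 2.0 * i / (points - 1) for i in range(points)]
    return max(abs(p(coeffs, x) - abs(x)) for x in grid)

if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, sup_error(n))
```

Because every coefficient of $q_n$ after the first is negative, the error $q_n(y) - \sqrt{1 - y}$ is largest at $y = 1$, that is, at $x = 0$; the convergence is uniform but slow there, which the printed values make visible.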

281D Corollary Let $X$ be a set, and $A$ a norm-closed linear subspace of $\ell^\infty(X)$ containing $\chi X$ and
such that $f \times g \in A$ whenever $f$, $g \in A$. Then $|f| \in A$ for every $f \in A$.
proof Set
$$f_1 = \frac{1}{1 + \|f\|_\infty} f,$$
so that $f_1 \in A$ and $\|f_1\|_\infty \le 1$. Because $A$ contains $\chi X$ and is closed under multiplication, $p \circ f_1 \in A$ for
every polynomial $p$ with real coefficients. In particular, $g_n = p_n \circ f_1 \in A$ for every $n$, where $\langle p_n \rangle_{n \in \mathbb{N}}$ is the
sequence of 281C. Now, because $|f_1(x)| \le 1$ for every $x \in X$,
$$\|g_n - |f_1|\|_\infty = \sup_{x \in X} |p_n(f_1(x)) - |f_1(x)|| \le \sup_{|y| \le 1} |p_n(y) - |y|| \to 0$$
as $n \to \infty$. Because $A$ is $\|\;\|_\infty$-closed, $|f_1| \in A$; consequently $|f| \in A$, as claimed.

281E Stone-Weierstrass theorem: second form Let $X$ be a topological space and $K$ a compact
subset of $X$. Write $C_b(X)$ for the space of all bounded continuous real-valued functions on $X$. Let $A \subseteq C_b(X)$
be such that
    $A$ is a linear subspace of $C_b(X)$;
    $f \times g \in A$ for every $f$, $g \in A$;
    $\chi X \in A$;
    whenever $x$, $y$ are distinct points of $K$ there is an $f \in A$ such that $f(x) \ne f(y)$.
Then for every continuous $h : K \to \mathbb{R}$ and $\epsilon > 0$ there is an $f \in A$ such that
    $|f(x) - h(x)| \le \epsilon$ for every $x \in K$,
    if $K \ne \emptyset$, $\inf_{x \in X} f(x) \ge \inf_{x \in K} h(x)$ and $\sup_{x \in X} f(x) \le \sup_{x \in K} h(x)$.
proof Let $\overline{A}$ be the $\|\;\|_\infty$-closure of $A$ in $\ell^\infty(X)$. It is helpful to know that $\overline{A} \subseteq C_b(X)$; this is because the
uniform limit of continuous functions is continuous. (But if this is new to you, or your memory has faded,
don't take time to look it up now; just read '$\overline{A} \cap C_b(X)$' in place of '$\overline{A}$' in the rest of this argument.) By
281B-281D, $\overline{A}$ is a linear subspace of $C_b(X)$ and $|f| \in \overline{A}$ for every $f \in \overline{A}$, so the conditions of 281A apply
to $\overline{A}$.
Take a continuous $h : K \to \mathbb{R}$ and an $\epsilon > 0$. The cases in which $K = \emptyset$ or $h$ is constant are trivial,
because all constant functions belong to $A$; so I suppose that $M_0 = \inf_{x \in K} h(x)$ and $M_1 = \sup_{x \in K} h(x)$ are
defined and distinct. As observed at the end of the proof of 281A, $M_0$ and $M_1$ are finite. Set
$$\eta = \min\bigl(\tfrac{1}{3}\epsilon, \tfrac{1}{2}(M_1 - M_0)\bigr) > 0, \quad \tilde{h}(x) = \min(\max(h(x), M_0 + \eta), M_1 - \eta) \text{ for } x \in K,$$
so that $\tilde{h} : K \to \mathbb{R}$ is continuous and $M_0 + \eta \le \tilde{h}(x) \le M_1 - \eta$ for every $x \in K$. By 281A, there is an $f_0 \in \overline{A}$
such that $|f_0(x) - \tilde{h}(x)| \le \eta$ for every $x \in K$ and $M_0 + \eta \le f_0(x) \le M_1 - \eta$ for every $x \in X$. Now there is
an $f \in A$ such that $\|f - f_0\|_\infty \le \eta$, so that
$$|f(x) - h(x)| \le |f(x) - f_0(x)| + |f_0(x) - \tilde{h}(x)| + |\tilde{h}(x) - h(x)| \le 3\eta \le \epsilon$$
for every $x \in K$, while
$$M_0 \le f_0(x) - \eta \le f(x) \le f_0(x) + \eta \le M_1$$
for every $x \in X$.

281F Corollary: Weierstrass' theorem Let $K$ be any closed bounded subset of $\mathbb{R}$. Then every
continuous $h : K \to \mathbb{R}$ can be uniformly approximated on $K$ by polynomials.
proof Apply 281E with $X = K$ (noting that $K$, being closed and bounded, is compact), and $A$ the set of
polynomials with real coefficients, regarded as functions from $K$ to $\mathbb{R}$.

281G Stone-Weierstrass theorem: third form Let $X$ be a topological space and $K$ a compact
subset of $X$. Write $C_b(X; \mathbb{C})$ for the space of all bounded continuous complex-valued functions on $X$, so
that $C_b(X; \mathbb{C})$ is a linear space over $\mathbb{C}$. Let $A \subseteq C_b(X; \mathbb{C})$ be such that
    $A$ is a linear subspace of $C_b(X; \mathbb{C})$;
    $f \times g \in A$ for every $f$, $g \in A$;
    $\chi X \in A$;
    the complex conjugate $\bar{f}$ of $f$ belongs to $A$ for every $f \in A$;
    whenever $x$, $y$ are distinct points of $K$ there is an $f \in A$ such that $f(x) \ne f(y)$.
Then for every continuous $h : K \to \mathbb{C}$ and $\epsilon > 0$ there is an $f \in A$ such that
    $|f(x) - h(x)| \le \epsilon$ for every $x \in K$,
    if $K \ne \emptyset$, $\sup_{x \in X} |f(x)| \le \sup_{x \in K} |h(x)|$.
proof If $K = \emptyset$, or $h$ is identically zero, we can take $f = 0$. So let us suppose that $M = \sup_{x \in K} |h(x)| > 0$.
(a) Set
$$A_{\mathbb{R}} = \{f : f \in A,\ f(x) \text{ is real for every } x \in X\}.$$
Then $A_{\mathbb{R}}$ satisfies the conditions of 281E. P Evidently $A_{\mathbb{R}}$ is a subset of $C_b(X) = C_b(X; \mathbb{R})$, is closed
under addition, multiplication by real scalars and pointwise multiplication of functions, and contains $\chi X$.
If $x$, $y$ are distinct points of $K$, there is an $f \in A$ such that $f(x) \ne f(y)$. Now
$$\operatorname{Re} f = \frac{1}{2}(f + \bar{f}), \quad \operatorname{Im} f = \frac{1}{2i}(f - \bar{f})$$
both belong to $A$ and are real-valued, so belong to $A_{\mathbb{R}}$, and at least one of them takes different values at $x$
and $y$. Q

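(b) Consequently, given a continuous function $h : K \to \mathbb{C}$ and $\epsilon > 0$, we may apply 281E twice to find
$f_1$, $f_2 \in A_{\mathbb{R}}$ such that
$$|f_1(x) - \operatorname{Re}(h(x))| \le \eta, \quad |f_2(x) - \operatorname{Im}(h(x))| \le \eta$$
for every $x \in K$, where $\eta = \min\bigl(\frac{1}{2}, \frac{\epsilon M}{6M + 4}\bigr) > 0$. Setting $g = f_1 + if_2$, we have $g \in A$ and $|g(x) - h(x)| \le 2\eta$
for every $x \in K$.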
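(c) Set $L = \|g\|_\infty$. If $L \le M$ we can take $f = g$ and stop. Otherwise, consider the function
$$\varphi(t) = \frac{M - \eta}{\max(M, \sqrt{t})}$$
for $t \in [0, L^2]$. By Weierstrass' theorem (281F), there is a real polynomial $p$ such that $|\varphi(t) - p(t)| \le \frac{\eta}{L}$
whenever $0 \le t \le L^2$. Note that $|g|^2 = g \times \bar{g} \in A$, so that
$$f = g \times p(|g|^2) \in A.$$
Now
$$|p(t)| \le \varphi(t) + \frac{\eta}{L} \le \varphi(t) + \frac{\eta}{\max(M, \sqrt{t})} = \frac{M}{\max(M, \sqrt{t})}$$
whenever $0 \le t \le L^2$, so
$$|f(x)| \le |g(x)| \frac{M}{\max(M, |g(x)|)} \le M$$
for every $x \in X$. Next, if $0 \le t \le \min(L, M + 2\eta)^2$,
$$|1 - p(t)| \le \frac{\eta}{L} + 1 - \varphi(t) \le \frac{\eta}{M} + 1 - \frac{M - \eta}{M + 2\eta} \le \frac{4\eta}{M}.$$
Consequently, if $x \in K$, so that
$$|g(x)| \le \min(L, |h(x)| + 2\eta) \le \min(L, M + 2\eta),$$
we shall have
$$|1 - p(|g(x)|^2)| \le \frac{4\eta}{M},$$
and
$$|f(x) - h(x)| \le |g(x) - h(x)| + |g(x)||1 - p(|g(x)|^2)| \le 2\eta + \frac{4\eta}{M}(M + 2\eta) \le 2\eta + \frac{4\eta}{M}(M + 1) \le \epsilon,$$
as required.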
Remark Of course we could have saved ourselves effort by settling for
$$\sup_{x \in X} |f(x)| \le 2 \sup_{x \in K} |h(x)|,$$
which would be quite good enough for the applications below.

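281H Corollary Let $[a, b] \subseteq \mathbb{R}$ be a non-empty bounded closed interval and $h : [a, b] \to \mathbb{C}$ a continuous
function. Then for any $\epsilon > 0$ there are $y_0, \dots, y_n \in \mathbb{R}$ and $c_0, \dots, c_n \in \mathbb{C}$ such that
    $|h(x) - \sum_{k=0}^n c_k e^{iy_k x}| \le \epsilon$ for every $x \in [a, b]$,
    $\sup_{x \in \mathbb{R}} |\sum_{k=0}^n c_k e^{iy_k x}| \le \sup_{x \in [a,b]} |h(x)|$.

proof Apply 281G with $X = \mathbb{R}$, $K = [a, b]$ and $A$ the linear span of the functions $x \mapsto e^{iyx}$ as $y$ runs over
$\mathbb{R}$.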

281I Corollary Let $S^1$ be the unit circle $\{z : |z| = 1\} \subseteq \mathbb{C}$. Then for any continuous function $h : S^1 \to \mathbb{C}$
and $\epsilon > 0$, there are $n \in \mathbb{N}$ and $c_{-n}, c_{-n+1}, \dots, c_0, \dots, c_n \in \mathbb{C}$ such that $|h(z) - \sum_{k=-n}^n c_k z^k| \le \epsilon$ for every
$z \in S^1$.

proof Apply 281G with $X = K = S^1$ and $A$ the linear span of the functions $z \mapsto z^k$ for $k \in \mathbb{Z}$.

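281J Corollary Let $h : [-\pi, \pi] \to \mathbb{C}$ be a continuous function such that $h(\pi) = h(-\pi)$. Then for any
$\epsilon > 0$ there are $n \in \mathbb{N}$, $c_{-n}, \dots, c_n \in \mathbb{C}$ such that $|h(x) - \sum_{k=-n}^n c_k e^{ikx}| \le \epsilon$ for every $x \in [-\pi, \pi]$.

proof The point is that $\tilde{h} : S^1 \to \mathbb{C}$ is continuous on $S^1$, where $\tilde{h}(z) = h(\arg z)$; this is because $\arg$ is
continuous everywhere except at $-1$, and
$$\lim_{x \downarrow -\pi} h(x) = h(-\pi) = h(\pi) = \lim_{x \uparrow \pi} h(x),$$
so
$$\lim_{z \in S^1,\, z \to -1} \tilde{h}(z) = h(\pi) = \tilde{h}(-1).$$
Now by 281I there are $c_{-n}, \dots, c_n \in \mathbb{C}$ such that $|\tilde{h}(z) - \sum_{k=-n}^n c_k z^k| \le \epsilon$ for every $z \in S^1$, and these
coefficients serve equally for $h$, because $e^{ix} \in S^1$ and $\tilde{h}(e^{ix}) = h(x)$ for every $x \in [-\pi, \pi]$.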

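281K Corollary Suppose that $r \ge 1$ and that $K \subseteq \mathbb{R}^r$ is a non-empty closed bounded set. Let
$h : K \to \mathbb{C}$ be a continuous function, and $\epsilon > 0$. Then there are $y_0, \dots, y_n \in \mathbb{Q}^r$ and $c_0, \dots, c_n \in \mathbb{C}$ such
that
    $|h(x) - \sum_{k=0}^n c_k e^{iy_k \cdot x}| \le \epsilon$ for every $x \in K$,
    $\sup_{x \in \mathbb{R}^r} |\sum_{k=0}^n c_k e^{iy_k \cdot x}| \le \sup_{x \in K} |h(x)|$,
writing $y \cdot x = \sum_{j=1}^r \eta_j \xi_j$ when $y = (\eta_1, \dots, \eta_r)$ and $x = (\xi_1, \dots, \xi_r)$ belong to $\mathbb{R}^r$.
proof Apply 281G with $X = \mathbb{R}^r$ and $A$ the linear span of the functions $x \mapsto e^{iy \cdot x}$ as $y$ runs over $\mathbb{Q}^r$.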

281L Corollary Suppose that $r \ge 1$ and that $K \subseteq \mathbb{R}^r$ is a non-empty closed bounded set. Let
$h : K \to \mathbb{R}$ be a continuous function, and $\epsilon > 0$. Then there are $y_0, \dots, y_n \in \mathbb{R}^r$ and $c_0, \dots, c_n \in \mathbb{C}$ such
that, writing $g(x) = \sum_{k=0}^n c_k e^{iy_k \cdot x}$, $g$ is real-valued and
    $|h(x) - g(x)| \le \epsilon$ for every $x \in K$,
    $\inf_{y \in K} h(y) \le g(x) \le \sup_{y \in K} h(y)$ for every $x \in \mathbb{R}^r$.

proof Apply 281E with $X = \mathbb{R}^r$ and $A$ the set of real-valued functions on $\mathbb{R}^r$ which are complex linear
combinations of the functions $x \mapsto e^{iy \cdot x}$; as remarked in part (a) of the proof of 281G, $A$ satisfies the
conditions of 281E.

281M Weyl's Equidistribution Theorem We are now ready for one of the basic results of number
theory. I shall actually apply it to provide an example in §285 below, but (at least in the one-variable case)
it is surely on the (rather long) list of things which every pure mathematician should know. For the sake of
the application I have in mind, I give the full $r$-dimensional version, but you may wish to take it in the first
place with $r = 1$.
It will be helpful to have a notation for 'fractional part'. For any real number $x$, write <$x$> for that
number in $[0, 1[$ such that $x\,-\,$<$x$> is an integer. Now for the theorem.

281N Theorem Let $\eta_1, \dots, \eta_r$ be real numbers such that $1, \eta_1, \dots, \eta_r$ are linearly independent over $\mathbb{Q}$.
Then whenever $0 \le \alpha_j \le \beta_j \le 1$ for each $j \le r$,
$$\lim_{n \to \infty} \frac{1}{n+1} \#(\{m : m \le n,\ \text{<}m\eta_j\text{>} \in [\alpha_j, \beta_j] \text{ for every } j \le r\}) = \prod_{j=1}^r (\beta_j - \alpha_j).$$

Remark Thus the theorem says that the long-term proportion of the $r$-tuples (<$m\eta_1$>, ..., <$m\eta_r$>) which
belong to the interval $[a, b] \subseteq [\mathbf{0}, \mathbf{1}]$ is just the Lebesgue measure $\mu[a, b]$ of the interval. Of course the
condition '$\eta_1, \dots, \eta_r$ are linearly independent over $\mathbb{Q}$' is necessary as well as sufficient (281Xg).
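proof (a) Write $y = (\eta_1, \dots, \eta_r) \in \mathbb{R}^r$,
$$\text{<}my\text{>} = (\text{<}m\eta_1\text{>}, \dots, \text{<}m\eta_r\text{>}) \in [\mathbf{0}, \mathbf{1}[ \;= [0, 1[^r$$
for each $m \in \mathbb{N}$. Set $I = [\mathbf{0}, \mathbf{1}] = [0, 1]^r$, and for any function $f : I \to \mathbb{R}$ write
$$\overline{L}(f) = \limsup_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^n f(\text{<}my\text{>}),$$
$$\underline{L}(f) = \liminf_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^n f(\text{<}my\text{>});$$
and for $f : I \to \mathbb{C}$ write
$$L(f) = \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^n f(\text{<}my\text{>})$$
if the limit exists. It will be worth noting that for non-negative functions $f$, $g$, $h : I \to \mathbb{R}$ such that $h \le f + g$,
$$\overline{L}(h) \le \overline{L}(f) + \overline{L}(g),$$
and that $L(cf + g) = cL(f) + L(g)$ for any two functions $f$, $g : I \to \mathbb{C}$ such that $L(f)$ and $L(g)$ exist, and
any $c \in \mathbb{C}$.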
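(b) I mean to show that $L(f)$ exists and is equal to $\int_I f$ for (many) continuous functions $f$. The key step
is to consider functions of the form
$$f(x) = e^{2\pi i k \cdot x},$$
where $k = (\kappa_1, \dots, \kappa_r) \in \mathbb{Z}^r$. In this case, if $k \ne 0$,
$$k \cdot y = \sum_{j=1}^r \kappa_j \eta_j \notin \mathbb{Z}$$
because $1, \eta_1, \dots, \eta_r$ are linearly independent over $\mathbb{Q}$. So
$$L(f) = \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^n e^{2\pi i k \cdot \text{<}my\text{>}} = \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^n e^{2\pi i m k \cdot y}$$
(because $mk \cdot y - k \cdot \text{<}my\text{>} = \sum_{j=1}^r \kappa_j (m\eta_j - \text{<}m\eta_j\text{>})$ is an integer)
$$= \lim_{n \to \infty} \frac{1 - e^{2\pi i (n+1) k \cdot y}}{(n+1)(1 - e^{2\pi i k \cdot y})}$$
(because $e^{2\pi i k \cdot y} \ne 1$)
$$= 0,$$
because $|1 - e^{2\pi i (n+1) k \cdot y}| \le 2$ for every $n$. Of course we can also calculate the integral of $f$ over $I$, which is
$$\int_I f(x)\,dx = \int_I e^{2\pi i k \cdot x}\,dx = \int_I \prod_{j=1}^r e^{2\pi i \kappa_j \xi_j}\,dx$$
(writing $x = (\xi_1, \dots, \xi_r)$)
$$= \int_0^1 \dots \int_0^1 \prod_{j=1}^r e^{2\pi i \kappa_j \xi_j}\,d\xi_r \dots d\xi_1 = \int_0^1 e^{2\pi i \kappa_r \xi_r}\,d\xi_r \cdot \ldots \cdot \int_0^1 e^{2\pi i \kappa_1 \xi_1}\,d\xi_1 = 0$$
because at least one $\kappa_j$ is non-zero, and for this $j$ we must have
$$\int_0^1 e^{2\pi i \kappa_j \xi_j}\,d\xi_j = \frac{1}{2\pi i \kappa_j}(e^{2\pi i \kappa_j} - 1) = 0.$$
So we have $L(f) = \int_I f = 0$ when $k \ne 0$. On the other hand, if $k = 0$, then $f$ is constant with value 1, so
$$L(f) = \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^n f(\text{<}my\text{>}) = \lim_{n \to \infty} 1 = 1 = \int_I f(x)\,dx.$$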

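(c) Now write $\partial I = [\mathbf{0}, \mathbf{1}] \setminus \,]\mathbf{0}, \mathbf{1}[$, the boundary of $I$. If $f : I \to \mathbb{C}$ is continuous and $f(x) = 0$ for $x \in \partial I$,
then $L(f) = \int_I f$. P As in 281I, let $S^1$ be the unit circle $\{z : z \in \mathbb{C}, |z| = 1\}$, and set $K = (S^1)^r \subseteq \mathbb{C}^r$. If
we think of $K$ as a subset of $\mathbb{R}^{2r}$, it is closed and bounded. Let $\varphi : K \to I$ be given by
$$\varphi(\zeta_1, \dots, \zeta_r) = \Bigl(\frac{1}{2} + \frac{\arg \zeta_1}{2\pi}, \dots, \frac{1}{2} + \frac{\arg \zeta_r}{2\pi}\Bigr)$$
for $\zeta_1, \dots, \zeta_r \in S^1$. Then $h = f\varphi : K \to \mathbb{C}$ is continuous, because $\varphi$ is continuous on $(S^1 \setminus \{-1\})^r$ and
$$\lim_{w \to z} f\varphi(w) = f\varphi(z) = 0$$
for any $z \in K \setminus (S^1 \setminus \{-1\})^r$. (Compare 281J.) Now apply 281G with $X = K$ and $A$ the set of polynomials
in $\zeta_1, \dots, \zeta_r, \zeta_1^{-1}, \dots, \zeta_r^{-1}$ to see that, given $\epsilon > 0$, there is a function of the form
$$\tilde{g}(z) = \sum_{k \in J} c_k \zeta_1^{\kappa_1} \dots \zeta_r^{\kappa_r},$$
for some finite set $J \subseteq \mathbb{Z}^r$ and constants $c_k \in \mathbb{C}$ for $k \in J$, such that
$$|\tilde{g}(z) - h(z)| \le \epsilon \text{ for every } z \in K.$$
Set
$$g(x) = \tilde{g}(e^{\pi i (2\xi_1 - 1)}, \dots, e^{\pi i (2\xi_r - 1)}) = \sum_{k \in J} c_k e^{\pi i k \cdot (2x - \mathbf{1})} = \sum_{k \in J} (-1)^{k \cdot \mathbf{1}} c_k e^{2\pi i k \cdot x},$$
so that $g\varphi = \tilde{g}$, and see that
$$\sup_{x \in I} |g(x) - f(x)| = \sup_{z \in K} |\tilde{g}(z) - h(z)| \le \epsilon.$$
Now $g$ is a linear combination of functions of the form dealt with in (b), so by the linearity noted in (a) we must have $L(g) = \int_I g$. Let $n_0$ be such that
$$\Bigl|\int_I g - \frac{1}{n+1} \sum_{m=0}^n g(\text{<}my\text{>})\Bigr| \le \epsilon$$
for every $n \ge n_0$. Then
$$\Bigl|\int_I f - \int_I g\Bigr| \le \int_I |f - g| \le \epsilon$$
and
$$\Bigl|\frac{1}{n+1} \sum_{m=0}^n g(\text{<}my\text{>}) - \frac{1}{n+1} \sum_{m=0}^n f(\text{<}my\text{>})\Bigr| \le \frac{1}{n+1} \sum_{m=0}^n |g(\text{<}my\text{>}) - f(\text{<}my\text{>})| \le \frac{1}{n+1}(n+1)\epsilon = \epsilon$$
for every $n \in \mathbb{N}$. So for $n \ge n_0$ we must have
$$\Bigl|\frac{1}{n+1} \sum_{m=0}^n f(\text{<}my\text{>}) - \int_I f\Bigr| \le 3\epsilon.$$
As $\epsilon$ is arbitrary, $L(f) = \int_I f$, as required. Q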
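(d) Observe next that if $a$, $b \in \,]\mathbf{0}, \mathbf{1}[ \;= \,]0, 1[^r$, and $\epsilon > 0$, there are continuous functions $f_1$, $f_2$ such that
$$f_1 \le \chi[a, b] \le f_2 \le \chi\,]\mathbf{0}, \mathbf{1}[, \quad \int_I f_2 - \int_I f_1 \le \epsilon.$$
P This is elementary. Write $a = (\alpha_1, \dots, \alpha_r)$ and $b = (\beta_1, \dots, \beta_r)$. For $n \in \mathbb{N}$, define $h_n : \mathbb{R} \to [0, 1]$ by setting $h_n(\xi) = 0$ if $\xi \le 0$, $2^n \xi$ if $0 \le \xi \le 2^{-n}$
and 1 if $\xi \ge 2^{-n}$. Set
$$f_{1n}(x) = \prod_{j=1}^r h_n(\xi_j - \alpha_j) h_n(\beta_j - \xi_j),$$
$$f_{2n}(x) = \prod_{j=1}^r (1 - h_n(\alpha_j - \xi_j))(1 - h_n(\xi_j - \beta_j))$$
for $x = (\xi_1, \dots, \xi_r) \in \mathbb{R}^r$. (Compare the proof of 242O.) Then $f_{1n} \le \chi[a, b] \le f_{2n}$ for each $n$, $f_{2n} \le \chi\,]\mathbf{0}, \mathbf{1}[$
for all $n$ so large that
$$2^{-n} \le \min(\min_{j \le r} \alpha_j, \min_{j \le r} (1 - \beta_j)),$$
and $\lim_{n \to \infty} f_{2n}(x) - f_{1n}(x) = 0$ for every $x$ not on the boundary of $[a, b]$, therefore for almost every $x$, so
$$\lim_{n \to \infty} \int_I f_{2n} - \int_I f_{1n} = 0.$$
Thus we can take $f_1 = f_{1n}$, $f_2 = f_{2n}$ for any $n$ large enough. Q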
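(e) It follows that if $a$, $b \in \,]\mathbf{0}, \mathbf{1}[$ and $a \le b$, $L(\chi[a, b]) = \mu[a, b]$. P Let $\epsilon > 0$. Take $f_1$, $f_2$ as in (d). Then,
using (c),
$$\overline{L}(\chi[a, b]) \le \overline{L}(f_2) = L(f_2) = \int_I f_2 \le \int_I f_1 + \epsilon \le \mu[a, b] + \epsilon,$$
$$\underline{L}(\chi[a, b]) \ge \underline{L}(f_1) = L(f_1) = \int_I f_1 \ge \int_I f_2 - \epsilon \ge \mu[a, b] - \epsilon,$$
so
$$\mu[a, b] - \epsilon \le \underline{L}(\chi[a, b]) \le \overline{L}(\chi[a, b]) \le \mu[a, b] + \epsilon.$$
As $\epsilon$ is arbitrary,
$$\mu[a, b] = \overline{L}(\chi[a, b]) = \underline{L}(\chi[a, b]) = L(\chi[a, b]),$$
as required. Q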
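(f) To complete the proof, take any $a$, $b \in I$ with $a \le b$. For $0 < \epsilon \le \frac{1}{2}$, set $I_\epsilon = [\epsilon \mathbf{1}, (1 - \epsilon)\mathbf{1}]$, so that $I_\epsilon$
is a closed interval included in $]\mathbf{0}, \mathbf{1}[$ and $\mu I_\epsilon = (1 - 2\epsilon)^r$. Of course $L(\chi I) = \mu I = 1$, so
$$L(\chi(I \setminus I_\epsilon)) = L(\chi I) - L(\chi I_\epsilon) = 1 - \mu I_\epsilon,$$
and
$$\mu[a, b] - 1 + \mu I_\epsilon \le \mu[a, b] + \mu I_\epsilon - \mu([a, b] \cup I_\epsilon) = \mu([a, b] \cap I_\epsilon)$$
$$= L(\chi([a, b] \cap I_\epsilon)) \le \underline{L}(\chi[a, b])$$
$$\le \overline{L}(\chi[a, b]) \le \overline{L}(\chi([a, b] \cap I_\epsilon)) + \overline{L}(\chi(I \setminus I_\epsilon))$$
$$= L(\chi([a, b] \cap I_\epsilon)) + 1 - \mu I_\epsilon$$
$$= \mu([a, b] \cap I_\epsilon) + 1 - \mu I_\epsilon \le \mu[a, b] + 1 - \mu I_\epsilon.$$
As $\epsilon$ is arbitrary,
$$\mu[a, b] = \overline{L}(\chi[a, b]) = \underline{L}(\chi[a, b]) = L(\chi[a, b]),$$
as stated.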
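
The statement of 281N lends itself to a quick numerical experiment. The sketch below is an illustration under stated assumptions rather than anything from the text: the helper names (`frac`, `proportion`, `box_measure`) are hypothetical, the multipliers $\sqrt{2}$ and $\sqrt{3}$ are chosen because $1, \sqrt{2}, \sqrt{3}$ are linearly independent over $\mathbb{Q}$, and floating-point arithmetic only approximates the fractional parts. It counts how often the point of fractional parts of $m\eta_1, \dots, m\eta_r$ falls in a given box for $m \le n$.

```python
import math

def frac(x):
    """Fractional part: the number in [0, 1[ differing from x by an integer."""
    return x - math.floor(x)

def proportion(etas, box, n):
    """Proportion of m in {0, ..., n} for which frac(m * eta_j) lies in
    [alpha_j, beta_j] for every j; box is a list of (alpha_j, beta_j)."""
    hits = 0
    for m in range(n + 1):
        if all(lo <= frac(m * eta) <= hi for eta, (lo, hi) in zip(etas, box)):
            hits += 1
    return hits / (n + 1)

def box_measure(box):
    """Lebesgue measure of the box, the product of the side lengths."""
    return math.prod(hi - lo for lo, hi in box)

if __name__ == "__main__":
    box = [(0.1, 0.6), (0.25, 0.75)]
    etas = [math.sqrt(2), math.sqrt(3)]
    for n in (100, 10_000, 100_000):
        print(n, proportion(etas, box, n), box_measure(box))
    # With a rational multiplier the independence hypothesis fails:
    print(proportion([0.5], [(0.1, 0.4)], 1000))
```

The last line shows why the independence hypothesis matters: with $\eta_1 = \frac{1}{2}$ the fractional parts take only the values 0 and $\frac{1}{2}$, so an interval avoiding both is never visited.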

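281X Basic exercises (a) Let $A$ be the set of those bounded continuous functions $f : \mathbb{R}^r \times \mathbb{R}^r \to \mathbb{R}$
which are expressible in the form $f(x, y) = \sum_{k=0}^n g_k(x) g'_k(y)$, where all the $g_k$, $g'_k$ are continuous functions
from $\mathbb{R}^r$ to $\mathbb{R}$. Show that for any bounded continuous function $h : \mathbb{R}^r \times \mathbb{R}^r \to \mathbb{R}$ and any bounded set
$K \subseteq \mathbb{R}^r \times \mathbb{R}^r$ and any $\epsilon > 0$, there is an $f \in A$ such that $|f(x, y) - h(x, y)| \le \epsilon$ for every $(x, y) \in K$ and
$\sup_{x, y \in \mathbb{R}^r} |f(x, y)| \le \sup_{x, y \in \mathbb{R}^r} |h(x, y)|$.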
(b) Let $K$ be a closed bounded set in $\mathbb{R}^r$, where $r \ge 1$, and $h : K \to \mathbb{R}$ a continuous function. Show that
for any $\epsilon > 0$ there is a polynomial $p$ in $r$ variables such that $|h(x) - p(x)| \le \epsilon$ for every $x \in K$.
> (c) Let $[a, b]$ be a non-empty closed interval of $\mathbb{R}$ and $h : [a, b] \to \mathbb{R}$ a continuous function. Show that
for any $\epsilon > 0$ there are $y_0, \dots, y_n$, $a_0, \dots, a_n$, $b_0, \dots, b_n \in \mathbb{R}$ such that
    $|h(x) - \sum_{k=0}^n (a_k \cos y_k x + b_k \sin y_k x)| \le \epsilon$ for every $x \in [a, b]$,
    $\sup_{x \in \mathbb{R}} |\sum_{k=0}^n (a_k \cos y_k x + b_k \sin y_k x)| \le \sup_{x \in [a,b]} |h(x)|$.

(d) Let $h$ be a complex-valued function on $]-\pi, \pi]$ such that $|h|^p$ is integrable, where $1 \le p < \infty$. Show
that for every $\epsilon > 0$ there is a function of the form $x \mapsto f(x) = \sum_{k=-n}^n c_k e^{ikx}$, where $c_{-n}, \dots, c_n \in \mathbb{C}$, such
that $\int_{-\pi}^{\pi} |h - f|^p \le \epsilon$. (Compare 244H.)
> (e) Let $h : [-\pi, \pi] \to \mathbb{R}$ be a continuous function such that $h(\pi) = h(-\pi)$, and $\epsilon > 0$. Show that there
are $a_0, \dots, a_n$, $b_1, \dots, b_n \in \mathbb{R}$ such that
$$\Bigl|h(x) - \frac{1}{2}a_0 - \sum_{k=1}^n (a_k \cos kx + b_k \sin kx)\Bigr| \le \epsilon$$
for every $x \in [-\pi, \pi]$.

(f) Let $K$ be a non-empty closed bounded set in $\mathbb{R}^r$, where $r \ge 1$, and $h : K \to \mathbb{R}$ a continuous function.
Show that for any $\epsilon > 0$ there are $y_0, \dots, y_n \in \mathbb{R}^r$, $a_0, \dots, a_n$, $b_0, \dots, b_n \in \mathbb{R}$ such that
    $|h(x) - \sum_{k=0}^n (a_k \cos y_k \cdot x + b_k \sin y_k \cdot x)| \le \epsilon$ for every $x \in K$,
    $\sup_{x \in \mathbb{R}^r} |\sum_{k=0}^n (a_k \cos y_k \cdot x + b_k \sin y_k \cdot x)| \le \sup_{x \in K} |h(x)|$,
interpreting $y \cdot x$ as in 281K.

(g) Let $y_1, \dots, y_r$ be real numbers which are not linearly independent over $\mathbb{Q}$. Show that there is a
non-trivial interval $[a, b] \subseteq [\mathbf{0}, \mathbf{1}] \subseteq \mathbb{R}^r$ such that (<$ky_1$>, ..., <$ky_r$>) $\notin [a, b]$ for every $k \in \mathbb{Z}$.

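(h) Let $\eta_1, \dots, \eta_r$ be real numbers such that $1, \eta_1, \dots, \eta_r$ are linearly independent over $\mathbb{Q}$. Suppose that
$0 \le \alpha_j \le \beta_j \le 1$ for each $j \le r$. Show that for every $\epsilon > 0$ there is an $n_0 \in \mathbb{N}$ such that
$$\Bigl|\prod_{j=1}^r (\beta_j - \alpha_j) - \frac{1}{n+1} \#(\{m : k \le m \le k + n,\ \text{<}m\eta_j\text{>} \in [\alpha_j, \beta_j] \text{ for every } j \le r\})\Bigr| \le \epsilon$$
whenever $n \ge n_0$ and $k \in \mathbb{N}$. (Hint: in 281N, set
$$\overline{L}(f) = \limsup_{n \to \infty} \sup_{k \in \mathbb{N}} \frac{1}{n+1} \sum_{m=k}^{k+n} f(\text{<}my\text{>}).)$$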

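281Y Further exercises (a) Show that under the hypotheses of 281A, there is an $f \in \overline{A}$, the $\|\;\|_\infty$-closure
of $A$ in $C_b(X)$, such that $f\restriction K = h$. (Hint: take $f = \lim_{n \to \infty} f_n$ where
$$\|f_{n+1} - f_n\|_\infty \le \sup_{x \in K} |f_n(x) - h(x)| \le 2^{-n}$$
for every $n \in \mathbb{N}$.)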

(b) Let $X$ be a topological space and $K \subseteq X$ a compact subset. Suppose that for any distinct points $x$,
$y$ of $K$ there is a continuous function $f : X \to \mathbb{R}$ such that $f(x) \ne f(y)$. Show that for any $r \in \mathbb{N}$ and any
continuous $h : K \to \mathbb{R}^r$ there is a continuous $f : X \to \mathbb{R}^r$ extending $h$. (Hint: consider $r = 1$ first.)

(c) Let $\langle X_i \rangle_{i \in I}$ be any family of compact Hausdorff spaces, and $X$ their product as topological spaces.
For each $i$, write $C(X_i)$ for the set of continuous functions from $X_i$ to $\mathbb{R}$, and $\pi_i : X \to X_i$ for the coordinate
map. Show that the subalgebra of $C(X)$ generated by $\{f\pi_i : i \in I, f \in C(X_i)\}$ is $\|\;\|_\infty$-dense in $C(X)$.
(Note: you will need to know that $X$ is compact, and that if $Z$ is any compact Hausdorff space then for any
distinct $z$, $w \in Z$ there is an $f \in C(Z)$ such that $f(z) \ne f(w)$. For references see 3A3J and 3A3Bc in the
next volume.)

(d) Let $X$ be a topological space and $K$ a compact subset of $X$. Let $A$ be a linear subspace of the space
$C_b(X)$ of real-valued continuous functions on $X$ such that $|f| \in A$ for every $f \in A$. Let $h : K \to \mathbb{R}$ be a
continuous function such that whenever $x$, $y \in K$ there is an $f \in A$ such that $f(x) = h(x)$ and $f(y) = h(y)$.
Show that for every $\epsilon > 0$ there is an $f \in A$ such that $|f(x) - h(x)| \le \epsilon$ for every $x \in K$.

(e) Let $X$ be a compact topological space and write $C(X)$ for the set of continuous functions from $X$ to
$\mathbb{R}$. Suppose that $h \in C(X)$, and let $A \subseteq C(X)$ be such that
    $A$ is a linear subspace of $C(X)$;
    either $|f| \in A$ for every $f \in A$ or $f \times g \in A$ for every $f$, $g \in A$ or $f \times f \in A$ for every $f \in A$;
    whenever $x$, $y \in X$ and $\delta > 0$ there is an $f \in A$ such that $|f(x) - h(x)| \le \delta$, $|f(y) - h(y)| \le \delta$.
Show that for every $\epsilon > 0$ there is an $f \in A$ such that $|h(x) - f(x)| \le \epsilon$ for every $x \in X$.

(f) Let $X$ be a compact topological space and $A$ a $\|\;\|_\infty$-closed linear subspace of the space $C(X)$ of
continuous functions from $X$ to $\mathbb{R}$. Show that the following are equiveridical:
    (i) $|f| \in A$ for every $f \in A$;
    (ii) $f \times f \in A$ for every $f \in A$;
    (iii) $f \times g \in A$ for all $f$, $g \in A$,
and that in this case $A$ is closed in $C(X)$ for the topology defined by the pseudometrics
$$(f, g) \mapsto |f(x) - g(x)| : C(X) \times C(X) \to [0, \infty[$$
as $x$ runs over $X$ (the 'topology of pointwise convergence' on $C(X)$).

(g) Show that under the hypotheses of 281G there is an $f \in \overline{A}$, the $\|\;\|_\infty$-closure of $A$ in $C_b(X; \mathbb{C})$, such
that $f\restriction K = h$ and (if $K \ne \emptyset$) $\|f\|_\infty = \sup_{x \in K} |h(x)|$.

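(h) Let $y \in \mathbb{R}$ be irrational. Show that for any Riemann integrable function $f : [0, 1] \to \mathbb{R}$,
$$\int_0^1 f(x)\,dx = \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^n f(\text{<}my\text{>}),$$
writing <$my$> for the fractional part of $my$. (Hint: recall Riemann's criterion: for any $\epsilon > 0$, there are
$a_0, \dots, a_n$ with $0 = a_0 \le a_1 \le \dots \le a_n = 1$ and
$$\sum \{a_j - a_{j-1} : j \le n,\ \sup_{x \in [a_{j-1}, a_j]} f(x) - \inf_{x \in [a_{j-1}, a_j]} f(x) \ge \epsilon\} \le \epsilon.)$$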

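(i) Let $\langle t_n \rangle_{n \in \mathbb{N}}$ be a sequence in $[0, 1]$. Show that the following are equiveridical:
    (i) $\lim_{n \to \infty} \frac{1}{n+1} \sum_{k=0}^n f(t_k) = \int_0^1 f$ for every continuous function $f : [0, 1] \to \mathbb{R}$;
    (ii) $\lim_{n \to \infty} \frac{1}{n+1} \sum_{k=0}^n f(t_k) = \int_0^1 f$ for every Riemann integrable function $f : [0, 1] \to \mathbb{R}$;
    (iii) $\liminf_{n \to \infty} \frac{1}{n+1} \#(\{k : k \le n, t_k \in G\}) \ge \mu G$ for every open set $G \subseteq [0, 1]$;
    (iv) $\lim_{n \to \infty} \frac{1}{n+1} \#(\{k : k \le n, t_k \le \alpha\}) = \alpha$ for every $\alpha \in [0, 1]$;
    (v) $\lim_{n \to \infty} \frac{1}{n+1} \#(\{k : k \le n, t_k \in E\}) = \mu E$ for every $E \subseteq [0, 1]$ such that $\mu(\operatorname{int} E) = \mu \overline{E}$;
    (vi) $\lim_{n \to \infty} \frac{1}{n+1} \sum_{k=0}^n e^{2\pi i m t_k} = 0$ for every $m \ge 1$.
(Cf. 273J. Such sequences $\langle t_n \rangle_{n \in \mathbb{N}}$ are called equidistributed or uniformly distributed.)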

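(j) Show that the sequence $\langle \text{<}\ln(n + 1)\text{>} \rangle_{n \in \mathbb{N}}$ is not equidistributed.

(k) Give $[0, 1]^{\mathbb{N}}$ its product measure $\lambda$. Show that $\lambda$-almost every sequence $\langle t_n \rangle_{n \in \mathbb{N}} \in [0, 1]^{\mathbb{N}}$ is equidistributed
in the sense of 281Yi. (Hint: 273J.)

(l) Let $f : [0, 1]^2 \to \mathbb{C}$ be a continuous function. Show that if $\gamma \in \mathbb{R}$ is irrational then
$$\int_{[0,1]^2} f = \lim_{a \to \infty} \frac{1}{a} \int_0^a f(\text{<}t\text{>}, \text{<}\gamma t\text{>})\,dt.$$
(Hint: consider first functions of the form $x \mapsto e^{2\pi i k \cdot x}$.)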

281 Notes and comments I have given three statements (281A, 281E and 281G) of the Stone-Weierstrass
theorem, with an acknowledgement (281F) of Weierstrass' own version, and three further forms (281Ya,
281Yd, 281Yg) in the exercises. Yet another will appear in §4A6 in Volume 4. Faced with such a multiplicity,
you may wish to try your own hand at writing out theorems which will cover some or all of these versions. I
myself see no way of doing it without setting up a confusing list of alternative hypotheses and conclusions.
At which point, I ask 'what is a theorem, anyway?', and answer, it is a stopping-place on our journey; it is
a place where we can rest, and congratulate ourselves on our achievement; it is a place which we can learn
to recognise, and use as a starting point for new adventures; it is a place we can describe, and share with
others. For some theorems, like Fermat's last theorem, there is a canonical statement, an exactly locatable
point. For others, like the Stone-Weierstrass theorem here, we reach a mass of closely related results, all
depending on some arrangement of the arguments laid out in 281A-281G and 281Ya (which introduces a
new idea), and all useful in different ways. I suppose, indeed, that most authors would prefer the versions
281Ya and 281Yg, which eliminate the variable $\epsilon$ which appears in 281A, 281E and 281G, at the expense of
taking a closed subspace $A$. But I find that the corollaries which will be useful later (281H-281L) are more
naturally expressed in terms of linear subspaces which are not closed.
The applications of the theorem, or the theorems, or the method (choose your own expression) are
legion; only a few of them are here. An apparently innocent one is in 281Xa and, in a different variant, in
281Yc; these are enormously important in their own domains. In this volume the principal application will
be to 285L below, depending on 281K, and it is perhaps right to note that there is an alternative approach
to this particular result, based on ideas in 282G. But I offer Weyl's equidistribution theorem (281M-281N)
as evidence that we can expect to find good use for these ideas in almost any branch of mathematics.

282 Fourier series


Out of the enormous theory of Fourier series, I extract a few results which may at least provide a
foundation for further study. I give the definitions of Fourier and Fejér sums (282A), with five of the most
important results concerning their convergence (282G, 282H, 282J, 282L, 282O). On the way I include the
Riemann-Lebesgue lemma (282E). I end by mentioning convolutions (282Q).

282A Definition Let $f$ be an integrable complex-valued function defined almost everywhere on $]-\pi, \pi]$.

(a) The Fourier coefficients of $f$ are the complex numbers
$$c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx$$
for $k \in \mathbb{Z}$.

(b) The Fourier sums of $f$ are the functions
$$s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx}$$
for $x \in \,]-\pi, \pi]$, $n \in \mathbb{N}$.

(c) The Fourier series of $f$ is the series $\sum_{k=-\infty}^{\infty} c_k e^{ikx}$, or (because we ordinarily consider the symmetric
partial sums $s_n$) the series $c_0 + \sum_{k=1}^{\infty} (c_k e^{ikx} + c_{-k} e^{-ikx})$.

(d) The Fejér sums of $f$ are the functions
$$\sigma_m(x) = \frac{1}{m+1} \sum_{n=0}^{m} s_n(x)$$
for $x \in \,]-\pi, \pi]$, $m \in \mathbb{N}$.

(e) It will be convenient to have a further phrase available. If $f$ is any function with $\operatorname{dom} f \subseteq \,]-\pi, \pi]$,
its periodic extension is the function $\tilde{f}$, with domain $\bigcup_{k \in \mathbb{Z}} (\operatorname{dom} f + 2k\pi)$, such that $\tilde{f}(x) = f(x - 2k\pi)$
whenever $k \in \mathbb{Z}$, $x \in \operatorname{dom} f + 2k\pi$.
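
To make the definitions in 282A concrete, here is a small numerical sketch. It is an illustration only: the function names are hypothetical, the integrals are replaced by a midpoint rule (an assumption of this example, not a method from the text), and the test function $f(x) = x$ is chosen because its coefficients can be found by hand, namely $c_0 = 0$ and $c_k = i(-1)^k/k$ for $k \ne 0$.

```python
import cmath
import math

def fourier_coefficient(f, k, samples=4000):
    """Midpoint-rule approximation to
    c_k = (1 / 2 pi) * integral over ]-pi, pi] of f(x) e^{-ikx} dx."""
    h = 2 * math.pi / samples
    total = 0j
    for j in range(samples):
        x = -math.pi + (j + 0.5) * h
        total += f(x) * cmath.exp(-1j * k * x)
    return total * h / (2 * math.pi)

def coefficients_up_to(f, m):
    """Dictionary k -> c_k for |k| <= m."""
    return {k: fourier_coefficient(f, k) for k in range(-m, m + 1)}

def fourier_sum(coeffs, n, x):
    """s_n(x) = sum over |k| <= n of c_k e^{ikx}."""
    return sum(coeffs[k] * cmath.exp(1j * k * x) for k in range(-n, n + 1))

def fejer_sum(coeffs, m, x):
    """sigma_m(x) = (1 / (m + 1)) * (s_0(x) + ... + s_m(x))."""
    return sum(fourier_sum(coeffs, n, x) for n in range(m + 1)) / (m + 1)

if __name__ == "__main__":
    coeffs = coefficients_up_to(lambda x: x, 50)
    print(coeffs[1])                           # close to -1j
    print(fourier_sum(coeffs, 50, math.pi / 2))
    print(fejer_sum(coeffs, 50, math.pi / 2))  # both roughly pi / 2
```

For this real-valued $f$ the sums come out real up to rounding error, in line with the remark in 282Ba.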

282B Remarks I have made two more or less arbitrary choices here.

(a) I have chosen to express Fourier series in their 'complex' form rather than their 'real' form. From the
point of view of pure measure theory (and, indeed, from the point of view of the nineteenth-century origins
of the subject) there are gains in elegance from directing attention to real functions $f$ and looking at the
real coefficients
$$a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx\,dx \text{ for } k \in \mathbb{N},$$
$$b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx\,dx \text{ for } k \ge 1.$$
If we do this we have
$$c_0 = \frac{1}{2}a_0,$$
and for $k \ge 1$ we have
$$c_k = \frac{1}{2}(a_k - ib_k), \quad c_{-k} = \frac{1}{2}(a_k + ib_k), \quad a_k = c_k + c_{-k}, \quad b_k = i(c_k - c_{-k}),$$
so that the Fourier sums become
$$s_n(x) = \frac{1}{2}a_0 + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx).$$
The advantage of this is that real functions $f$ correspond to real coefficients $a_k$, $b_k$, so that it is obvious that
if $f$ is real-valued so are its Fourier and Fejér sums. The disadvantages are that we have to use a variety of
trigonometric equalities which are rather more complicated than the properties of the complex exponential
function which they reflect, and that we are farther away from the natural generalizations to locally compact
abelian groups. So both electrical engineers and harmonic analysts tend to prefer the coefficients $c_k$.
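
The conversion between the real and complex forms in (a) is easy to check mechanically. The following sketch uses hypothetical helper names and arbitrary sample coefficients; it assumes the lists `a` and `b` are indexed by $k$, with `b[0]` present but unused.

```python
import cmath
import math

def complex_from_real(a, b):
    """Given a = [a_0, ..., a_n] and b = [b_0 (unused), b_1, ..., b_n],
    return the dictionary k -> c_k with c_0 = a_0 / 2 and
    c_k = (a_k - i b_k) / 2, c_{-k} = (a_k + i b_k) / 2 for k >= 1."""
    c = {0: a[0] / 2}
    for k in range(1, len(a)):
        c[k] = (a[k] - 1j * b[k]) / 2
        c[-k] = (a[k] + 1j * b[k]) / 2
    return c

def sum_complex_form(c, x):
    """Sum of c_k e^{ikx} over all k in the dictionary."""
    return sum(ck * cmath.exp(1j * k * x) for k, ck in c.items())

def sum_real_form(a, b, x):
    """a_0 / 2 + sum over k >= 1 of (a_k cos kx + b_k sin kx)."""
    return a[0] / 2 + sum(
        a[k] * math.cos(k * x) + b[k] * math.sin(k * x) for k in range(1, len(a))
    )

if __name__ == "__main__":
    a = [1.0, 0.5, -2.0]
    b = [0.0, 3.0, 0.25]
    c = complex_from_real(a, b)
    print(sum_complex_form(c, 0.7), sum_real_form(a, b, 0.7))
```

The two printed values agree up to rounding, and the imaginary part of the complex form vanishes, as it must when the $a_k$ and $b_k$ are real.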

(b) I have taken the functions $f$ to be defined on the interval $]-\pi, \pi]$ rather than on the circle $S^1 = \{z :
z \in \mathbb{C}, |z| = 1\}$. There would be advantages in elegance of language in using $S^1$, though I do not recall often
seeing the formula
R
ck = z k f (z)dz
1
R ikx
which is the natural translation of ck = 2 e f (x)dx under the substitution x = arg z, dx = 2(dz).
However, applications of the theory tend to deal with periodic functions on the real line, so I work with
$]-\pi, \pi]$, and accept the fact that its group operation $+_{2\pi}$, writing $x +_{2\pi} y$ for whichever of $x + y$, $x + y + 2\pi$,
$x + y - 2\pi$ belongs to $]-\pi, \pi]$, is less familiar than multiplication on $S^1$.

(c) The remarks in (b) are supposed to remind you of §255.

(d) Observe that if $f =_{\text{a.e.}} g$ then $f$ and $g$ have the same Fourier coefficients, Fourier sums and Fejér
sums. This means that we could, if we wished, regard the $c_k$, $s_n$ and $\sigma_m$ as associated with a member of $L^1_{\mathbb{C}}$,
the space of equivalence classes of integrable functions (§242), rather than as associated with a particular
function $f$. Since however the $s_n$ and $\sigma_m$ appear as actual functions, and since many of the questions we
are interested in refer to their values at particular points, it is more natural to express the theory in terms
of integrable functions $f$ rather than in terms of members of $L^1_{\mathbb{C}}$.

282C The problems (a) Under what conditions, and in what senses, do the Fourier and Fejér sums
$s_n$ and $\sigma_m$ of a function $f$ converge to $f$?

(b) How do the properties of the double-ended sequence $\langle c_k \rangle_{k \in \mathbb{Z}}$ reflect the properties of $f$, and vice
versa?
Remark The theory of Fourier series has been one of the leading topics of analysis for nearly two hundred
years, and innumerable further problems have contributed greatly to our understanding. (For instance: can
one characterize those sequences $\langle c_k \rangle_{k \in \mathbb{Z}}$ which are the Fourier coefficients of some integrable function?) But
in this outline I will concentrate on the question (a) above, with one result (282K) addressing (b), which
will give us more than enough material to work on.
While most people would feel that the Fourier sums are somehow closer to what we really want to know,
it turns out that the Fejér sums are easier to analyse, and there are advantages in dealing with them first.
So while you may wish to look ahead to the statements of 282J, 282L and 282O for an idea of where we
are going, the first half of this section will be largely about Fejér sums. Note that in any case in which we
know that the Fourier sums converge (which is quite common; see, for instance, the examples in 282Xh and
282Xo), then if we know that the Fejér sums converge to $f$, we can deduce that the Fourier sums also do,
by 273Ca.
The first step is a basic lemma showing that both the Fourier and Fejér sums of a function $f$ can be
thought of as convolutions of $f$ with kernels describable by familiar functions.

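282D Lemma Let $f$ be a complex-valued function which is integrable over $]-\pi, \pi]$, and
$$c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx, \quad s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx}, \quad \sigma_m(x) = \frac{1}{m+1} \sum_{n=0}^{m} s_n(x)$$
its Fourier coefficients, Fourier sums and Fejér sums. Write $\tilde{f}$ for the periodic extension of $f$ (282Ae). For
$m \in \mathbb{N}$, write
$$\psi_m(t) = \frac{1 - \cos(m+1)t}{2\pi(m+1)(1 - \cos t)}$$
for $0 < |t| \le \pi$. (If you like, you can set $\psi_m(0) = \frac{m+1}{2\pi}$ to make $\psi_m$ continuous on $[-\pi, \pi]$.)
(a) For each $n \in \mathbb{N}$, $x \in \,]-\pi, \pi]$,
$$s_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \frac{\sin(n+\frac{1}{2})(x-t)}{\sin\frac{1}{2}(x-t)}\,dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(x+t) \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}\,dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x -_{2\pi} t) \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}\,dt,$$
writing $x -_{2\pi} t$ for whichever of $x - t$, $x - t - 2\pi$, $x - t + 2\pi$ belongs to $]-\pi, \pi]$.

(b) For each $m \in \mathbb{N}$, $x \in \,]-\pi, \pi]$,
$$\sigma_m(x) = \int_{-\pi}^{\pi} \tilde{f}(x+t) \psi_m(t)\,dt = \int_0^{\pi} (\tilde{f}(x+t) + \tilde{f}(x-t)) \psi_m(t)\,dt = \int_{-\pi}^{\pi} f(x -_{2\pi} t) \psi_m(t)\,dt.$$

(c) For any $n \in \mathbb{N}$,
$$\frac{1}{2\pi} \int_{-\pi}^{0} \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}\,dt = \frac{1}{2\pi} \int_{0}^{\pi} \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}\,dt = \frac{1}{2}, \quad \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}\,dt = 1.$$

(d) For any $m \in \mathbb{N}$,
    (i) $0 \le \psi_m(t) \le \frac{m+1}{2\pi}$ for every $t$;
    (ii) for any $\delta > 0$, $\lim_{m \to \infty} \psi_m(t) = 0$ uniformly on $\{t : \delta \le |t| \le \pi\}$;
    (iii) $\int_{-\pi}^{0} \psi_m(t)\,dt = \int_{0}^{\pi} \psi_m(t)\,dt = \frac{1}{2}$, $\int_{-\pi}^{\pi} \psi_m(t)\,dt = 1$.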

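proof Really all that these amount to is summing geometric series.

(a) For (a), we have
$$\sum_{k=-n}^{n} e^{ikt} = \frac{e^{-int} - e^{i(n+1)t}}{1 - e^{it}} = \frac{e^{i(n+\frac{1}{2})t} - e^{-i(n+\frac{1}{2})t}}{e^{\frac{1}{2}it} - e^{-\frac{1}{2}it}} = \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}.$$
So
$$s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx} = \frac{1}{2\pi} \sum_{k=-n}^{n} e^{ikx} \int_{-\pi}^{\pi} f(s) e^{-iks}\,ds = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(s) \Bigl(\sum_{k=-n}^{n} e^{ik(x-s)}\Bigr)\,ds$$
$$= \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(s) \frac{\sin(n+\frac{1}{2})(x-s)}{\sin\frac{1}{2}(x-s)}\,ds = \frac{1}{2\pi} \int_{-\pi-x}^{\pi-x} \tilde{f}(x+t) \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}\,dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(x+t) \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}\,dt$$
because $\tilde{f}$ and $t \mapsto \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}$ are periodic with period $2\pi$, so that the integral from $-\pi - x$ to $-\pi$ must be
the same as the integral from $\pi - x$ to $\pi$.
For the expression in terms of $f(x -_{2\pi} t)$, we have
$$s_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(x+t) \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}\,dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(x-t) \frac{\sin(n+\frac{1}{2})(-t)}{\sin\frac{1}{2}(-t)}\,dt$$
(substituting $-t$ for $t$)
$$= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x -_{2\pi} t) \frac{\sin(n+\frac{1}{2})t}{\sin\frac{1}{2}t}\,dt$$
because (for $x$, $t \in \,]-\pi, \pi]$) $f(x -_{2\pi} t) = \tilde{f}(x - t)$ whenever either is defined, and $\sin$ is an odd function.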
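(b) In the same way, we have

$$\begin{aligned}
\sum_{n=0}^{m}\sin(n+\tfrac12)t&=\operatorname{Im}\Bigl(\sum_{n=0}^{m}e^{i(n+\frac12)t}\Bigr)=\operatorname{Im}\Bigl(e^{\frac12it}\sum_{n=0}^{m}e^{int}\Bigr)\\
&=\operatorname{Im}\Bigl(e^{\frac12it}\,\frac{1-e^{i(m+1)t}}{1-e^{it}}\Bigr)=\operatorname{Im}\Bigl(\frac{1-e^{i(m+1)t}}{e^{-\frac12it}-e^{\frac12it}}\Bigr)\\
&=\operatorname{Im}\Bigl(\frac{1-e^{i(m+1)t}}{-2i\sin\frac12t}\Bigr)=\operatorname{Im}\Bigl(\frac{i(1-e^{i(m+1)t})}{2\sin\frac12t}\Bigr)\\
&=\frac{1-\cos(m+1)t}{2\sin\frac12t}.
\end{aligned}$$

So

$$\sum_{n=0}^{m}\frac{\sin(n+\frac12)t}{\sin\frac12t}=\frac{1-\cos(m+1)t}{2\sin^2\frac12t}=\frac{1-\cos(m+1)t}{1-\cos t}=2\pi(m+1)\psi_m(t).$$

Accordingly,

$$\begin{aligned}
\sigma_m(x)&=\frac{1}{m+1}\sum_{n=0}^{m}s_n(x)\\
&=\frac{1}{m+1}\sum_{n=0}^{m}\frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(x+t)\,\frac{\sin(n+\frac12)t}{\sin\frac12t}\,dt\\
&=\frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(x+t)\,\frac{1}{m+1}\sum_{n=0}^{m}\frac{\sin(n+\frac12)t}{\sin\frac12t}\,dt\\
&=\int_{-\pi}^{\pi}\tilde f(x+t)\psi_m(t)\,dt=\int_{-\pi}^{\pi}f(x-_{2\pi}t)\psi_m(t)\,dt
\end{aligned}$$

as in (a), because $\cos$ and $\psi_m$ are even functions. For the same reason,

$$\int_{0}^{\pi}\tilde f(x-t)\psi_m(t)\,dt=\int_{-\pi}^{0}\tilde f(x+t)\psi_m(t)\,dt,$$

so

$$\sigma_m(x)=\int_{0}^{\pi}\bigl(\tilde f(x+t)+\tilde f(x-t)\bigr)\psi_m(t)\,dt.$$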

(c) We need only look at where the formula $\frac{\sin(n+\frac12)t}{\sin\frac12t}$ came from to see that

$$\begin{aligned}
\frac{1}{2\pi}\int_{I}\frac{\sin(n+\frac12)t}{\sin\frac12t}\,dt&=\frac{1}{2\pi}\int_{I}\sum_{k=-n}^{n}e^{ikt}\,dt\\
&=\frac{1}{2\pi}\int_{I}\Bigl(1+2\sum_{k=1}^{n}\cos kt\Bigr)dt=\frac12
\end{aligned}$$

for both $I=[-\pi,0]$ and $I=[0,\pi]$, because $\int_I\cos kt\,dt=0$ for every $k\ne0$.
(d)(i) $\psi_m(t)\ge0$ for every $t$ because $1-\cos(m+1)t$, $1-\cos t$ are always greater than or equal to 0. For
the upper bound, we have, using the constructions in (a) and (b),

$$\Bigl|\frac{\sin(n+\frac12)t}{\sin\frac12t}\Bigr|=\Bigl|\sum_{k=-n}^{n}e^{ikt}\Bigr|\le2n+1$$

for every $n$, so

$$\psi_m(t)=\frac{1}{2\pi(m+1)}\sum_{n=0}^{m}\frac{\sin(n+\frac12)t}{\sin\frac12t}\le\frac{1}{2\pi(m+1)}\sum_{n=0}^{m}(2n+1)=\frac{m+1}{2\pi}.$$

(ii) If $\delta\le|t|\le\pi$,

$$\psi_m(t)\le\frac{1}{\pi(m+1)(1-\cos t)}\le\frac{1}{\pi(m+1)(1-\cos\delta)}\to0$$

as $m\to\infty$.
(iii) also follows from the construction in (b), because

$$\int_{I}\psi_m(t)\,dt=\frac{1}{2\pi(m+1)}\sum_{n=0}^{m}\int_{I}\frac{\sin(n+\frac12)t}{\sin\frac12t}\,dt=\frac{1}{m+1}\sum_{n=0}^{m}\frac12=\frac12$$

for both $I=[-\pi,0]$ and $I=[0,\pi]$, using (c).
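
The properties in 282D(d) are easy to check numerically. The following is a minimal sketch, not part of the text: the function names are invented for illustration, and the integral is approximated by a midpoint rule (which should be essentially exact here, since the Fejér kernel is a trigonometric polynomial of degree $m$).

```python
import math

def fejer_kernel(m, t):
    """psi_m(t) = (1 - cos((m+1)t)) / (2*pi*(m+1)*(1 - cos t)) for 0 < |t| <= pi,
    using the continuous extension psi_m(0) = (m+1)/(2*pi) very close to 0."""
    denom = 1.0 - math.cos(t)
    if denom < 1e-12:
        return (m + 1) / (2.0 * math.pi)
    return (1.0 - math.cos((m + 1) * t)) / (2.0 * math.pi * (m + 1) * denom)

def dirichlet_average(m, t):
    """(1/(2*pi*(m+1))) * sum_{n=0}^{m} sin((n+1/2)t) / sin(t/2), for t != 0.
    By the calculation in part (b) of the proof this equals psi_m(t)."""
    total = sum(math.sin((n + 0.5) * t) for n in range(m + 1))
    return total / (2.0 * math.pi * (m + 1) * math.sin(0.5 * t))

def integral_over_period(m, steps=2000):
    """Midpoint-rule approximation to the integral of psi_m over [-pi, pi]."""
    h = 2.0 * math.pi / steps
    return h * sum(fejer_kernel(m, -math.pi + (j + 0.5) * h) for j in range(steps))

if __name__ == "__main__":
    for m in (1, 5, 20):
        print(m, integral_over_period(m), fejer_kernel(m, 0.0), (m + 1) / (2 * math.pi))
```

The printed integrals should be close to 1 for each $m$, in line with (d-iii), while the value at 0 attains the upper bound of (d-i).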


Remarks For a discussion of substitution in integrals, if you feel any need to justify the manipulations in
part (a) of the proof, see 263I.
The functions

$$t\mapsto\frac{\sin(n+\frac12)t}{\sin\frac12t},\qquad t\mapsto\frac{1-\cos(m+1)t}{(m+1)(1-\cos t)}$$

are called respectively Dirichlet's kernel and Fejér's kernel.

I give the formulae in terms of $f(x-_{2\pi}t)$ in (a) and (b) in order to provide a link with the work of 255O.

282E The next step is a vital lemma, with a suitably distinguished name which (you will be glad to
know) reflects its importance rather than its difficulty.
The Riemann-Lebesgue lemma Let $f$ be a complex-valued function which is integrable over $\mathbb{R}$. Then

$$\lim_{y\to\infty}\int f(x)e^{-iyx}\,dx=\lim_{y\to-\infty}\int f(x)e^{-iyx}\,dx=0.$$

proof (a) Consider first the case in which $f=\chi\,]a,b[$, where $a<b$. Then

$$\Bigl|\int f(x)e^{-iyx}\,dx\Bigr|=\Bigl|\int_a^be^{-iyx}\,dx\Bigr|=\Bigl|\frac{1}{-iy}(e^{-iyb}-e^{-iya})\Bigr|\le\frac{2}{|y|}$$

if $y\ne0$. So in this case the result is obvious.
(b) It follows at once that the result is true if $f$ is a step-function with bounded support, that is, if there
are $a_0\le a_1\le\ldots\le a_n$ such that $f$ is constant on every interval $]a_{j-1},a_j[$ and zero outside $[a_0,a_n]$.
(c) Now, for a given integrable $f$ and $\epsilon>0$, there is a step-function $g$ such that $\int|f-g|\le\epsilon$ (242O). So

$$\Bigl|\int f(x)e^{-iyx}\,dx-\int g(x)e^{-iyx}\,dx\Bigr|\le\int|f(x)-g(x)|\,dx\le\epsilon$$

for every $y$, and

$$\limsup_{y\to\infty}\Bigl|\int f(x)e^{-iyx}\,dx\Bigr|\le\epsilon,\qquad\limsup_{y\to-\infty}\Bigl|\int f(x)e^{-iyx}\,dx\Bigr|\le\epsilon.$$

As $\epsilon$ is arbitrary, we have the result.
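
Part (a) of the proof can be illustrated numerically. This is a small sketch under stated assumptions: the function names are hypothetical, the interval $]0,1[$ is an arbitrary choice, and the midpoint rule serves only as an independent check on the closed form.

```python
import cmath

def transform_of_interval(a, b, y):
    """Closed form of the integral of exp(-i*y*x) over ]a, b[ (requires y != 0)."""
    return (cmath.exp(-1j * y * b) - cmath.exp(-1j * y * a)) / (-1j * y)

def transform_by_midpoint_rule(a, b, y, steps=20000):
    """Independent midpoint-rule approximation of the same integral."""
    h = (b - a) / steps
    return h * sum(cmath.exp(-1j * y * (a + (j + 0.5) * h)) for j in range(steps))

if __name__ == "__main__":
    # The modulus never exceeds 2/|y|, so it tends to 0 as |y| grows.
    for y in (1.0, 10.0, 100.0, 1000.0):
        print(y, abs(transform_of_interval(0.0, 1.0, y)), 2.0 / abs(y))
```

The first printed column of moduli should stay below the second column of bounds $2/|y|$ for every $y$.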

282F Corollary (a) Let $f$ be a complex-valued function which is integrable over $]-\pi,\pi]$, and $\langle c_k\rangle_{k\in\mathbb{Z}}$
its sequence of Fourier coefficients. Then $\lim_{k\to\infty}c_k=\lim_{k\to-\infty}c_k=0$.
(b) Let $f$ be a complex-valued function which is integrable over $\mathbb{R}$. Then $\lim_{y\to\infty}\int f(x)\sin yx\,dx=0$.
proof (a) We need only identify

$$c_k=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx$$

with $\int g(x)e^{-ikx}\,dx$, where $g(x)=f(x)/2\pi$ for $x\in\operatorname{dom}f$ and $0$ for $|x|>\pi$.
(b) This is just because

$$\int f(x)\sin yx\,dx=\frac{1}{2i}\Bigl(\int f(x)e^{iyx}\,dx-\int f(x)e^{-iyx}\,dx\Bigr).$$

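282G We are now ready for theorems on the convergence of Fejér sums. I start with an easy one,
almost a warming-up exercise.
Theorem Let $f:\,]-\pi,\pi]\to\mathbb{C}$ be a continuous function such that $\lim_{t\downarrow-\pi}f(t)=f(\pi)$. Then its sequence
$\langle\sigma_m\rangle_{m\in\mathbb{N}}$ of Fejér sums converges uniformly to $f$ on $]-\pi,\pi]$.
proof The conditions on $f$ amount just to saying that its periodic extension $\tilde f$ is defined and continuous
everywhere on $\mathbb{R}$. Consequently it is bounded and uniformly continuous on any bounded interval, in
particular, on the interval $[-2\pi,2\pi]$. Set $K=\sup_{|t|\le2\pi}|\tilde f(t)|=\sup_{t\in]-\pi,\pi]}|f(t)|$. Write

$$\psi_m(t)=\frac{1-\cos(m+1)t}{2\pi(m+1)(1-\cos t)}$$

for $m\in\mathbb{N}$, $0<|t|\le\pi$, as in 282D.

Given $\epsilon>0$ we can find a $\delta\in\,]0,\pi]$ such that $|\tilde f(x+t)-\tilde f(x)|\le\epsilon$ whenever $x\in[-\pi,\pi]$, $|t|\le\delta$. Next,
we can find an $m_0\in\mathbb{N}$ such that $M_m\le\frac{\epsilon}{4\pi K}$ for every $m\ge m_0$, where $M_m=\sup_{\delta\le|t|\le\pi}\psi_m(t)$ (282D(d-ii)).
Now suppose that $m\ge m_0$ and $x\in\,]-\pi,\pi]$. Set $g(t)=\tilde f(x+t)-f(x)$ for $|t|\le\pi$. Then $|g(t)|\le2K$ for all
$t\in[-\pi,\pi]$ and $|g(t)|\le\epsilon$ if $|t|\le\delta$, so

$$\begin{aligned}
\Bigl|\int_{-\pi}^{\pi}g(t)\psi_m(t)\,dt\Bigr|&\le\int_{-\pi}^{-\delta}|g(t)|\psi_m(t)\,dt+\int_{-\delta}^{\delta}|g(t)|\psi_m(t)\,dt+\int_{\delta}^{\pi}|g(t)|\psi_m(t)\,dt\\
&\le2M_mK(\pi-\delta)+\epsilon\int_{-\delta}^{\delta}\psi_m(t)\,dt+2M_mK(\pi-\delta)\\
&\le4M_mK\pi+\epsilon\le2\epsilon.
\end{aligned}$$

Consequently, using 282Db and 282D(d-iii),

$$|\sigma_m(x)-f(x)|=\Bigl|\int_{-\pi}^{\pi}\bigl(\tilde f(x+t)-f(x)\bigr)\psi_m(t)\,dt\Bigr|\le2\epsilon$$

for every $m\ge m_0$; and this is true for every $x\in\,]-\pi,\pi]$. As $\epsilon$ is arbitrary, $\langle\sigma_m\rangle_{m\in\mathbb{N}}$ converges to $f$ uniformly
on $]-\pi,\pi]$.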
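
As an illustration of 282G (a sketch only, with names chosen for this example), take $f(x)=|x|$, which is continuous with $f(\pi)=\lim_{t\downarrow-\pi}f(t)$. Its Fourier coefficients are $c_0=\frac{\pi}{2}$ and $c_k=\frac{(-1)^k-1}{\pi k^2}$ for $k\ne0$. Since $c_k$ occurs in $s_n$ exactly when $n\ge|k|$, the definition of $\sigma_m$ gives $\sigma_m(x)=\sum_{|k|\le m}\bigl(1-\frac{|k|}{m+1}\bigr)c_ke^{ikx}$, which is what the code evaluates on a grid.

```python
import math

def coeff_abs(k):
    """Fourier coefficient c_k of f(x) = |x| on ]-pi, pi] (k >= 0; c_{-k} = c_k)."""
    if k == 0:
        return math.pi / 2.0
    return ((-1) ** k - 1) / (math.pi * k * k)

def fejer_sum_abs(m, x):
    """sigma_m(x) = sum over |k| <= m of (1 - |k|/(m+1)) c_k e^{ikx}.
    Because c_k = c_{-k} is real, the terms k and -k combine into a cosine."""
    total = coeff_abs(0)
    for k in range(1, m + 1):
        total += 2.0 * (1.0 - k / (m + 1)) * coeff_abs(k) * math.cos(k * x)
    return total

def sup_error(m, grid=400):
    """Largest value of |sigma_m(x) - |x|| over an evenly spaced grid in [-pi, pi]."""
    xs = [-math.pi + 2.0 * math.pi * j / grid for j in range(grid + 1)]
    return max(abs(fejer_sum_abs(m, x) - abs(x)) for x in xs)

if __name__ == "__main__":
    for m in (10, 50, 200):
        print(m, sup_error(m))
```

The grid maximum only approximates the true supremum from below, but it should shrink steadily as $m$ increases, as uniform convergence requires.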

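282H I come now to a theorem describing the behaviour of the Fejér sums of general functions $f$. The
hypothesis of the theorem may take a little bit of digesting; you can get an idea of its intended scope by
glancing at Corollary 282I.
Theorem Let $f$ be a complex-valued function which is integrable over $]-\pi,\pi]$, and $\langle\sigma_m\rangle_{m\in\mathbb{N}}$ its sequence
of Fejér sums. Suppose that $x\in\,]-\pi,\pi]$, $c\in\mathbb{C}$ are such that

$$\lim_{\delta\downarrow0}\frac{1}{\delta}\int_0^{\delta}|\tilde f(x+t)+\tilde f(x-t)-2c|\,dt=0,$$

writing $\tilde f$ for the periodic extension of $f$, as usual; then $\lim_{m\to\infty}\sigma_m(x)=c$.

proof Set $\phi(t)=|\tilde f(x+t)+\tilde f(x-t)-2c|$ when this is defined, which is almost everywhere, and $\Phi(t)=\int_0^t\phi(s)\,ds$, which is defined for every $t\ge0$, because $\tilde f$ is integrable over $]-\pi,\pi]$ and therefore over every
bounded interval.
As in 282D, set

$$\psi_m(t)=\frac{1-\cos(m+1)t}{2\pi(m+1)(1-\cos t)}$$

for $m\in\mathbb{N}$, $0<|t|\le\pi$. We have

$$|\sigma_m(x)-c|=\Bigl|\int_0^{\pi}\bigl(\tilde f(x+t)+\tilde f(x-t)-2c\bigr)\psi_m(t)\,dt\Bigr|\le\int_0^{\pi}\phi(t)\psi_m(t)\,dt$$

by (b) and (d) of 282D.

Let $\epsilon>0$. By hypothesis, $\lim_{t\downarrow0}\Phi(t)/t=0$; let $\delta\in\,]0,\pi]$ be such that $\Phi(t)\le\epsilon t$ for every $t\in[0,\delta]$. Take
any $m\ge\pi/\delta$. I break the integral $\int_0^{\pi}\phi(t)\psi_m(t)\,dt$ up into three parts.
(i) For the integral from $0$ to $1/m$, we have

$$\int_0^{1/m}\phi(t)\psi_m(t)\,dt\le\frac{m+1}{2\pi}\int_0^{1/m}\phi(t)\,dt=\frac{m+1}{2\pi}\Phi\bigl(\tfrac1m\bigr)\le\frac{\epsilon(m+1)}{2\pi m}\le\epsilon,$$

because $\psi_m(t)\le\frac{m+1}{2\pi}$ for every $t$ (282D(d-i)).
(ii) For the integral from $1/m$ to $\delta$, we have

$$\begin{aligned}
\int_{1/m}^{\delta}\phi(t)\psi_m(t)\,dt&\le\frac{1}{2\pi(m+1)}\int_{1/m}^{\delta}\frac{\phi(t)}{1-\cos t}\,dt\le\frac{\pi}{4(m+1)}\int_{1/m}^{\delta}\frac{\phi(t)}{t^2}\,dt\\
&\qquad\text{(because $1-\cos t\ge\frac{2t^2}{\pi^2}$ for $|t|\le\pi$)}\\
&=\frac{\pi}{4(m+1)}\Bigl(\frac{\Phi(\delta)}{\delta^2}-\frac{\Phi(\frac1m)}{(\frac1m)^2}+\int_{1/m}^{\delta}\frac{2\Phi(t)}{t^3}\,dt\Bigr)\\
&\qquad\text{(integrating by parts; see 225F)}\\
&\le\frac{\pi}{4(m+1)}\Bigl(\frac{\epsilon}{\delta}+\int_{1/m}^{\delta}\frac{2\epsilon}{t^2}\,dt\Bigr)\\
&\qquad\text{(because $\Phi(t)\le\epsilon t$ for $0\le t\le\delta$)}\\
&\le\frac{\pi}{4(m+1)}\Bigl(\frac{\epsilon}{\delta}+2\epsilon m\Bigr)\\
&\le\frac{\pi\epsilon}{4(m+1)\delta}+\frac{\pi\epsilon}{2}\le\frac{\epsilon}{4}+\frac{\pi\epsilon}{2}\le2\epsilon.
\end{aligned}$$

(iii) For the integral from $\delta$ to $\pi$, we have

$$\int_{\delta}^{\pi}\phi(t)\psi_m(t)\,dt=\int_{\delta}^{\pi}\phi(t)\,\frac{1-\cos(m+1)t}{2\pi(m+1)(1-\cos t)}\,dt\le\int_{\delta}^{\pi}\frac{\phi(t)}{\pi(m+1)(1-\cos\delta)}\,dt\to0\text{ as }m\to\infty$$

because $\phi$ is integrable over $[-\pi,\pi]$. There must therefore be an $m_0\in\mathbb{N}$ such that

$$\int_{\delta}^{\pi}\phi(t)\psi_m(t)\,dt\le\epsilon$$

for every $m\ge m_0$.
Putting these together, we see that

$$\int_0^{\pi}\phi(t)\psi_m(t)\,dt\le\epsilon+2\epsilon+\epsilon=4\epsilon$$

for every $m\ge\max(m_0,\frac{\pi}{\delta})$. As $\epsilon$ is arbitrary, $\lim_{m\to\infty}\sigma_m(x)=c$, as claimed.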

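282I Corollary Let $f$ be a complex-valued function which is integrable over $]-\pi,\pi]$, and $\langle\sigma_m\rangle_{m\in\mathbb{N}}$ its
sequence of Fejér sums.
(a) $f(x)=\lim_{m\to\infty}\sigma_m(x)$ for almost every $x\in\,]-\pi,\pi]$.
(b) $\lim_{m\to\infty}\int_{-\pi}^{\pi}|f(x)-\sigma_m(x)|\,dx=0$.
(c) If $g$ is another integrable function with the same Fourier coefficients, then $f=_{\text{a.e.}}g$.
(d) If $x\in\,]-\pi,\pi[$ is such that $a=\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)$ and $b=\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)$ are both defined in $\mathbb{C}$,
then

$$\lim_{m\to\infty}\sigma_m(x)=\tfrac12(a+b).$$

(e) If $a=\lim_{t\in\operatorname{dom}f,\,t\uparrow\pi}f(t)$ and $b=\lim_{t\in\operatorname{dom}f,\,t\downarrow-\pi}f(t)$ are both defined in $\mathbb{C}$, then

$$\lim_{m\to\infty}\sigma_m(\pi)=\tfrac12(a+b).$$

(f) If $f$ is defined and continuous at $x\in\,]-\pi,\pi[$, then

$$\lim_{m\to\infty}\sigma_m(x)=f(x).$$

(g) If $\tilde f$, the periodic extension of $f$, is defined and continuous at $\pi$, then

$$\lim_{m\to\infty}\sigma_m(\pi)=f(\pi).$$

proof (a) We have only to recall that by 223D

$$\begin{aligned}
\limsup_{\delta\downarrow0}\frac{1}{\delta}\int_0^{\delta}|f(x+t)+f(x-t)-2f(x)|\,dt&\le\limsup_{\delta\downarrow0}\frac{1}{\delta}\Bigl(\int_0^{\delta}|f(x+t)-f(x)|\,dt+\int_0^{\delta}|f(x-t)-f(x)|\,dt\Bigr)\\
&=\limsup_{\delta\downarrow0}\frac{1}{\delta}\int_{-\delta}^{\delta}|f(x+t)-f(x)|\,dt=0
\end{aligned}$$

for almost every $x\in\,]-\pi,\pi[$.

(b) Next observe that, in the language of 255O,

$$\sigma_m=f*\psi_m,$$

by the last formula in 282Db. Consequently, by 255Oe,

$$\|\sigma_m\|_1\le\|f\|_1\|\psi_m\|_1,$$

writing $\|\sigma_m\|_1=\int_{-\pi}^{\pi}|\sigma_m(x)|\,dx$. But this means that we have

$$f(x)=\lim_{m\to\infty}\sigma_m(x)\text{ for almost every }x,\qquad\limsup_{m\to\infty}\|\sigma_m\|_1\le\|f\|_1;$$

and it follows from 245H that $\lim_{m\to\infty}\|f-\sigma_m\|_1=0$.
(c) If $g$ has the same Fourier coefficients as $f$, then it has the same Fourier and Fejér sums, so we have

$$g(x)=\lim_{m\to\infty}\sigma_m(x)=f(x)$$

almost everywhere.
(d)-(e) Both of these amount to considering $x\in\,]-\pi,\pi]$ such that

$$\lim_{t\in\operatorname{dom}\tilde f,\,t\uparrow x}\tilde f(t)=a,\qquad\lim_{t\in\operatorname{dom}\tilde f,\,t\downarrow x}\tilde f(t)=b.$$

Setting $c=\frac12(a+b)$, $\phi(t)=|\tilde f(x+t)+\tilde f(x-t)-2c|$ whenever this is defined, we have $\lim_{t\in\operatorname{dom}\phi,\,t\downarrow0}\phi(t)=0$,
so surely $\lim_{\delta\downarrow0}\frac{1}{\delta}\int_0^{\delta}\phi=0$, and the theorem applies.
(f)-(g) are special cases of (d) and (e).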
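
Parts (d) and (f) can be seen on the function $f(x)=x/|x|$, whose Fourier series is $\frac{4}{\pi}\sum_{k\text{ odd}}\frac{\sin kx}{k}$. The sketch below is illustrative only (the function name is invented); it uses the weights $1-\frac{k}{m+1}$ which the definition of $\sigma_m$ attaches to the terms of index $k$. Because $\psi_m\ge0$ and $\int\psi_m=1$ (282D(d)), the Fejér sums of this $f$ should also stay between $-1$ and $1$.

```python
import math

def fejer_sum_sign(m, x):
    """Fejer sum sigma_m(x) of f(x) = x/|x| on ]-pi, pi]:
    sum over odd k <= m of (1 - k/(m+1)) * (4/(pi*k)) * sin(k*x)."""
    total = 0.0
    for k in range(1, m + 1, 2):
        total += (1.0 - k / (m + 1)) * 4.0 / (math.pi * k) * math.sin(k * x)
    return total

if __name__ == "__main__":
    # At the jump x = 0 the one-sided limits are -1 and 1, with average 0.
    print(fejer_sum_sign(50, 0.0))
    # At the continuity point x = pi/2 the sums approach f(pi/2) = 1.
    for m in (10, 100, 2000):
        print(m, fejer_sum_sign(m, math.pi / 2.0))
```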

282J I now turn to conditions for the convergence of Fourier sums. Probably the easiest result, one
which is both striking and satisfying, is the following.
Theorem Let $f$ be a complex-valued function which is square-integrable over $]-\pi,\pi]$. Let $\langle c_k\rangle_{k\in\mathbb{Z}}$ be its
Fourier coefficients and $\langle s_n\rangle_{n\in\mathbb{N}}$ its Fourier sums (282A). Then
(i) $\sum_{k=-\infty}^{\infty}|c_k|^2=\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx$,
(ii) $\lim_{n\to\infty}\int_{-\pi}^{\pi}|f(x)-s_n(x)|^2\,dx=0$.
proof (a) I recall some notation from 244N. Let $\mathcal{L}^2_{\mathbb{C}}$ be the space of square-integrable complex-valued
functions on $]-\pi,\pi]$. For $g,h\in\mathcal{L}^2_{\mathbb{C}}$, write

$$(g|h)=\int_{-\pi}^{\pi}g(x)\overline{h(x)}\,dx,\qquad\|g\|_2=\sqrt{(g|g)}.$$

Recall that $\|g+h\|_2\le\|g\|_2+\|h\|_2$ for all $g,h\in\mathcal{L}^2_{\mathbb{C}}$ (244Fb). For $k\in\mathbb{Z}$, $x\in\,]-\pi,\pi]$ set $e_k(x)=e^{ikx}$, so
that

$$(f|e_k)=\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx=2\pi c_k.$$

Moreover, if $|k|\le n$,

$$(s_n|e_k)=\sum_{j=-n}^{n}c_j\int_{-\pi}^{\pi}e^{ijx}e^{-ikx}\,dx=2\pi c_k,$$

because

$$\int_{-\pi}^{\pi}e^{ijx}e^{-ikx}\,dx=\begin{cases}2\pi&\text{if }j=k,\\0&\text{if }j\ne k.\end{cases}$$

So

$$(f-s_n|e_k)=0\text{ whenever }|k|\le n;$$

in particular,

$$(f-s_n|s_n)=\sum_{k=-n}^{n}\bar c_k(f-s_n|e_k)=0$$

for every $n\in\mathbb{N}$.

(b) Fix $\epsilon>0$. The next element of the proof is the fact that there are $m\in\mathbb{N}$, $a_{-m},\ldots,a_m\in\mathbb{C}$ such
that $\|f-h\|_2\le\epsilon$, where $h=\sum_{k=-m}^{m}a_ke_k$. P By 244Hb we know that there is a continuous function
$g:[-\pi,\pi]\to\mathbb{C}$ such that $\|f-g\|_2\le\frac{\epsilon}{3}$. Next, modifying $g$ on a suitably short interval $]\pi-\delta,\pi]$, we
can find a continuous function $g_1:[-\pi,\pi]\to\mathbb{C}$ such that $\|g-g_1\|_2\le\frac{\epsilon}{3}$ and $g_1(-\pi)=g_1(\pi)$. (Set $M=\sup_{x\in[-\pi,\pi]}|g(x)|$, take $\delta\in\,]0,2\pi]$ such that $(2M)^2\delta\le(\epsilon/3)^2$, and set $g_1(\pi-t\delta)=tg(\pi-\delta)+(1-t)g(-\pi)$
for $t\in[0,1]$.) Either by the Stone-Weierstrass theorem (281J), or by 282G above, there are $a_{-m},\ldots,a_m$ such
that $|g_1(x)-\sum_{k=-m}^{m}a_ke^{ikx}|\le\frac{\epsilon}{3\sqrt{2\pi}}$ for every $x\in[-\pi,\pi]$; setting $h=\sum_{k=-m}^{m}a_ke_k$, we have $\|g_1-h\|_2\le\frac13\epsilon$,
so that

$$\|f-h\|_2\le\|f-g\|_2+\|g-g_1\|_2+\|g_1-h\|_2\le\epsilon.\text{ Q}$$

(c) Now take any $n\ge m$. Then $s_n-h$ is a linear combination of $e_{-n},\ldots,e_n$, so $(f-s_n|s_n-h)=0$.
Consequently

$$\begin{aligned}
\epsilon^2&\ge(f-h|f-h)\\
&=(f-s_n|f-s_n)+(f-s_n|s_n-h)+(s_n-h|f-s_n)+(s_n-h|s_n-h)\\
&=\|f-s_n\|_2^2+\|s_n-h\|_2^2\ge\|f-s_n\|_2^2.
\end{aligned}$$

Thus $\|f-s_n\|_2\le\epsilon$ for every $n\ge m$. As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\|f-s_n\|_2^2=0$, which proves (ii).
(d) As for (i), we have

$$\sum_{k=-n}^{n}|c_k|^2=\frac{1}{2\pi}\sum_{k=-n}^{n}\bar c_k(s_n|e_k)=\frac{1}{2\pi}(s_n|s_n)=\frac{1}{2\pi}\|s_n\|_2^2.$$

But of course

$$\bigl|\|s_n\|_2-\|f\|_2\bigr|\le\|s_n-f\|_2\to0$$

as $n\to\infty$, so

$$\sum_{k=-\infty}^{\infty}|c_k|^2=\lim_{n\to\infty}\frac{1}{2\pi}\|s_n\|_2^2=\frac{1}{2\pi}\|f\|_2^2=\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx,$$

as required.
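
Formula (i) can be checked on a concrete case. Take $f(x)=x$, for which $c_0=0$ and $c_k=\frac{i(-1)^k}{k}$ for $k\ne0$, so that $|c_k|^2=\frac{1}{k^2}$, while $\frac{1}{2\pi}\int_{-\pi}^{\pi}x^2\,dx=\frac{\pi^2}{3}$. The sketch below (names are illustrative, not from the text) compares partial sums of $\sum|c_k|^2$ with that value.

```python
import math

def parseval_partial_sum(n):
    """sum over |k| <= n of |c_k|^2 for f(x) = x, where c_0 = 0 and |c_k| = 1/|k|."""
    return 2.0 * sum(1.0 / (k * k) for k in range(1, n + 1))

# (1/(2*pi)) * integral of x^2 over [-pi, pi]
MEAN_SQUARE = math.pi ** 2 / 3.0

if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, parseval_partial_sum(n), MEAN_SQUARE)
```

The partial sums increase towards $\frac{\pi^2}{3}$, with a remaining gap of roughly $\frac{2}{n}$.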

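282K Corollary Let $L^2_{\mathbb{C}}$ be the Hilbert space of equivalence classes of square-integrable complex-valued
functions on $]-\pi,\pi]$, with the inner product

$$(f^{\bullet}|g^{\bullet})=\int_{-\pi}^{\pi}f(x)\overline{g(x)}\,dx$$

and norm

$$\|f^{\bullet}\|_2=\Bigl(\int_{-\pi}^{\pi}|f(x)|^2\,dx\Bigr)^{1/2},$$

writing $f^{\bullet}\in L^2_{\mathbb{C}}$ for the equivalence class of a square-integrable function $f$. Let $\ell^2_{\mathbb{C}}(\mathbb{Z})$ be the Hilbert space
of square-summable double-ended complex sequences, with the inner product

$$(\boldsymbol{c}|\boldsymbol{d})=\sum_{k=-\infty}^{\infty}c_k\bar d_k$$

and norm

$$\|\boldsymbol{c}\|_2=\Bigl(\sum_{k=-\infty}^{\infty}|c_k|^2\Bigr)^{1/2}$$

for $\boldsymbol{c}=\langle c_k\rangle_{k\in\mathbb{Z}}$, $\boldsymbol{d}=\langle d_k\rangle_{k\in\mathbb{Z}}$ in $\ell^2_{\mathbb{C}}(\mathbb{Z})$. Then we have an inner-product-space isomorphism $S:L^2_{\mathbb{C}}\to\ell^2_{\mathbb{C}}(\mathbb{Z})$
defined by saying that

$$S(f^{\bullet})(k)=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx$$

for every square-integrable function $f$ and every $k\in\mathbb{Z}$.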


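proof (a) As in 282J, write $\mathcal{L}^2_{\mathbb{C}}$ for the space of square-integrable functions. If $f,g\in\mathcal{L}^2_{\mathbb{C}}$ and $f^{\bullet}=g^{\bullet}$, then
$f=_{\text{a.e.}}g$, so

$$\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}g(x)e^{-ikx}\,dx$$

for every $k\in\mathbb{Z}$. Thus $S$ is well-defined.

(b) $S$ is linear. P This is elementary. If $f,g\in\mathcal{L}^2_{\mathbb{C}}$ and $c\in\mathbb{C}$,

$$\begin{aligned}
S(f^{\bullet}+g^{\bullet})(k)&=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}(f(x)+g(x))e^{-ikx}\,dx\\
&=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx+\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}g(x)e^{-ikx}\,dx\\
&=S(f^{\bullet})(k)+S(g^{\bullet})(k)
\end{aligned}$$

for every $k\in\mathbb{Z}$, so that $S(f^{\bullet}+g^{\bullet})=S(f^{\bullet})+S(g^{\bullet})$. Similarly,

$$S(cf^{\bullet})(k)=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}cf(x)e^{-ikx}\,dx=\frac{c}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx=cS(f^{\bullet})(k)$$

for every $k\in\mathbb{Z}$, so that $S(cf^{\bullet})=cS(f^{\bullet})$. Q

(c) If $f\in\mathcal{L}^2_{\mathbb{C}}$ has Fourier coefficients $c_k$, then $S(f^{\bullet})=\langle\sqrt{2\pi}\,c_k\rangle_{k\in\mathbb{Z}}$, so by 282J(i)

$$\|S(f^{\bullet})\|_2^2=2\pi\sum_{k=-\infty}^{\infty}|c_k|^2=\int_{-\pi}^{\pi}|f(x)|^2\,dx=\|f^{\bullet}\|_2^2.$$

Thus $Su\in\ell^2_{\mathbb{C}}(\mathbb{Z})$ and $\|Su\|_2=\|u\|_2$ for every $u\in L^2_{\mathbb{C}}$. Because $S$ is linear and norm-preserving, it is surely
injective.
(d) It now follows that $(Sv|Su)=(v|u)$ for every $u,v\in L^2_{\mathbb{C}}$. P (This is of course a standard fact about
Hilbert spaces.) We know that for any $t\in\mathbb{R}$

$$\begin{aligned}
\|u\|_2^2+2\operatorname{Re}(e^{it}(v|u))+\|v\|_2^2&=(u|u)+e^{it}(v|u)+e^{-it}(u|v)+(v|v)\\
&=(u+e^{it}v|u+e^{it}v)\\
&=\|u+e^{it}v\|_2^2=\|S(u+e^{it}v)\|_2^2\\
&=\|Su\|_2^2+2\operatorname{Re}(e^{it}(Sv|Su))+\|Sv\|_2^2\\
&=\|u\|_2^2+2\operatorname{Re}(e^{it}(Sv|Su))+\|v\|_2^2,
\end{aligned}$$

so that $\operatorname{Re}(e^{it}(Sv|Su))=\operatorname{Re}(e^{it}(v|u))$. As $t$ is arbitrary, $(Sv|Su)=(v|u)$. Q
(e) Finally, $S$ is surjective. P Let $\boldsymbol{c}=\langle c_k\rangle_{k\in\mathbb{Z}}$ be any member of $\ell^2_{\mathbb{C}}(\mathbb{Z})$. Set $c^{(n)}_k=c_k$ if $|k|\le n$, $0$
otherwise, and $\boldsymbol{c}^{(n)}=\langle c^{(n)}_k\rangle_{k\in\mathbb{Z}}$. Consider

$$s_n=\sum_{k=-n}^{n}c_ke_k,\qquad u_n=s_n^{\bullet}$$

where I write $e_k(x)=\frac{1}{\sqrt{2\pi}}e^{ikx}$ for $x\in\,]-\pi,\pi]$. Then $Su_n=\boldsymbol{c}^{(n)}$, by the same calculations as in part (a) of
the proof of 282J. Now

$$\|\boldsymbol{c}^{(n)}-\boldsymbol{c}\|_2=\sqrt{\sum_{|k|>n}|c_k|^2}\to0$$

as $n\to\infty$, so

$$\|u_m-u_n\|_2=\|\boldsymbol{c}^{(m)}-\boldsymbol{c}^{(n)}\|_2\to0$$

as $m,n\to\infty$, and $\langle u_n\rangle_{n\in\mathbb{N}}$ is a Cauchy sequence in $L^2_{\mathbb{C}}$. Because $L^2_{\mathbb{C}}$ is complete (244G), it has a limit
$u\in L^2_{\mathbb{C}}$, and now

$$Su=\lim_{n\to\infty}Su_n=\lim_{n\to\infty}\boldsymbol{c}^{(n)}=\boldsymbol{c}.\text{ Q}$$

Thus $S:L^2_{\mathbb{C}}\to\ell^2_{\mathbb{C}}(\mathbb{Z})$ is an inner-product-space isomorphism.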
Remark In the language of Hilbert spaces, all that is happening here is that $\langle e_k^{\bullet}\rangle_{k\in\mathbb{Z}}$ is a 'Hilbert space
basis' or 'complete orthonormal sequence' in $L^2_{\mathbb{C}}$, which is matched by $S$ with the standard basis of $\ell^2_{\mathbb{C}}(\mathbb{Z})$.
The only step which calls on non-trivial real analysis, as opposed to the general theory of Hilbert spaces, is
the check that the linear subspace generated by $\{e_k^{\bullet}:k\in\mathbb{Z}\}$ is dense; this is part (b) of the proof of 282J.
Observe that while $S:L^2\to\ell^2$ is readily described, its inverse is more of a problem. If $\boldsymbol{c}\in\ell^2$, we should
like to say that $S^{-1}\boldsymbol{c}$ is the equivalence class of $f$, where $f(x)=\frac{1}{\sqrt{2\pi}}\sum_{k=-\infty}^{\infty}c_ke^{ikx}$ for every $x$. This works
very well if $\{k:c_k\ne0\}$ is finite, but for the general case it is less clear how to interpret the sum. It is in
fact the case that if $\boldsymbol{c}\in\ell^2$ then

$$g(x)=\frac{1}{\sqrt{2\pi}}\lim_{n\to\infty}\sum_{k=-n}^{n}c_ke^{ikx}$$

is defined for almost every $x\in\,]-\pi,\pi]$, and that $S^{-1}\boldsymbol{c}=g^{\bullet}$ in $L^2$; this is, in effect, Carleson's theorem
(286V). A proof of Carleson's theorem is out of our reach for the moment. What is covered by the results
of this section is that

$$h(x)=\frac{1}{\sqrt{2\pi}}\lim_{m\to\infty}\frac{1}{m+1}\sum_{n=0}^{m}\sum_{k=-n}^{n}c_ke^{ikx}$$

is defined for almost every $x\in\,]-\pi,\pi]$, and that $h^{\bullet}=S^{-1}\boldsymbol{c}$. (The point is that we know from the result just
proved that there is some square-integrable $f$ such that $\boldsymbol{c}$ is the sequence of Fourier coefficients of $f$; now
282Ia declares that the Fejér sums of $f$ converge to $f$ almost everywhere, that is, that $h=_{\text{a.e.}}\frac{1}{\sqrt{2\pi}}f$.)

282L The next result is the easiest, and one of the most useful, theorems concerning pointwise convergence
of Fourier sums.
Theorem Let $f$ be a complex-valued function which is integrable over $]-\pi,\pi]$ and $c_k$ its Fourier coefficients,
$s_n$ its Fourier sums.
(i) If $f$ is differentiable at $x\in\,]-\pi,\pi[$, then $f(x)=\lim_{n\to\infty}s_n(x)$.
(ii) If the periodic extension $\tilde f$ of $f$ is differentiable at $\pi$, then $f(\pi)=\lim_{n\to\infty}s_n(\pi)$.
proof (a) Take $x\in\,]-\pi,\pi]$ such that $\tilde f$ is differentiable at $x$; of course this covers both parts. We have

$$s_n(x)=\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)}{\sin\frac12t}\sin(n+\tfrac12)t\,dt$$

for each $n$, by 282Da.

(b) Next,

$$\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{t}\,dt$$

exists in $\mathbb{C}$, because there is surely some $\delta\in\,]0,\pi]$ such that $(\tilde f(x+t)-\tilde f(x))/t$ is bounded on $\{t:0<|t|\le\delta\}$,
while

$$\int_{-\pi}^{-\delta}\frac{\tilde f(x+t)-\tilde f(x)}{t}\,dt,\qquad\int_{\delta}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{t}\,dt$$

exist because $1/t$ is bounded on those intervals. It follows that

$$\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{\sin\frac12t}\,dt$$

exists, because $|t|\le\pi|\sin\frac12t|$ if $|t|\le\pi$. So by the Riemann-Lebesgue lemma (282Fb),

$$\lim_{n\to\infty}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{\sin\frac12t}\sin(n+\tfrac12)t\,dt=0.$$

(c) Because

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(x)\,\frac{\sin(n+\frac12)t}{\sin\frac12t}\,dt=\tilde f(x)$$

for every $n$ (282Dc),

$$s_n(x)=\tilde f(x)+\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{\sin\frac12t}\sin(n+\tfrac12)t\,dt\to\tilde f(x)$$

as $n\to\infty$, as required.

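282M Lemma Suppose that $f$ is a complex-valued function, defined almost everywhere and of bounded
variation on $]-\pi,\pi]$. Then $\sup_{k\in\mathbb{Z}}|kc_k|<\infty$, where $c_k$ is the $k$th Fourier coefficient of $f$, as in 282A.
proof Set

$$M=\lim_{x\in\operatorname{dom}f,\,x\uparrow\pi}|f(x)|+\operatorname{Var}_{]-\pi,\pi[}(f).$$

By 224J,

$$\begin{aligned}
|kc_k|&=\frac{1}{2\pi}\Bigl|\int_{-\pi}^{\pi}kf(t)e^{-ikt}\,dt\Bigr|\le\frac{1}{2\pi}M\sup_{c\in[-\pi,\pi]}\Bigl|\int_{-\pi}^{c}ke^{-ikt}\,dt\Bigr|\\
&=\frac{M}{2\pi}\sup_{c\in[-\pi,\pi]}|e^{-ikc}-e^{ik\pi}|\le\frac{M}{\pi}
\end{aligned}$$

for every $k$.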
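
As a quick sanity check of the bound (a worked example added for illustration, not taken from the text), consider $f(x)=x$ on $]-\pi,\pi]$. Its Fourier coefficients are $c_0=0$ and $c_k=\frac{i(-1)^k}{k}$ for $k\ne0$, while the constant in the proof is

$$M=\lim_{x\uparrow\pi}|x|+\operatorname{Var}_{]-\pi,\pi[}(f)=\pi+2\pi=3\pi,$$

so the lemma predicts $|kc_k|\le\frac{M}{\pi}=3$; in fact $|kc_k|=1$ for every $k\ne0$. The bound is therefore not sharp here, but it does capture the correct order of decay $|c_k|=O(1/|k|)$.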

282N I give another lemma, extracting the technical part of the proof of the next theorem. (Its most
natural application is in 282Xn.)
Lemma Let $\langle d_k\rangle_{k\in\mathbb{N}}$ be a complex sequence, and set $t_n=\sum_{k=0}^{n}d_k$, $\tau_m=\frac{1}{m+1}\sum_{n=0}^{m}t_n$ for $n,m\in\mathbb{N}$.
Suppose that $\sup_{k\in\mathbb{N}}|kd_k|=M<\infty$. Then for any $j\ge1$ and any $c\in\mathbb{C}$,

$$|t_n-c|\le\frac{M}{j}+(2j+3)\sup_{m\ge n-n/j}|\tau_m-c|$$

for every $n\ge j^2$.
proof (a) The first point to note is that for any $n,n'\in\mathbb{N}$,

$$|t_n-t_{n'}|\le\frac{M|n-n'|}{1+\min(n,n')}.$$

P If $n=n'$ this is trivial. Suppose that $n'<n$. Then

$$|t_n-t_{n'}|=\Bigl|\sum_{k=n'+1}^{n}d_k\Bigr|\le\sum_{k=n'+1}^{n}\frac{M}{k}\le\frac{M(n-n')}{n'+1}=\frac{M|n-n'|}{1+\min(n',n)}.$$

Of course the case $n<n'$ is identical. Q
(b) Now take any $n\ge j^2$. Set $\eta=\sup_{m\ge n-n/j}|\tau_m-c|$. Let $m\ge j$ be such that $jm\le n<j(m+1)$;
then $n<jm+m$; also

$$n(1-\tfrac1j)\le m(j+1)(1-\tfrac1j)\le mj.$$

Set

$$\tau^*=\frac{1}{m}\sum_{n'=jm+1}^{jm+m}t_{n'}=\frac{jm+m+1}{m}\tau_{jm+m}-\frac{jm+1}{m}\tau_{jm}.$$

Then

$$\begin{aligned}
|\tau^*-c|&=\Bigl|\frac{jm+m+1}{m}\tau_{jm+m}-\frac{jm+1}{m}\tau_{jm}-c\Bigr|\\
&=\Bigl|\frac{jm+m+1}{m}(\tau_{jm+m}-c)-\frac{jm+1}{m}(\tau_{jm}-c)\Bigr|\\
&\le\frac{jm+m+1}{m}\eta+\frac{jm+1}{m}\eta\le(2j+3)\eta.
\end{aligned}$$

On the other hand,

$$\begin{aligned}
|t_n-\tau^*|&=\Bigl|\frac{1}{m}\sum_{n'=jm+1}^{jm+m}(t_n-t_{n'})\Bigr|\le\frac{1}{m}\sum_{n'=jm+1}^{jm+m}\frac{M|n-n'|}{1+\min(n,n')}\\
&\le\frac{1}{m}\sum_{n'=jm+1}^{jm+m}\frac{Mm}{1+jm}=\frac{Mm}{1+jm}\le\frac{M}{j}.
\end{aligned}$$

Putting these together, we have

$$|t_n-c|\le|t_n-\tau^*|+|\tau^*-c|\le\frac{M}{j}+(2j+3)\eta=\frac{M}{j}+(2j+3)\sup_{m\ge n-n/j}|\tau_m-c|,$$

as required.

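282O Theorem Let $f$ be a complex-valued function of bounded variation, defined almost everywhere
on $]-\pi,\pi]$, and let $\langle s_n\rangle_{n\in\mathbb{N}}$ be its sequence of Fourier sums (282Ab).
(i) If $x\in\,]-\pi,\pi[$, then

$$\lim_{n\to\infty}s_n(x)=\tfrac12\bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)\bigr).$$

(ii) $\lim_{n\to\infty}s_n(\pi)=\tfrac12\bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow\pi}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow-\pi}f(t)\bigr)$.
(iii) If $f$ is defined throughout $]-\pi,\pi]$, is continuous, and $\lim_{t\downarrow-\pi}f(t)=f(\pi)$, then $s_n(x)\to f(x)$
uniformly on $]-\pi,\pi]$.
proof (a) Note first that 224F shows that the limits $\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)$, $\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)$ required in the
formulae above always exist. We know also from 282M that $M=\sup_{k\in\mathbb{Z}}|kc_k|<\infty$, where $c_k$ is the $k$th
Fourier coefficient of $f$.
Take any $x\in\,]-\pi,\pi]$, and set

$$c=\tfrac12\bigl(\lim_{t\in\operatorname{dom}\tilde f,\,t\uparrow x}\tilde f(t)+\lim_{t\in\operatorname{dom}\tilde f,\,t\downarrow x}\tilde f(t)\bigr),$$

writing $\tilde f$ for the periodic extension of $f$, as usual. We know from 282Id-282Ie that $c=\lim_{m\to\infty}\sigma_m(x)$,
writing $\sigma_m$ for the Fejér sums of $f$. Let $\epsilon>0$. Take any $j\ge\max(2,2M/\epsilon)$. Take $m_0\ge1$ such that $|\sigma_m(x)-c|\le\epsilon/(2j+3)$ for every $m\ge m_0$.
Now if $n\ge\max(j^2,2m_0)$, apply Lemma 282N with

$$d_0=c_0,\qquad d_k=c_ke^{ikx}+c_{-k}e^{-ikx}\text{ for }k\ge1,$$

so that $t_n=s_n(x)$, $\tau_m=\sigma_m(x)$ and $|kd_k|\le2M$ for every $k,n,m\in\mathbb{N}$. We have $n-n/j\ge\frac12n\ge m_0$, so

$$\eta=\sup_{m\ge n-n/j}|\tau_m-c|\le\sup_{m\ge m_0}|\tau_m-c|\le\frac{\epsilon}{2j+3}.$$

So 282N tells us that

$$|s_n(x)-c|=|t_n-c|\le\frac{2M}{j}+(2j+3)\sup_{m\ge n-n/j}|\tau_m-c|\le\epsilon+(2j+3)\eta\le2\epsilon.$$

As $\epsilon$ is arbitrary, $\lim_{n\to\infty}s_n(x)=c$, as required.
(b) This proves (i) and (ii) of this theorem. Finally, for (iii), observe that under these conditions
$\sigma_m(x)\to f(x)$ uniformly as $m\to\infty$, by 282G. So given $\epsilon>0$ we choose $j\ge\max(2,2M/\epsilon)$ and $m_0\in\mathbb{N}$
such that $|\sigma_m(x)-f(x)|\le\epsilon/(2j+3)$ for every $m\ge m_0$, $x\in\,]-\pi,\pi]$. By the same calculation as before,

$$|s_n(x)-f(x)|\le2\epsilon$$

for every $n\ge\max(j^2,2m_0)$ and every $x\in\,]-\pi,\pi]$. As $\epsilon$ is arbitrary, $\lim_{n\to\infty}s_n(x)=f(x)$ uniformly for
$x\in\,]-\pi,\pi]$.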
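
Parts (i) and (ii) can be tried out on $f(x)=x$, which is of bounded variation on $]-\pi,\pi]$ and has Fourier series $2\sum_{k\ge1}\frac{(-1)^{k+1}\sin kx}{k}$. The following sketch (with an invented function name) evaluates the Fourier sums at an interior point and at $\pi$, where the theorem predicts the limit $\frac12(\pi+(-\pi))=0$.

```python
import math

def fourier_sum_identity(n, x):
    """Fourier sum s_n(x) of f(x) = x on ]-pi, pi]:
    s_n(x) = 2 * sum_{k=1}^{n} (-1)^(k+1) * sin(k*x) / k."""
    return 2.0 * sum((-1) ** (k + 1) * math.sin(k * x) / k for k in range(1, n + 1))

if __name__ == "__main__":
    # Interior point: the sums approach f(pi/2) = pi/2.
    for n in (11, 101, 5001):
        print(n, fourier_sum_identity(n, math.pi / 2.0), math.pi / 2.0)
    # End point: the sums vanish (up to rounding), the average of pi and -pi.
    print(fourier_sum_identity(5001, math.pi))
```

Convergence at $\frac{\pi}{2}$ is slow (the error is of order $\frac1n$), which is typical of Fourier sums of functions with jumps in their periodic extensions.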

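282P Corollary Let $f$ be a complex-valued function which is integrable over $]-\pi,\pi]$, and $\langle s_n\rangle_{n\in\mathbb{N}}$ its
sequence of Fourier sums.
(i) Suppose that $x\in\,]-\pi,\pi[$ is such that $f$ is of bounded variation on some neighbourhood of $x$. Then

$$\lim_{n\to\infty}s_n(x)=\tfrac12\bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)\bigr).$$

(ii) If there is a $\delta>0$ such that $f$ is of bounded variation on both $]-\pi,-\pi+\delta]$ and $[\pi-\delta,\pi]$, then

$$\lim_{n\to\infty}s_n(\pi)=\tfrac12\bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow\pi}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow-\pi}f(t)\bigr).$$

proof In case (i), take $\delta>0$ such that $f$ is of bounded variation on $[x-\delta,x+\delta]$ and set $f_1(t)=f(t)$ if
$t\in\operatorname{dom}f\cap[x-\delta,x+\delta]$, $0$ for other $t\in\,]-\pi,\pi]$; in case (ii), set $f_1(t)=f(t)$ if $t\in\operatorname{dom}f$ and $|t|\ge\pi-\delta$, $0$
for other $t\in\,]-\pi,\pi]$, and say that $x=\pi$. In either case, $f_1$ is of bounded variation, so by 282O the Fourier
sums $\langle s'_n\rangle_{n\in\mathbb{N}}$ of $f_1$ converge at $x$ to the value given by the formulae above. But now observe that, writing
$\tilde f$ and $\tilde f_1$ for the periodic extensions of $f$ and $f_1$, $\tilde f-\tilde f_1=0$ on a neighbourhood of $x$, so

$$\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f_1(x+t)}{\sin\frac12t}\,dt$$

exists in $\mathbb{C}$, and by 282Fb

$$\lim_{n\to\infty}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f_1(x+t)}{\sin\frac12t}\sin(n+\tfrac12)t\,dt=0,$$

that is, $\lim_{n\to\infty}s_n(x)-s'_n(x)=0$. So $\langle s_n(x)\rangle_{n\in\mathbb{N}}$ also converges to the right limit.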

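282Q I cannot leave this section without mentioning one of the most important facts about Fourier
series, even though I have no space here to discuss its consequences.
Theorem Let $f$ and $g$ be complex-valued functions which are integrable over $]-\pi,\pi]$, and $\langle c_k\rangle_{k\in\mathbb{Z}}$, $\langle d_k\rangle_{k\in\mathbb{Z}}$
their Fourier coefficients. Let $f*g$ be their convolution, defined by the formula

$$(f*g)(x)=\int_{-\pi}^{\pi}f(x-_{2\pi}t)g(t)\,dt=\int_{-\pi}^{\pi}\tilde f(x-t)g(t)\,dt,$$

as in 255O, writing $\tilde f$ for the periodic extension of $f$. Then the Fourier coefficients of $f*g$ are $\langle2\pi c_kd_k\rangle_{k\in\mathbb{Z}}$.
proof By 255O(d-i),

$$\begin{aligned}
\frac{1}{2\pi}\int_{-\pi}^{\pi}(f*g)(x)e^{-ikx}\,dx&=\frac{1}{2\pi}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}e^{-ik(t+u)}f(t)g(u)\,dt\,du\\
&=\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-ikt}f(t)\,dt\int_{-\pi}^{\pi}e^{-iku}g(u)\,du=2\pi c_kd_k.
\end{aligned}$$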

*282R In my hurry to get to the theorems on convergence of Fejér and Fourier sums, I have rather
neglected the elementary manipulations which are essential when applying the theory. One basic result is
the following.
Proposition (a) Let $f:[-\pi,\pi]\to\mathbb{C}$ be an absolutely continuous function such that $f(-\pi)=f(\pi)$, and
$\langle c_k\rangle_{k\in\mathbb{Z}}$ its sequence of Fourier coefficients. Then the Fourier coefficients of $f'$ are $\langle ikc_k\rangle_{k\in\mathbb{Z}}$.
(b) Let $f:\mathbb{R}\to\mathbb{C}$ be a differentiable function such that $f'$ is absolutely continuous on $[-\pi,\pi]$ and
$f'(-\pi)=f'(\pi)$. If $\langle c_k\rangle_{k\in\mathbb{Z}}$ are the Fourier coefficients of $f\restriction\,]-\pi,\pi]$, then $\sum_{k=-\infty}^{\infty}|c_k|$ is finite.

proof (a) By 225Cb, $f'$ is integrable over $[-\pi,\pi]$; by 225E, $f$ is an indefinite integral of $f'$. So 225F tells
us that

$$\int_{-\pi}^{\pi}f'(x)e^{-ikx}\,dx=f(\pi)e^{-ik\pi}-f(-\pi)e^{ik\pi}+ik\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx=2\pi ikc_k$$

for every $k\in\mathbb{Z}$; dividing by $2\pi$, the $k$th Fourier coefficient of $f'$ is $ikc_k$.
(b) By (a), applied twice, the Fourier coefficients of $f''$ are $\langle-k^2c_k\rangle_{k\in\mathbb{Z}}$, so $\sup_{k\in\mathbb{Z}}k^2|c_k|$ is finite; because
$\sum_{k=1}^{\infty}\frac{1}{k^2}<\infty$, $\sum_{k=-\infty}^{\infty}|c_k|<\infty$.
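
A worked instance of (a), added here as an illustration rather than taken from the text: let $f(x)=x^2$ on $[-\pi,\pi]$, which is absolutely continuous with $f(-\pi)=f(\pi)$. Integrating by parts twice gives, for $k\ne0$,

$$c_k=\frac{1}{2\pi}\int_{-\pi}^{\pi}x^2e^{-ikx}\,dx=\frac{1}{\pi}\int_0^{\pi}x^2\cos kx\,dx=\frac{2(-1)^k}{k^2},$$

while the Fourier coefficients of $f'(x)=2x$ are

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}2xe^{-ikx}\,dx=\frac{2i(-1)^k}{k}=ik\cdot\frac{2(-1)^k}{k^2}=ikc_k,$$

as the proposition predicts. Note that $f'$ itself fails the matching condition, since $f'(-\pi)\ne f'(\pi)$, so (a) cannot be applied a second time; correspondingly its coefficients decay only like $1/|k|$.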

282X Basic exercises > (a) Suppose that $\langle c_k\rangle_{k\in\mathbb{Z}}$ is an absolutely summable double-ended sequence
of complex numbers. Show that $f(x)=\sum_{k=-\infty}^{\infty}c_ke^{ikx}$ exists for every $x\in\mathbb{R}$, that $f$ is continuous and
periodic, and that its Fourier coefficients are the $c_k$.

(c) Set $\phi_n(t)=\frac2t\sin(n+\frac12)t$ for $t\ne0$. (This is sometimes called the modified Dirichlet kernel.) Show
that for any integrable function $f$ on $]-\pi,\pi]$, with Fourier sums $\langle s_n\rangle_{n\in\mathbb{N}}$ and periodic extension $\tilde f$,

$$\lim_{n\to\infty}\Bigl|s_n(x)-\frac{1}{2\pi}\int_{-\pi}^{\pi}\phi_n(t)\tilde f(x+t)\,dt\Bigr|=0$$

for every $x\in\,]-\pi,\pi]$. (Hint: show that $\frac2t-\frac{1}{\sin\frac12t}$ is bounded, and use 282E.)

(d) Give a proof of 282Ib from 242O, 255Oe and 282G.

(e) Give another proof of 282Ic, based on 242O and 281J instead of on 282H.

(f) Use the idea of 255Yi to shorten one of the steps in the proof of 282H, taking

$$g_m(t)=\min\Bigl(\frac{m+1}{2\pi},\frac{\pi}{4(m+1)t^2}\Bigr)$$

for $|t|\le\pi$, so that $\psi_m\le g_m$ on $[-\pi,\pi]$.

> (g)(i) Let $f$ be a real square-integrable function on $]-\pi,\pi]$, and $\langle a_k\rangle_{k\in\mathbb{N}}$, $\langle b_k\rangle_{k\ge1}$ its real Fourier coefficients
(282Ba). Show that $\frac12a_0^2+\sum_{k=1}^{\infty}(a_k^2+b_k^2)=\frac{1}{\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx$. (ii) Show that $f^{\bullet}\mapsto\sqrt{\pi}\,(\frac{1}{\sqrt2}a_0,a_1,b_1,\ldots)$
defines an inner-product-space isomorphism between the real Hilbert space $L^2_{\mathbb{R}}$ of equivalence
classes of real square-integrable functions on $]-\pi,\pi]$ and the real Hilbert space $\ell^2_{\mathbb{R}}$ of square-summable
sequences.

(h) Show that $\frac{\pi}{4}=1-\frac13+\frac15-\frac17+\ldots$. (Hint: find the Fourier series of $f$ where $f(x)=x/|x|$, and
compute the sum of the series at $\frac{\pi}{2}$. Of course there are other methods, e.g., examining the Taylor series
for $\arctan$ at $1$, since $\arctan1=\frac{\pi}{4}$.)

(i) Let $f$ be an integrable complex-valued function on $]-\pi,\pi]$, and $\langle s_n\rangle_{n\in\mathbb{N}}$ its sequence of Fourier sums.
Suppose that $x\in\,]-\pi,\pi[$, $a\in\mathbb{C}$ are such that $\int_{-\pi}^{\pi}\frac{f(t)-a}{t-x}\,dt$ exists and is finite. Show that $\lim_{n\to\infty}s_n(x)=a$.
Explain how this generalizes 282L. What modification is appropriate to obtain a limit $\lim_{n\to\infty}s_n(\pi)$?

(j) Suppose that $\alpha>0$, $K\ge0$ and $f:\,]-\pi,\pi[\,\to\mathbb{C}$ are such that $|f(x)-f(y)|\le K|x-y|^{\alpha}$ for all
$x,y\in\,]-\pi,\pi[$. Show that the Fourier sums of $f$ converge to $f$ everywhere on $]-\pi,\pi[$. (Hint: use 282Xi.)
(Compare 282Yb.)

(k) In 282L, show that it is enough if $\tilde f$ is differentiable with respect to its domain at $x$ or $\pi$ (see 262Fb),
rather than differentiable in the strict sense.

(l) Show that $\lim_{a\to\infty}\int_0^a\frac{\sin t}{t}\,dt$ exists and is finite. (Hint: use 224J to estimate $\int_a^b\frac{\sin t}{t}\,dt$ for $0<a\le b$.)

(m) Show that $\int_0^{\infty}\frac{|\sin t|}{t}\,dt=\infty$. (Hint: show that $\sup_{a\ge0}|\int_1^a\frac{\cos2t}{t}\,dt|<\infty$, and therefore that
$\sup_{a\ge0}\int_1^a\frac{\sin^2t}{t}\,dt=\infty$.)

> (n) Let $\langle d_k\rangle_{k\in\mathbb{N}}$ be a sequence in $\mathbb{C}$ such that $\sup_{k\in\mathbb{N}}|kd_k|<\infty$ and

$$\lim_{m\to\infty}\frac{1}{m+1}\sum_{n=0}^{m}\sum_{k=0}^{n}d_k=c\in\mathbb{C}.$$

Show that $c=\sum_{k=0}^{\infty}d_k$. (Hint: 282N.)

> (o) Show that $\sum_{n=1}^{\infty}\frac{1}{n^2}=\frac{\pi^2}{6}$. (Hint: find the Fourier series of $f$ where $f(x)=|x|$, and compute the
sum of the series at 0.)

(p) Let $f$ be an integrable complex-valued function on $]-\pi,\pi]$, and $\langle s_n\rangle_{n\in\mathbb{N}}$ its sequence of Fourier sums.
Suppose that $x\in\,]-\pi,\pi[$ is such that
(i) there is an $a\in\mathbb{C}$ such that
either $\int_{-\pi}^{x}\frac{a-f(t)}{x-t}\,dt$ exists in $\mathbb{C}$
or there is some $\delta>0$ such that $f$ is of bounded variation on $[x-\delta,x]$, and $a=\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)$
(ii) there is a $b\in\mathbb{C}$ such that
either $\int_{x}^{\pi}\frac{f(t)-b}{t-x}\,dt$ exists in $\mathbb{C}$
or there is some $\delta>0$ such that $f$ is of bounded variation on $[x,x+\delta]$, and $b=\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)$.
Show that $\lim_{n\to\infty}s_n(x)=\frac12(a+b)$. What modification is appropriate to obtain a limit $\lim_{n\to\infty}s_n(\pi)$?

> (q) Let $f$, $g$ be integrable complex-valued functions on $]-\pi,\pi]$, and $\boldsymbol{c}=\langle c_k\rangle_{k\in\mathbb{Z}}$, $\boldsymbol{d}=\langle d_k\rangle_{k\in\mathbb{Z}}$ their
sequences of Fourier coefficients. Suppose that either $\sum_{k=-\infty}^{\infty}|c_k|<\infty$ or $\sum_{k=-\infty}^{\infty}|c_k|^2+|d_k|^2<\infty$. Show
that the sequence of Fourier coefficients of $f\times g$ is just the convolution $\boldsymbol{c}*\boldsymbol{d}$ of $\boldsymbol{c}$ and $\boldsymbol{d}$ (255Xe).

(r) In 282Ra, what happens if $f(\pi)\ne f(-\pi)$?

(s) Suppose that $\langle c_k\rangle_{k\in\mathbb{Z}}$ is a double-ended sequence of complex numbers such that $\sum_{k=-\infty}^{\infty}|kc_k|<\infty$.
Show that $f(x)=\sum_{k=-\infty}^{\infty}c_ke^{ikx}$ exists for every $x\in\mathbb{R}$ and that $f$ is differentiable everywhere.

(t) Let $\langle c_k\rangle_{k\in\mathbb{Z}}$ be a double-ended sequence of complex numbers such that $\sup_{k\in\mathbb{Z}}|kc_k|<\infty$. Show that
there is a square-integrable function $f$ on $]-\pi,\pi]$ such that the $c_k$ are the Fourier coefficients of $f$, that $f$
is the limit almost everywhere of its Fourier sums, and that $f*f*f$ is differentiable. (Hint: use 282K to
show that there is an $f$, and 282Xn to show that its Fourier sums converge wherever its Fejér sums do; use
282Q and 282Xs to show that $f*f*f$ is differentiable.)

282Y Further exercises (a) Let $f$ be a non-negative integrable function on $]-\pi,\pi]$, with Fourier
coefficients $\langle c_k\rangle_{k\in\mathbb{Z}}$. Show that

$$\sum_{j=0}^{n}\sum_{k=0}^{n}a_j\bar a_kc_{j-k}\ge0$$

for all complex numbers $a_0,\ldots,a_n$. (See also 285Xr below.)

(b) Let $f:\,]-\pi,\pi]\to\mathbb{C}$, $K\ge0$, $\alpha>0$ be such that $|f(x)-f(y)|\le K|x-y|^{\alpha}$ for all $x,y\in\,]-\pi,\pi]$.
Let $c_k$, $s_n$ be the Fourier coefficients and sums of $f$. (i) Show that $\sup_{k\in\mathbb{Z}}|k|^{\alpha}|c_k|<\infty$. (Hint: show that
$c_k=\frac{1}{4\pi}\int_{-\pi}^{\pi}(f(x)-\tilde f(x+\frac{\pi}{k}))e^{-ikx}\,dx$.) (ii) Show that if $f(\pi)=\lim_{x\downarrow-\pi}f(x)$ then $s_n\to f$ uniformly.
(Compare 282Xj.)

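(c) Let $f$ be a measurable complex-valued function on $]-\pi,\pi]$, and suppose that $p\in[1,\infty[$ is such that
$\int_{-\pi}^{\pi}|f(x)|^p\,dx<\infty$. Let $\langle\sigma_m\rangle_{m\in\mathbb{N}}$ be the sequence of Fejér sums of $f$. Show that $\lim_{m\to\infty}\int_{-\pi}^{\pi}|f(x)-\sigma_m(x)|^p\,dx=0$. (Hint: use 245Xk, 255Yl and the ideas in 282Ib.)

(d) Construct a continuous function $h:[-\pi,\pi]\to\mathbb{R}$ such that $h(\pi)=h(-\pi)$ but the Fourier sums of $h$
are unbounded at 0, as follows. Set $\alpha(m,n)=\int_0^{\pi}\frac{\sin(m+\frac12)t\,\sin(n+\frac12)t}{\sin\frac12t}\,dt$. Show that $\lim_{n\to\infty}\alpha(m,n)=0$ for
every $m$, but $\lim_{n\to\infty}\alpha(n,n)=\infty$. Set $h_0(x)=\sum_{k=0}^{\infty}\delta_k\sin(m_k+\frac12)x$ for $0\le x\le\pi$, $0$ for $-\pi\le x\le0$,
where $\delta_k>0$, $m_k\in\mathbb{N}$ are such that ($\alpha$) $\delta_k\le2^{-k}$, $\delta_k|\alpha(m_k,m_n)|\le2^{-k}$ for every $n<k$ (choosing $\delta_k$) ($\beta$)
$\delta_k\alpha(m_k,m_k)\ge k$, $\delta_n|\alpha(m_k,m_n)|\le2^{-n}$ for every $n<k$ (choosing $m_k$). Now modify $h_0$ on $[-\pi,0[$ by adding
a function of bounded variation.

(e) (i) Show that $\lim_{n\to\infty}\int_{-\pi}^{\pi}\bigl|\frac{\sin(n+\frac12)t}{\sin\frac12t}\bigr|\,dt=\infty$. (Hint: 282Xm.) (ii) Show that for any $\delta>0$ there are
$n\in\mathbb{N}$, $f\ge0$ such that $\int_{-\pi}^{\pi}f\le\delta$, $\int_{-\pi}^{\pi}|s_n|\ge1$, where $s_n$ is the $n$th Fourier sum of $f$. (Hint: take $n$ such
that $\frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl|\frac{\sin(n+\frac12)t}{\sin\frac12t}\bigr|\,dt>\frac{1}{\delta}$ and set $f(x)=\frac{\delta}{\eta}$ for $0\le x\le\eta$, $0$ otherwise, for small $\eta$.) (iii) Show that there
is an integrable function $f:\,]-\pi,\pi]\to\mathbb{R}$ such that $\sup_{n\in\mathbb{N}}\|s_n\|_1$ is infinite, where $\langle s_n\rangle_{n\in\mathbb{N}}$ is the sequence
of Fourier sums of $f$. (Hint: it helps to know the 'Uniform Boundedness Theorem' of functional analysis,
but $f$ can also be constructed bare-handed by the method of 282Yd.)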

282 Notes and comments This has been a long section with a potentially confusing collection of results, so
perhaps I should recapitulate. Associated with any integrable function on $]-\pi,\pi]$ we have the corresponding
Fourier sums, being the symmetric partial sums $\sum_{k=-n}^{n}c_ke^{ikx}$ of the complex series $\sum_{k=-\infty}^{\infty}c_ke^{ikx}$, or,
equally, the partial sums $\frac12a_0+\sum_{k=1}^{n}a_k\cos kx+b_k\sin kx$ of the real series $\frac12a_0+\sum_{k=1}^{\infty}a_k\cos kx+b_k\sin kx$.
The Fourier coefficients $c_k$, $a_k$, $b_k$ are the only natural ones, because if the series is to converge with any
regularity at all then

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_{k=-\infty}^{\infty}c_ke^{ikx}e^{-ilx}\,dx$$

ought to be simultaneously

$$\sum_{k=-\infty}^{\infty}\frac{1}{2\pi}\int_{-\pi}^{\pi}c_ke^{ikx}e^{-ilx}\,dx=c_l$$

and

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}f(x)e^{-ilx}\,dx.$$

(Compare the calculations in 282J.) The effect of taking Fejér sums $\sigma_m(x)$ rather than the Fourier sums
$s_n(x)$ is to smooth the sequence out; recall that if $\lim_{n\to\infty}s_n(x)=c$ then $\lim_{m\to\infty}\sigma_m(x)=c$, by 273Ca in
the last chapter.
Most of the work above is concerned with the question of when Fourier or Fejér sums converge, in some
sense, to the original function $f$. As has happened before, in §245 and elsewhere, we have more than one kind
of convergence to consider. Norm convergence, for $\|\ \|_1$ or $\|\ \|_2$ or $\|\ \|_{\infty}$, is the simplest; the three theorems
282G, 282Ib and 282J at least are relatively straightforward. (I have given 282Ib as a corollary of 282Ia;
but there is an easier proof from 282G. See 282Xd.) Respectively, we have
if $f$ is continuous (and matches at $\pm\pi$, that is, $f(\pi)=\lim_{t\downarrow-\pi}f(t)$) then $\sigma_m\to f$ uniformly,
that is, for $\|\ \|_{\infty}$ (282G);
if $f$ is any integrable function, then $\sigma_m\to f$ for $\|\ \|_1$ (282Ib);
if $f$ is a square-integrable function, then $s_n\to f$ for $\|\ \|_2$ (282J);
if $f$ is continuous and of bounded variation (and matches at $\pm\pi$), then $s_n\to f$ uniformly
(282O).
There are some similar results for other $\|\ \|_p$ (282Yc); but note that the Fourier sums need not converge for
$\|\ \|_1$ (282Ye).
Pointwise convergence is harder. The results I give are
if $f$ is any integrable function, then $\sigma_m\to f$ almost everywhere (282Ia);

this relies on some careful calculations in 282H, and also on the deep result 223D. Next we have the results
which look at the average of the limits of f from the two sides. Suppose I write
1
f (x) = (limtx f (t) + limtx f (t))
2

whenever this is defined, taking f () = 12 (limt f (t) + limt f (t)). Then we have
if f is any integrable function, m f wherever f is defined (282Id);
if f is of bounded variation, sn f everywhere (282O).
Of course these apply at any point at which f is continuous, in which case f (x) = f (x). Yet another result
of this type is
if f is any integrable function, sn f at any point at which f is differentiable (282L);
in fact, this can be usefully extended for very little extra labour (282Xi, 282Xp).
I cannot leave this list without mentioning the theorem I have not given. This is Carlesons theorem:
if f is square-integrable, sn f almost everywhere
(Carleson 66). I will come to this in 286. There is an elementary special case in 282Xt. The result is in
fact valid for many other f (see the notes to 286).
The next glaring lacuna in the exposition here is the absence of any examples to show how far these
results are best possible. There is no suggestion, indeed, that there are any natural necessary and sufficient
conditions for
sn f at every point.
Nevertheless, we have to make an effort to find a continuous function for which this is not so, and the
construction of an example by du Bois-Reymond (Bois-Reymond 1876) was an important moment in the
history of analysis, not least because it forced mathematicians to realise that some comfortable assumptions
about the classification of functions (essentially, that functions are either 'good' or so bad that one needn't
trouble with them) were false. The example is instructive but I have had to omit it for lack of space;
I give an outline of a possible method in 282Yd. (You can find a detailed construction in Körner 88,
chapter 18, and a proof that such a function exists in Dudley 89, 7.4.3.) If you allow general integrable
functions, then you can do much better, or perhaps I should say much worse; there is an integrable f such
that $\sup_{n\in\mathbb{N}} |s_n(x)| = \infty$ for every $x \in\, ]-\pi, \pi]$ (Kolmogorov 26; see Zygmund 59, §§VIII.3-4).
In 282C I mentioned two types of problem. The first (when is a Fourier series summable?) has at least
been treated at length, even though I cannot pretend to have given more than a sample of what is known.
The second (how do properties of the $c_k$ reflect properties of f?) I have hardly touched on. I do give what
seem to me to be the three most important results in this area. The first is
if f and g have the same Fourier coefficients, they are equal almost everywhere (282Ic, 282Xe).
This at least tells us that we ought in principle to be able to learn almost anything about f by looking at its
Fourier series. (For instance, 282Ya describes a necessary and sufficient condition for f to be non-negative
almost everywhere.) The second is
    f is square-integrable iff $\sum_{k=-\infty}^{\infty} |c_k|^2 < \infty$;
in fact,
$$\sum_{k=-\infty}^{\infty} |c_k|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx \quad (282J).$$
Of course this is fundamental, since it shows that Fourier coefficients provide a natural Hilbert space
isomorphism between $L^2$ and $\ell^2$ (282K). I should perhaps remark that while the real Hilbert spaces $L^2_{\mathbb{R}}$, $\ell^2_{\mathbb{R}}$
are isomorphic as inner product spaces (282Xg), they are certainly not isomorphic as Banach lattices; for
instance, $\ell^2_{\mathbb{R}}$ has 'atomic' elements c such that if $0 \le d \le c$ then d is a multiple of c, while $L^2_{\mathbb{R}}$ does not.
Perhaps even more important is
    the Fourier coefficients of a convolution $f * g$ are just a scalar multiple of the products of the
    Fourier coefficients of f and g (282Q);
but to use this effectively we need to study the Banach algebra structure of $L^1$, and I have no choice but to
abandon this path immediately. (It will form a conspicuous part of Chapter 44 in Volume 4.) 282Xt gives an
elementary consequence, and 282Xq a very partial description of the relationship between a product $f \times g$
of two functions and the convolution product of their sequences of Fourier coefficients.

I end these notes with a remark on the number $2\pi$. This enters nearly every formula involving Fourier
series, but could I think be removed totally from the present section, at least, by re-normalizing the measure
of $]-\pi, \pi]$. If instead of Lebesgue measure $\mu$ we took the measure $\nu = \frac{1}{2\pi}\mu$ throughout, then every $2\pi$ would
disappear. (Compare the remark in 282Bb concerning the possibility of doing integrals over $S^1$.) But I think
most of us would prefer to remember the location of a $2\pi$ in every formula than to deal with an unfamiliar
measure.

283 Fourier transforms I


I turn now to the theory of Fourier transforms on R. In the first of two sections on the subject, I present
those parts of the elementary theory which can be dealt with using the methods of the previous section
on Fourier series. I find no way of making sense of the theory, however, without introducing a fragment of
L.Schwartz' theory of distributions, which I present in §284. As in §282, of course, this treatment also is
nothing but a start in the topic.
The whole theory can also be done in $\mathbb{R}^r$. I leave this extension to the exercises, however, since there are
few new ideas, the formulae are significantly more complicated, and I shall not, in this volume at least, have
any use for the multidimensional versions of these particular theorems, though some of the same ideas will
appear, in multidimensional form, in §285.

283A Definitions Let f be a complex-valued function which is integrable over R.



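(a) The Fourier transform of f is the function $\hat f : \mathbb{R} \to \mathbb{C}$ defined by setting
$$\hat f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} f(x)\,dx$$
for every $y \in \mathbb{R}$. (Of course the integral is always defined because $x \mapsto e^{-iyx}$ is bounded and continuous,
therefore measurable.)

(b) The inverse Fourier transform of f is the function $\check f : \mathbb{R} \to \mathbb{C}$ defined by setting
$$\check f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{iyx} f(x)\,dx$$
for every $y \in \mathbb{R}$.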
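
As an informal numerical illustration of these two definitions (a sketch only, not part of the theory), the following Python fragment approximates the transform and the inverse transform by a composite Simpson rule on a truncated interval. The names `simpson`, `fourier_transform` and `inverse_fourier_transform`, the truncation parameter `half_width` and the step count `n` are all assumptions introduced here for illustration; the approximation is only reasonable when f is negligible outside the truncated interval.

```python
import cmath
import math


def simpson(g, a, b, n):
    """Composite Simpson approximation to the integral of g over [a, b]; n must be even."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def fourier_transform(f, y, half_width=40.0, n=8000):
    """Approximate (1/sqrt(2 pi)) * int exp(-i y x) f(x) dx over [-half_width, half_width]."""
    integrand = lambda x: cmath.exp(-1j * y * x) * f(x)
    return simpson(integrand, -half_width, half_width, n) / math.sqrt(2 * math.pi)


def inverse_fourier_transform(f, y, half_width=40.0, n=8000):
    """Approximate (1/sqrt(2 pi)) * int exp(+i y x) f(x) dx over [-half_width, half_width]."""
    integrand = lambda x: cmath.exp(1j * y * x) * f(x)
    return simpson(integrand, -half_width, half_width, n) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    # Test function exp(-|x|); its transform is sqrt(2/pi) / (1 + y^2) (compare 283Xr).
    f = lambda x: math.exp(-abs(x))
    for y in (0.0, 0.5, 2.0):
        exact = math.sqrt(2 / math.pi) / (1 + y * y)
        print(y, abs(fourier_transform(f, y) - exact))
```

With the default parameters the printed discrepancies should be small, and `inverse_fourier_transform(f, y)` should agree with `fourier_transform(f, -y)` up to rounding error, in line with the remark in 283Bc.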

283B Remarks (a) It is a mildly vexing feature of the theory of Fourier transforms (vexing, that is,
for outsiders like myself) that there is in fact no standard definition of 'Fourier transform'. The commonest
definitions are, I think,
$$\hat f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\mp iyx} f(x)\,dx,$$
$$\hat f(y) = \int_{-\infty}^{\infty} e^{\mp iyx} f(x)\,dx,$$
$$\hat f(y) = \int_{-\infty}^{\infty} e^{\mp 2\pi iyx} f(x)\,dx,$$
corresponding to inverse transforms
$$\check f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\pm iyx} f(x)\,dx,$$
$$\check f(y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{\pm iyx} f(x)\,dx,$$
$$\check f(y) = \int_{-\infty}^{\infty} e^{\pm 2\pi iyx} f(x)\,dx.$$
I leave it to you to check that the whole theory can be carried through with any of these six pairs, and to
investigate other possibilities (see 283Xa-283Xb below).


(b) The phrases 'Fourier transform', 'inverse Fourier transform' make it plain that $(\hat f)^{\vee}$ is supposed to be
f, at least some of the time. This is indeed the case, but the class of f for which this is true in the literal
sense is somewhat constrained, and we shall have to wait a little while before investigating it.

(c) No amount of juggling with constants, in the manner of (a) above, can make $\hat f$ and $\check f$ quite the same.
However, on the definitions I have chosen, we do have $\check f(y) = \hat f(-y)$ for every y, so that $\hat f$ and $\check f$ will share
essentially all the properties of interest to us here; in particular, everything in the next proposition will be
valid with $\vee$ in place of $\wedge$, if you change signs at the right points in parts (c), (h) and (i).

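283C Proposition Let f and g be complex-valued functions which are integrable over $\mathbb{R}$.

(a) $(f + g)^{\wedge} = \hat f + \hat g$.

(b) $(cf)^{\wedge} = c\hat f$ for every $c \in \mathbb{C}$.

(c) If $c \in \mathbb{R}$ and $h(x) = f(x + c)$ whenever this is defined, then $\hat h(y) = e^{icy}\hat f(y)$ for every $y \in \mathbb{R}$.

(d) If $c \in \mathbb{R}$ and $h(x) = e^{icx}f(x)$ for every $x \in \operatorname{dom} f$, then $\hat h(y) = \hat f(y - c)$ for every $y \in \mathbb{R}$.

(e) If $c > 0$ and $h(x) = f(cx)$ whenever this is defined, then $\hat h(y) = \frac{1}{c}\hat f(c^{-1}y)$ for every $y \in \mathbb{R}$.

(f) $\hat f : \mathbb{R} \to \mathbb{C}$ is continuous.

(g) $\lim_{y\to\infty} \hat f(y) = \lim_{y\to-\infty} \hat f(y) = 0$.

(h) If $\int_{-\infty}^{\infty} |xf(x)|\,dx < \infty$, then $\hat f$ is differentiable, and its derivative is
$$\hat f'(y) = -\frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x e^{-iyx} f(x)\,dx$$
for every $y \in \mathbb{R}$.

(i) If f is absolutely continuous on every bounded interval and $f'$ is integrable, then $(f')^{\wedge}(y) = iy\hat f(y)$
for every $y \in \mathbb{R}$.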
proof (a) and (b) are trivial, and (c), (d) and (e) are elementary substitutions.
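
For readers who would like to see one of these substitutions written out, here is a sketch for parts (c) and (e), under the normalization of 283A; the change of variable is $t = x + c$ in the first and $t = cx$ (with $c > 0$) in the second.

```latex
\begin{align*}
\text{(c)}\quad \hat h(y)
  &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-iyx} f(x+c)\,dx
   = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-iy(t-c)} f(t)\,dt
   = e^{icy}\hat f(y), \\
\text{(e)}\quad \hat h(y)
  &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-iyx} f(cx)\,dx
   = \frac{1}{c\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-i(y/c)t} f(t)\,dt
   = \frac{1}{c}\hat f\Bigl(\frac{y}{c}\Bigr).
\end{align*}
```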
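(f) If $\langle y_n\rangle_{n\in\mathbb{N}}$ is any convergent sequence in $\mathbb{R}$ with limit y, then
$$\hat f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \lim_{n\to\infty} e^{-iy_nx} f(x)\,dx
= \lim_{n\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iy_nx} f(x)\,dx = \lim_{n\to\infty} \hat f(y_n)$$
by Lebesgue's Dominated Convergence Theorem, because $|e^{-iy_nx}f(x)| \le |f(x)|$ for every $n \in \mathbb{N}$, $x \in \operatorname{dom} f$.
As $\langle y_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\hat f$ is continuous.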
(g) This is just the Riemann-Lebesgue lemma (282E).
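(h) The point is that $\bigl|\frac{\partial}{\partial y} e^{-iyx} f(x)\bigr| = |xf(x)|$ for every $x \in \operatorname{dom} f$, $y \in \mathbb{R}$. So by 123D
$$\hat f'(y) = \frac{1}{\sqrt{2\pi}} \frac{d}{dy} \int_{-\infty}^{\infty} e^{-iyx} f(x)\,dx
= \frac{1}{\sqrt{2\pi}} \frac{d}{dy} \int_{\operatorname{dom} f} e^{-iyx} f(x)\,dx$$
$$= \frac{1}{\sqrt{2\pi}} \int_{\operatorname{dom} f} \frac{\partial}{\partial y} e^{-iyx} f(x)\,dx
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} -ix e^{-iyx} f(x)\,dx$$
$$= -\frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x e^{-iyx} f(x)\,dx.$$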

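(i) Because f is absolutely continuous on every bounded interval,
$$f(x) = f(0) + \int_0^x f' \text{ for } x \ge 0, \qquad f(x) = f(0) - \int_x^0 f' \text{ for } x \le 0.$$
Because $f'$ is integrable,
$$\lim_{x\to\infty} f(x) = f(0) + \int_0^{\infty} f', \qquad \lim_{x\to-\infty} f(x) = f(0) - \int_{-\infty}^0 f'$$
both exist. Because f also is integrable, both limits must be zero. Now we can integrate by parts (225F) to
see that
$$(f')^{\wedge}(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} f'(x)\,dx
= \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f'(x)\,dx$$
$$= \frac{1}{\sqrt{2\pi}} \Bigl(\lim_{a\to\infty} e^{-iya} f(a) - \lim_{a\to\infty} e^{iya} f(-a)\Bigr)
+ \frac{iy}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} f(x)\,dx$$
$$= iy\hat f(y).$$
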

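283D Lemma (a) $\lim_{a\to\infty} \int_0^a \frac{\sin x}{x}\,dx = \frac{\pi}{2}$, $\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin x}{x}\,dx = \pi$.
(b) There is a $K < \infty$ such that $\bigl|\int_a^b \frac{\sin cx}{x}\,dx\bigr| \le K$ whenever $a \le b$ and $c \in \mathbb{R}$.
proof (a)(i) Set
$$F(a) = \int_0^a \frac{\sin x}{x}\,dx \text{ if } a \ge 0, \qquad F(a) = -\int_a^0 \frac{\sin x}{x}\,dx \text{ if } a \le 0,$$
so that $F(-a) = -F(a)$ and $\int_a^b \frac{\sin x}{x}\,dx = F(b) - F(a)$ for all $a \le b$.
If $0 < a \le b$, then by 224J
$$\Bigl|\int_a^b \frac{\sin x}{x}\,dx\Bigr| \le \Bigl(\frac{1}{b} + \frac{1}{a} - \frac{1}{b}\Bigr) \sup_{c\in[a,b]} \Bigl|\int_a^c \sin x\,dx\Bigr|
= \frac{1}{a} \sup_{c\in[a,b]} |\cos c - \cos a| \le \frac{2}{a}.$$
In particular, $|F(n) - F(m)| \le \frac{2}{m}$ if $1 \le m \le n$ in $\mathbb{N}$, and $\langle F(n)\rangle_{n\in\mathbb{N}}$ is a Cauchy sequence with limit $\gamma$ say; now
$$|\gamma - F(a)| = \lim_{n\to\infty} |F(n) - F(a)| \le \frac{2}{a}$$
for every $a > 0$, so $\lim_{a\to\infty} F(a) = \gamma$. Of course we also have
$$\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin x}{x}\,dx = \lim_{a\to\infty} (F(a) - F(-a)) = \lim_{a\to\infty} 2F(a) = 2\gamma.$$

(ii) So now I have to calculate $\gamma$. For this, observe first that
$$2\gamma = \lim_{a\to\infty} \int_{-a}^{a} \frac{\sin x}{x}\,dx = \lim_{a\to\infty} \int_{-\pi}^{\pi} \frac{\sin at}{t}\,dt$$
(substituting x = t/a). Next,
$$\lim_{t\to 0} \Bigl(\frac{1}{t} - \frac{1}{2\sin\frac{1}{2}t}\Bigr) = \lim_{u\to 0} \frac{\sin u - u}{2u \sin u} = 0,$$
so
$$\int_{-\pi}^{\pi} \Bigl|\frac{1}{t} - \frac{1}{2\sin\frac{1}{2}t}\Bigr|\,dt < \infty,$$
and by the Riemann-Lebesgue lemma (282Fb)
$$\lim_{a\to\infty} \int_{-\pi}^{\pi} \Bigl(\frac{1}{t} - \frac{1}{2\sin\frac{1}{2}t}\Bigr) \sin at\,dt = 0.$$
But we know that
$$\int_{-\pi}^{\pi} \frac{\sin(n+\frac{1}{2})t}{2\sin\frac{1}{2}t}\,dt = \pi$$
for every n (using 282Dc), so we must have
$$\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin t}{t}\,dt = \lim_{a\to\infty} \int_{-\pi}^{\pi} \frac{\sin at}{t}\,dt
= \lim_{a\to\infty} \int_{-\pi}^{\pi} \frac{\sin at}{2\sin\frac{1}{2}t}\,dt$$
$$= \lim_{n\to\infty} \int_{-\pi}^{\pi} \frac{\sin(n+\frac{1}{2})t}{2\sin\frac{1}{2}t}\,dt = \pi,$$
and $\gamma = \pi/2$, as claimed.

(b) Because F is continuous and
$$\lim_{a\to\infty} F(a) = \gamma = \frac{\pi}{2}, \qquad \lim_{a\to-\infty} F(a) = -\gamma = -\frac{\pi}{2},$$
F is bounded; say $|F(a)| \le K_1$ for all $a \in \mathbb{R}$. Try $K = 2K_1$. Now suppose that $a < b$ and $c \in \mathbb{R}$. If $c > 0$,
then
$$\Bigl|\int_a^b \frac{\sin cx}{x}\,dx\Bigr| = \Bigl|\int_{ac}^{bc} \frac{\sin t}{t}\,dt\Bigr| = |F(bc) - F(ac)| \le 2K_1 = K,$$
substituting x = t/c. If $c < 0$, then
$$\Bigl|\int_a^b \frac{\sin cx}{x}\,dx\Bigr| = \Bigl|-\int_a^b \frac{\sin(-c)x}{x}\,dx\Bigr| \le K;$$
while if $c = 0$ then
$$\Bigl|\int_a^b \frac{\sin cx}{x}\,dx\Bigr| = 0 \le K.$$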
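
The estimates in this proof can be sampled numerically. The sketch below is an illustration only, using a composite Simpson rule; the names `sinc` and `F` and the parameter `steps_per_unit` are assumptions made here. It approximates $F(a) = \int_0^a \frac{\sin x}{x}\,dx$, so that one can compare $|F(a) - \pi/2|$ with the bound $2/a$ obtained in (a)(i).

```python
import math


def sinc(x):
    """sin(x)/x, extended by continuity to take the value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def F(a, steps_per_unit=200):
    """Composite Simpson approximation to F(a) = int_0^a sin(x)/x dx; a may be negative."""
    if a == 0:
        return 0.0
    # Even number of Simpson steps, roughly steps_per_unit per unit length.
    n = 2 * max(1, math.ceil(abs(a) * steps_per_unit / 2))
    h = a / n  # negative when a < 0, which gives F(-a) = -F(a)
    total = sinc(0.0) + sinc(a)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * sinc(k * h)
    return total * h / 3


if __name__ == "__main__":
    for a in (10.0, 100.0, 1000.0):
        print(a, abs(F(a) - math.pi / 2), 2 / a)
```

In each printed row the second number should not exceed the third. The largest value of $|F|$ is attained at $a = \pi$ and is roughly 1.852, so any K a little above 3.71 would serve in part (b) if one follows the choice $K = 2K_1$.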

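283E The hardest work of this section will lie in the 'pointwise inversion' theorems 283I and 283L
below. I begin however with a relatively easy, and at least equally important, result, showing (among other
things) that an integrable function f can (essentially) be recovered from its Fourier transform.
Lemma Whenever $c < d$ in $\mathbb{R}$,
$$\lim_{a\to\infty} \int_{-a}^{a} \frac{e^{idy} - e^{icy}}{y} e^{-iyx}\,dy = 2\pi i \text{ if } c < x < d,$$
$$= \pi i \text{ if } x = c \text{ or } x = d,$$
$$= 0 \text{ if } x < c \text{ or } x > d.$$

proof We know that for any $b > 0$
$$\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin bx}{x}\,dx = \lim_{a\to\infty} \int_{-ab}^{ab} \frac{\sin t}{t}\,dt = \pi$$
(substituting x = t/b), and therefore that for any $b < 0$
$$\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin bx}{x}\,dx = -\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin(-b)x}{x}\,dx = -\pi.$$
Now consider, for $x \in \mathbb{R}$,
$$\lim_{a\to\infty} \int_{-a}^{a} \frac{e^{idy} - e^{icy}}{y} e^{-iyx}\,dy.$$
First note that all the integrals $\int_{-a}^{a}$ exist, because
$$\lim_{y\to 0} \frac{e^{idy} - e^{icy}}{y} = i(d - c)$$
is finite, and the integrand is certainly continuous except at 0. Now we have
$$\int_{-a}^{a} \frac{e^{idy} - e^{icy}}{y} e^{-iyx}\,dy = \int_{-a}^{a} \frac{e^{i(d-x)y} - e^{i(c-x)y}}{y}\,dy$$
$$= \int_{-a}^{a} \frac{\cos(d-x)y - \cos(c-x)y}{y}\,dy + i \int_{-a}^{a} \frac{\sin(d-x)y - \sin(c-x)y}{y}\,dy$$
$$= i \int_{-a}^{a} \frac{\sin(d-x)y - \sin(c-x)y}{y}\,dy$$
because cos is an even function, so
$$\int_{-a}^{a} \frac{\cos(d-x)y - \cos(c-x)y}{y}\,dy = 0$$
for every $a \ge 0$. (Once again, this integral exists because
$$\lim_{y\to 0} \frac{\cos(d-x)y - \cos(c-x)y}{y} = 0.)$$
Accordingly
$$\lim_{a\to\infty} \int_{-a}^{a} \frac{e^{idy} - e^{icy}}{y} e^{-iyx}\,dy
= i \lim_{a\to\infty} \int_{-a}^{a} \frac{\sin(d-x)y}{y}\,dy - i \lim_{a\to\infty} \int_{-a}^{a} \frac{\sin(c-x)y}{y}\,dy$$
$$= \pi i - \pi i = 0 \text{ if } x < c,$$
$$= \pi i - 0 = \pi i \text{ if } x = c,$$
$$= \pi i + \pi i = 2\pi i \text{ if } c < x < d,$$
$$= 0 + \pi i = \pi i \text{ if } x = d,$$
$$= -\pi i + \pi i = 0 \text{ if } x > d.$$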


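283F Theorem Let f be a complex-valued function which is integrable over $\mathbb{R}$, and $\hat f$ its Fourier
transform. Then whenever $c \le d$ in $\mathbb{R}$,
$$\int_c^d f(x)\,dx = \frac{i}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} \frac{e^{icy} - e^{idy}}{y} \hat f(y)\,dy.$$

proof If $c = d$ this is trivial; let us suppose that $c < d$.

(a) Writing
$$\theta_a(x) = \int_{-a}^{a} \frac{e^{idy} - e^{icy}}{y} e^{-iyx}\,dy$$
for $x \in \mathbb{R}$, $a \ge 0$, 283E tells us that
$$\lim_{a\to\infty} \theta_a(x) = 2\pi i\,\theta(x)$$
where $\theta = \frac{1}{2}(\chi[c, d] + \chi\,]c, d[)$ takes the value 1 inside the interval [c, d], 0 outside and the value $\frac{1}{2}$ at the
endpoints. At the same time,
$$|\theta_a(x)| = \Bigl|\int_{-a}^{a} \frac{\sin(d-x)y - \sin(c-x)y}{y}\,dy\Bigr|
\le \Bigl|\int_{-a}^{a} \frac{\sin(d-x)y}{y}\,dy\Bigr| + \Bigl|\int_{-a}^{a} \frac{\sin(c-x)y}{y}\,dy\Bigr| \le 2K$$
for all $a \ge 0$, $x \in \mathbb{R}$, where K is the constant of 283Db. Consequently $|f \times \theta_a| \le 2K|f|$ everywhere on dom f,
for every $a \ge 0$, and (applying Lebesgue's Dominated Convergence Theorem to sequences $\langle f \times \theta_{a_n}\rangle_{n\in\mathbb{N}}$, where
$a_n \to \infty$)
$$\lim_{a\to\infty} \int f \times \theta_a = 2\pi i \int f \times \theta = 2\pi i \int_c^d f.$$

(b) Now consider the limit in the statement of the theorem. We have
$$\int_{-a}^{a} \frac{e^{icy} - e^{idy}}{y} \hat f(y)\,dy
= \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} \int_{-\infty}^{\infty} \frac{e^{icy} - e^{idy}}{y} e^{-iyx} f(x)\,dx\,dy$$
$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-a}^{a} \frac{e^{icy} - e^{idy}}{y} e^{-iyx} f(x)\,dy\,dx
= -\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\theta_a(x)\,dx,$$
by Fubini's and Tonelli's theorems (252H), using the fact that $(e^{icy} - e^{idy})/y$ is bounded to see that
$$\int_{-\infty}^{\infty} \int_{-a}^{a} \Bigl|\frac{e^{icy} - e^{idy}}{y} e^{-iyx} f(x)\Bigr|\,dy\,dx$$
is finite. Accordingly
$$\frac{i}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} \frac{e^{icy} - e^{idy}}{y} \hat f(y)\,dy
= -\frac{i}{2\pi} \lim_{a\to\infty} \int_{-\infty}^{\infty} f(x)\theta_a(x)\,dx$$
$$= -\frac{i}{2\pi} \cdot 2\pi i \int_c^d f(x)\,dx = \int_c^d f(x)\,dx,$$
as required.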
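
A numerical sanity check of this formula may be reassuring. The sketch below is an illustration under stated assumptions, not part of the proof: it takes $f(x) = e^{-|x|}$, whose transform $\sqrt{2/\pi}/(1+y^2)$ is the case $\sigma = 1$ of 283Xr, replaces the limit by a fixed cut-off `a`, and evaluates the integral by a composite Simpson rule. The names `f_hat`, `kernel` and `recovered_integral` and the default values of `a` and `n` are choices made here.

```python
import cmath
import math


def f_hat(y):
    """Fourier transform of f(x) = exp(-|x|) under the normalization of 283A."""
    return math.sqrt(2 / math.pi) / (1 + y * y)


def kernel(y, c, d):
    """(exp(i c y) - exp(i d y)) / y, with its limiting value i (c - d) at y = 0."""
    if y == 0.0:
        return 1j * (c - d)
    return (cmath.exp(1j * c * y) - cmath.exp(1j * d * y)) / y


def recovered_integral(c, d, a=200.0, n=40000):
    """(i / sqrt(2 pi)) * int_{-a}^{a} kernel(y) f_hat(y) dy by composite Simpson (n even)."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = 2 * a / n
    half = n // 2
    g = lambda y: kernel(y, c, d) * f_hat(y)
    total = g(-a) + g(a)
    for k in range(1, n):
        # Symmetric indexing, so that the midpoint of the grid is exactly y = 0.
        total += (4 if k % 2 else 2) * g((k - half) * h)
    return 1j / math.sqrt(2 * math.pi) * total * h / 3


if __name__ == "__main__":
    c, d = 0.5, 2.0
    print(recovered_integral(c, d), math.exp(-c) - math.exp(-d))
```

For $0 \le c < d$ the exact value of $\int_c^d e^{-|x|}dx$ is $e^{-c} - e^{-d}$, and the computed value should be close to it, with negligible imaginary part; the cut-off error is of order $1/a^2$ for this particular f.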



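283G Corollary If f and g are complex-valued functions which are integrable over $\mathbb{R}$, then $\hat f = \hat g$ iff
f =a.e. g.
proof If f =a.e. g then of course
$$\hat f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} f(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} g(x)\,dx = \hat g(y)$$
for every $y \in \mathbb{R}$. Conversely, if $\hat f = \hat g$, then by the last theorem
$$\int_c^d f = \int_c^d g$$
for all $c \le d$, so f = g almost everywhere, by 222D.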


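283H Lemma Let f be a complex-valued function which is integrable over $\mathbb{R}$, and $\hat f$ its Fourier
transform. Then for any $a > 0$, $x \in \mathbb{R}$,
$$\frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin a(x-t)}{x-t} f(t)\,dt
= \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin at}{t} f(x-t)\,dt.$$

proof We have
$$\int_{-a}^{a} \int_{-\infty}^{\infty} |e^{ixy} e^{-iyt} f(t)|\,dt\,dy \le 2a \int_{-\infty}^{\infty} |f(t)|\,dt < \infty,$$
so (because the function $(t, y) \mapsto e^{ixy} e^{-iyt} f(t)$ is surely jointly measurable) we may reverse the order of
integration, and get
$$\frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \frac{1}{2\pi} \int_{-a}^{a} \int_{-\infty}^{\infty} e^{ixy} e^{-iyt} f(t)\,dt\,dy$$
$$= \frac{1}{2\pi} \int_{-\infty}^{\infty} f(t) \int_{-a}^{a} e^{i(x-t)y}\,dy\,dt$$
$$= \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{2\sin(x-t)a}{x-t} f(t)\,dt = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin au}{u} f(x-u)\,du,$$
substituting t = x - u.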

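283I Theorem Let f be a complex-valued function which is integrable over $\mathbb{R}$, and suppose that f is
differentiable at $x \in \mathbb{R}$. Then
$$f(x) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-ixy} \check f(y)\,dy.$$

proof Set g(u) = f(x) if $|u| \le 1$, 0 otherwise, and observe that $\lim_{u\to 0} \frac{1}{u}(f(x-u) - g(u)) = -f'(x)$ is
finite, so that there is a $\delta \in\, ]0, 1]$ such that
$$K = \sup_{0<|u|\le\delta} \Bigl|\frac{f(x-u) - g(u)}{u}\Bigr| < \infty.$$
Consequently
$$\int_{-\infty}^{\infty} \Bigl|\frac{f(x-u) - g(u)}{u}\Bigr|\,du
\le \frac{1}{\delta} \int_{-\infty}^{-\delta} |f(x-u)|\,du + \frac{1}{\delta} \int_{-1}^{1} |g(u)|\,du
+ \int_{-\delta}^{\delta} K\,du + \frac{1}{\delta} \int_{\delta}^{\infty} |f(x-u)|\,du$$
$$\le \frac{1}{\delta} \int_{-\infty}^{\infty} |f| + \frac{2}{\delta}|f(x)| + 2\delta K < \infty.$$
By the Riemann-Lebesgue lemma (282Fb),
$$\lim_{a\to\infty} \int_{-\infty}^{\infty} \frac{\sin au}{u} (f(x-u) - g(u))\,du = 0.$$
If we now examine $\int_{-\infty}^{\infty} \frac{\sin au}{u} g(u)\,du$, we get
$$\int_{-\infty}^{\infty} \frac{\sin au}{u} g(u)\,du = \int_{-1}^{1} \frac{\sin au}{u} f(x)\,du = f(x) \int_{-a}^{a} \frac{\sin v}{v}\,dv,$$
substituting u = v/a. So we get
$$\lim_{a\to\infty} \int_{-\infty}^{\infty} \frac{\sin au}{u} f(x-u)\,du = \lim_{a\to\infty} \int_{-\infty}^{\infty} \frac{\sin au}{u} g(u)\,du
= \lim_{a\to\infty} f(x) \int_{-a}^{a} \frac{\sin v}{v}\,dv = \pi f(x),$$
by 283D. Accordingly
$$\frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \frac{1}{\pi} \lim_{a\to\infty} \int_{-\infty}^{\infty} \frac{\sin au}{u} f(x-u)\,du = f(x),$$
using 283H. As for the second equality,
$$\frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-ixy} \check f(y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-ixy} \hat f(-y)\,dy
= \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{ixu} \hat f(u)\,du = f(x)$$
(substituting y = -u).
Remark Compare 282L.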


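283J Corollary Let $f : \mathbb{R} \to \mathbb{C}$ be an integrable function such that f is differentiable and $\hat f$ is integrable.
Then $f = (\hat f)^{\vee} = (\check f)^{\wedge}$.

proof Because $\hat f$ is integrable,
$$(\hat f)^{\vee}(x) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = f(x)$$
for every $x \in \mathbb{R}$. Similarly,
$$(\check f)^{\wedge}(x) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-ixy} \check f(y)\,dy = f(x).$$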

283K The next proposition gives a class of functions to which the last corollary can be applied.
Proposition Suppose that f is a twice-differentiable function from $\mathbb{R}$ to $\mathbb{C}$ such that f, $f'$ and $f''$ are all
integrable. Then $\hat f$ is integrable.
proof Because $f'$ and $f''$ are integrable, f and $f'$ are absolutely continuous on any bounded interval (225L).
So by 283Ci we have
$$(f'')^{\wedge}(y) = iy(f')^{\wedge}(y) = -y^2 \hat f(y)$$
for every $y \in \mathbb{R}$. At the same time, by 283Cf-283Cg, $(f'')^{\wedge}$ and $\hat f$ must be bounded; say $|\hat f(y)| + |(f'')^{\wedge}(y)| \le K$
for every $y \in \mathbb{R}$. Now
$$|\hat f(y)| \le \frac{K}{1+y^2}$$
for every y, so that
$$\int_{-\infty}^{\infty} |\hat f| \le K \int_{-\infty}^{-1} \frac{1}{y^2}\,dy + 2K + K \int_1^{\infty} \frac{1}{y^2}\,dy = 4K < \infty.$$

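283L I turn now to the result corresponding to 282O, using a slightly different approach.

Theorem Let f be a complex-valued function which is integrable over $\mathbb{R}$, with Fourier transform $\hat f$ and
inverse Fourier transform $\check f$, and suppose that f is of bounded variation on some neighbourhood of $x \in \mathbb{R}$.
Set $a = \lim_{t\in\operatorname{dom} f, t\uparrow x} f(t)$, $b = \lim_{t\in\operatorname{dom} f, t\downarrow x} f(t)$. Then
$$\frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{ixy} \hat f(y)\,dy
= \frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{-ixy} \check f(y)\,dy = \frac{1}{2}(a + b).$$

proof (a) The limits $\lim_{t\in\operatorname{dom} f, t\uparrow x} f(t)$ and $\lim_{t\in\operatorname{dom} f, t\downarrow x} f(t)$ exist because f is of bounded variation near
x (224F). Recall from 283Db that there is a constant $K < \infty$ such that
$$\Bigl|\int_{\alpha}^{\beta} \frac{\sin cx}{x}\,dx\Bigr| \le K$$
whenever $\alpha \le \beta$ and $c \in \mathbb{R}$.
(b) Let $\epsilon > 0$. The hypothesis is that there is some $\delta > 0$ such that $\operatorname{Var}_{[x-\delta,x+\delta]}(f) < \infty$. Consequently
$$\lim_{\eta\downarrow 0} \operatorname{Var}_{]x,x+\eta]}(f) = \lim_{\eta\downarrow 0} \operatorname{Var}_{[x-\eta,x[}(f) = 0$$
(224E). There is therefore an $\eta > 0$ such that
$$\max(\operatorname{Var}_{[x-\eta,x[}(f), \operatorname{Var}_{]x,x+\eta]}(f)) \le \epsilon.$$
Of course
$$|f(t) - f(u)| \le \operatorname{Var}_{[x-\eta,x[}(f) \le \epsilon$$
whenever $t, u \in \operatorname{dom} f$ and $x - \eta \le t \le u < x$, so we shall have
$$|f(t) - a| \le \epsilon \text{ for every } t \in \operatorname{dom} f \cap [x - \eta, x[,$$
and similarly
$$|f(t) - b| \le \epsilon \text{ whenever } t \in \operatorname{dom} f \cap\, ]x, x + \eta].$$

(c) Now set
    $g_1(t) = f(t)$ when $t \in \operatorname{dom} f$ and $|x - t| > \eta$, 0 otherwise,
    $g_2(t) = a$ when $x - \eta \le t < x$, b when $x < t \le x + \eta$, 0 otherwise,
    $g_3 = f - g_1 - g_2$.
Then $f = g_1 + g_2 + g_3$; each $g_j$ is integrable; $g_1$ is zero on a neighbourhood of x;
$$\sup_{t\in\operatorname{dom} g_3, t\ne x} |g_3(t)| \le \epsilon, \qquad \operatorname{Var}_{[x-\eta,x[}(g_3) \le \epsilon, \qquad \operatorname{Var}_{]x,x+\eta]}(g_3) \le \epsilon.$$

(d) Consider the three parts $g_1$, $g_2$, $g_3$ separately.

(i) For the first, we have
$$\frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_1(y)\,dy = 0$$
by 283I.

(ii) Next,
$$\frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_2(y)\,dy = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin\gamma(x-t)}{x-t} g_2(t)\,dt$$
(by 283H)
$$= \frac{a}{\pi} \int_{x-\eta}^{x} \frac{\sin\gamma(x-t)}{x-t}\,dt + \frac{b}{\pi} \int_{x}^{x+\eta} \frac{\sin\gamma(x-t)}{x-t}\,dt
= \frac{a}{\pi} \int_0^{\gamma\eta} \frac{\sin u}{u}\,du + \frac{b}{\pi} \int_0^{\gamma\eta} \frac{\sin u}{u}\,du$$
(substituting $t = x - \gamma^{-1}u$ in the first integral, $t = x + \gamma^{-1}u$ in the second)
$$\to \frac{a+b}{2} \text{ as } \gamma \to \infty$$
by 283Da.

(iii) As for the third, we have, for any $\gamma > 0$,
$$\Bigl|\frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_3(y)\,dy\Bigr|
= \frac{1}{\pi} \Bigl|\int_{-\infty}^{\infty} \frac{\sin\gamma(x-t)}{x-t} g_3(t)\,dt\Bigr|
= \frac{1}{\pi} \Bigl|\int_{-\infty}^{\infty} \frac{\sin\gamma t}{t} g_3(x-t)\,dt\Bigr|$$
$$\le \frac{1}{\pi} \Bigl|\int_{-\eta}^{0} \frac{\sin\gamma t}{t} g_3(x-t)\,dt\Bigr| + \frac{1}{\pi} \Bigl|\int_0^{\eta} \frac{\sin\gamma t}{t} g_3(x-t)\,dt\Bigr|$$
$$\le \frac{K}{\pi} \Bigl(\sup_{t\in\operatorname{dom} g_3 \cap\, ]x-\eta,x[} |g_3(t)| + \operatorname{Var}_{]x-\eta,x[}(g_3)
+ \sup_{t\in\operatorname{dom} g_3 \cap\, ]x,x+\eta[} |g_3(t)| + \operatorname{Var}_{]x,x+\eta[}(g_3)\Bigr)$$
$$\le \frac{4\epsilon K}{\pi},$$
using 224J to bound the integrals in terms of the variation and supremum of $g_3$ and integrals of $\frac{\sin\gamma t}{t}$ over
subintervals.

(e) We therefore have
$$\limsup_{\gamma\to\infty} \Bigl|\frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat f(y)\,dy - \frac{a+b}{2}\Bigr|
\le \limsup_{\gamma\to\infty} \Bigl|\frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_1(y)\,dy\Bigr|$$
$$+ \limsup_{\gamma\to\infty} \Bigl|\frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_2(y)\,dy - \frac{a+b}{2}\Bigr|
+ \limsup_{\gamma\to\infty} \Bigl|\frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_3(y)\,dy\Bigr|$$
$$\le 0 + 0 + \frac{4K\epsilon}{\pi}$$
by the calculations in (d). As $\epsilon$ is arbitrary,
$$\lim_{\gamma\to\infty} \Bigl|\frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat f(y)\,dy - \frac{a+b}{2}\Bigr| = 0.$$

(f) This is the first half of the theorem. But of course the second half follows at once, because
$$\frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{-ixy} \check f(y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{-ixy} \hat f(-y)\,dy
= \frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{ixy} \hat f(y)\,dy = \frac{a+b}{2}.$$

Remark You will see that this argument uses some of the same ideas as those in 282O-282P. It is more direct
because (i) I am not using any concept corresponding to Fejér sums (though a very suitable one is available;
see 283Xf) (ii) I do not trouble to give the result concerning uniform convergence of the Fourier integrals
when f is continuous and of bounded variation (283Xj) (iii) I do not give any pointer to the significance of
the fact that if f is of bounded variation then $\sup_{y\in\mathbb{R}} |y\hat f(y)| < \infty$ (283Xk).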
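
The behaviour at a jump can be seen in a simple case. The following sketch is an illustration only, with names and step sizes chosen here as assumptions: it takes f to be the characteristic function of [-1, 1], whose transform under the normalization of 283A is $\sqrt{2/\pi}\,\frac{\sin y}{y}$, and evaluates the truncated inversion integral of the theorem for a fixed cut-off $\gamma$ by a composite Simpson rule.

```python
import math


def f_hat(y):
    """Fourier transform of the characteristic function of [-1, 1] (283A normalization)."""
    s = 1.0 if y == 0.0 else math.sin(y) / y
    return math.sqrt(2 / math.pi) * s


def partial_inversion(x, gamma, steps_per_unit=100):
    """Approximate (1/sqrt(2 pi)) * int_{-gamma}^{gamma} exp(i x y) f_hat(y) dy.

    f_hat is even, so the imaginary part of the integral vanishes; the code
    integrates cos(x y) f_hat(y) over [0, gamma] and doubles the result.
    """
    n = 2 * max(1, math.ceil(gamma * steps_per_unit / 2))  # even number of steps
    h = gamma / n
    g = lambda y: math.cos(x * y) * f_hat(y)
    total = g(0.0) + g(gamma)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(k * h)
    return 2 * (total * h / 3) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0):
        print(x, partial_inversion(x, 200.0))
```

For a large cut-off the three printed values should be near 1, 1/2 and 0 respectively: the value of f at the interior point 0, the average $\frac{1}{2}(a+b)$ of the one-sided limits at the jump x = 1, and the value of f at the exterior point 2. Convergence is slow (the error is of order $1/\gamma$), which is typical of this kind of inversion.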

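283M Corresponding to 282Q, we have the following.

Theorem Let f and g be complex-valued functions which are integrable over $\mathbb{R}$, and $f * g$ their convolution
product, defined by setting
$$(f * g)(x) = \int_{-\infty}^{\infty} f(t) g(x-t)\,dt$$
whenever this is defined (255E). Then
$$(f * g)^{\wedge}(y) = \sqrt{2\pi}\,\hat f(y)\hat g(y), \qquad (f * g)^{\vee}(y) = \sqrt{2\pi}\,\check f(y)\check g(y)$$
for every $y \in \mathbb{R}$.
proof For any y,
$$(f * g)^{\wedge}(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} (f * g)(x)\,dx
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-iy(t+u)} f(t) g(u)\,dt\,du$$
(using 255G)
$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyt} f(t)\,dt \int_{-\infty}^{\infty} e^{-iyu} g(u)\,du = \sqrt{2\pi}\,\hat f(y)\hat g(y).$$
Now, of course,
$$(f * g)^{\vee}(y) = (f * g)^{\wedge}(-y) = \sqrt{2\pi}\,\hat f(-y)\hat g(-y) = \sqrt{2\pi}\,\check f(y)\check g(y).$$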
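
As a worked instance of the theorem (a sketch, using the Gaussian functions $\psi_\sigma$ of 283N below, for which $\hat\psi_\sigma(y) = \frac{1}{\sqrt{2\pi}} e^{-\sigma^2 y^2/2}$), take $\sigma, \tau > 0$ and write $\rho = \sqrt{\sigma^2 + \tau^2}$; the symbols $\tau$ and $\rho$ are introduced here only for this example.

```latex
\begin{align*}
(\psi_\sigma * \psi_\tau)^{\wedge}(y)
  &= \sqrt{2\pi}\,\hat\psi_\sigma(y)\,\hat\psi_\tau(y)
   = \sqrt{2\pi}\cdot\frac{1}{\sqrt{2\pi}}e^{-\sigma^2y^2/2}\cdot\frac{1}{\sqrt{2\pi}}e^{-\tau^2y^2/2} \\
  &= \frac{1}{\sqrt{2\pi}}e^{-(\sigma^2+\tau^2)y^2/2}
   = \hat\psi_\rho(y).
\end{align*}
```

By 283G it follows that $\psi_\sigma * \psi_\tau = \psi_\rho$ almost everywhere (and in fact everywhere, since both sides are continuous), which is the familiar rule that variances add under convolution of normal densities.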

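283N I show how to compute a special Fourier transform, which will be used repeatedly in the next
section.
Lemma For $\sigma > 0$, set $\psi_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-x^2/2\sigma^2}$ for $x \in \mathbb{R}$. Then its Fourier transform and inverse Fourier
transform are
$$\hat\psi_\sigma = \check\psi_\sigma = \frac{1}{\sigma}\psi_{1/\sigma}.$$
In particular, $\hat\psi_1 = \psi_1$.
proof (a) I begin with the special case $\sigma = 1$, using the Maclaurin series
$$e^{-iyx} = \sum_{k=0}^{\infty} \frac{(-iyx)^k}{k!}$$
and the expressions for $\int_{-\infty}^{\infty} x^k e^{-x^2/2}\,dx$ from §263.
Fix $y \in \mathbb{R}$. Writing
$$g_k(x) = \frac{(-iyx)^k}{k!} e^{-x^2/2}, \qquad h_n(x) = \sum_{k=0}^{n} g_k(x), \qquad h(x) = e^{|yx| - x^2/2},$$
we see that
$$|g_k(x)| \le \frac{|yx|^k}{k!} e^{-x^2/2},$$
so that
$$|h_n(x)| \le \sum_{k=0}^{\infty} |g_k(x)| \le e^{|yx|} e^{-x^2/2} = h(x)$$
for every n; moreover, h is integrable, because $|h(x)| \le e^{-|x|}$ whenever $|x| \ge 2(1 + |y|)$. Consequently, using
Lebesgue's Dominated Convergence Theorem,
$$\hat\psi_1(y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \lim_{n\to\infty} h_n(x)\,dx = \frac{1}{2\pi} \lim_{n\to\infty} \int_{-\infty}^{\infty} h_n(x)\,dx$$
$$= \frac{1}{2\pi} \sum_{k=0}^{\infty} \int_{-\infty}^{\infty} g_k(x)\,dx = \frac{1}{2\pi} \sum_{k=0}^{\infty} \frac{(-iy)^k}{k!} \int_{-\infty}^{\infty} x^k e^{-x^2/2}\,dx$$
$$= \frac{1}{2\pi} \sum_{j=0}^{\infty} \frac{(-iy)^{2j}}{(2j)!} \cdot \frac{(2j)!}{2^j j!} \sqrt{2\pi}$$
(by 263H)
$$= \frac{1}{\sqrt{2\pi}} \sum_{j=0}^{\infty} \frac{(-y^2)^j}{2^j j!} = \frac{1}{\sqrt{2\pi}} e^{-y^2/2} = \psi_1(y),$$
as claimed.
(b) For the general case, $\psi_\sigma(x) = \frac{1}{\sigma}\psi_1(\frac{x}{\sigma})$, so that
$$\hat\psi_\sigma(y) = \frac{1}{\sigma} \cdot \sigma\hat\psi_1(\sigma y) = \frac{1}{\sigma}\psi_{1/\sigma}(y)$$
by 283Ce. Of course we now have
$$\check\psi_\sigma(y) = \hat\psi_\sigma(-y) = \frac{1}{\sigma}\psi_{1/\sigma}(y)$$
because $\psi_{1/\sigma}$ is an even function.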
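
The lemma is easy to test numerically. The sketch below is an illustration only; the names `psi` and `psi_hat`, the truncation at twelve standard deviations and the step count are assumptions made here. It compares a Simpson approximation of the transform of $\psi_\sigma$ with the closed form $\frac{1}{\sigma}\psi_{1/\sigma}$.

```python
import cmath
import math


def psi(sigma, x):
    """The Gaussian psi_sigma(x) = exp(-x^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


def psi_hat(sigma, y, n=4000):
    """Simpson approximation to (1/sqrt(2 pi)) * int exp(-i y x) psi_sigma(x) dx.

    The integral is truncated to [-12 sigma, 12 sigma], outside which
    psi_sigma is negligible; n must be even.
    """
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    half_width = 12 * sigma
    h = 2 * half_width / n
    g = lambda x: cmath.exp(-1j * y * x) * psi(sigma, x)
    total = g(-half_width) + g(half_width)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(-half_width + k * h)
    return total * h / 3 / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        for y in (0.0, 0.7, 3.0):
            print(sigma, y, abs(psi_hat(sigma, y) - psi(1 / sigma, y) / sigma))
```

All the printed differences should be very small; in particular the case $\sigma = 1$ reflects the fact that $\psi_1$ is its own Fourier transform.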

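283O To lead into the ideas of the next section, I give the following very simple fact.
Proposition Let f and g be two complex-valued functions which are integrable over $\mathbb{R}$. Then
$\int_{-\infty}^{\infty} f \times \hat g = \int_{-\infty}^{\infty} \hat f \times g$ and $\int_{-\infty}^{\infty} f \times \check g = \int_{-\infty}^{\infty} \check f \times g$.
proof Of course
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |e^{-ixy} f(x) g(y)|\,dx\,dy = \int_{-\infty}^{\infty} |f| \int_{-\infty}^{\infty} |g| < \infty,$$
so
$$\int_{-\infty}^{\infty} f \times \hat g = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(y) e^{-iyx} g(x)\,dx\,dy
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(y) e^{-ixy} g(x)\,dy\,dx = \int_{-\infty}^{\infty} \hat f \times g.$$
For the other half of the proposition, replace every $e^{-ixy}$ in the argument by $e^{ixy}$.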

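283W Higher dimensions I offer a series of exercises designed to provide hints on how the work of
this section may be done in the r-dimensional case, where $r \ge 1$.

(a) Let f be an integrable complex-valued function defined almost everywhere in $\mathbb{R}^r$. Its Fourier transform
is the function $\hat f : \mathbb{R}^r \to \mathbb{C}$ defined by the formula
$$\hat f(y) = \frac{1}{(\sqrt{2\pi})^r} \int e^{-iy\cdot x} f(x)\,dx,$$
writing $y\cdot x = \eta_1\xi_1 + \ldots + \eta_r\xi_r$ for $x = (\xi_1, \ldots, \xi_r)$ and $y = (\eta_1, \ldots, \eta_r) \in \mathbb{R}^r$, and $\int \ldots dx$ for integration
with respect to Lebesgue measure on $\mathbb{R}^r$. Similarly, the inverse Fourier transform of f is the function $\check f$
given by
$$\check f(y) = \frac{1}{(\sqrt{2\pi})^r} \int e^{iy\cdot x} f(x)\,dx = \hat f(-y).$$
Show that, for any integrable complex-valued function f on $\mathbb{R}^r$,
(i) $\hat f : \mathbb{R}^r \to \mathbb{C}$ is continuous;
(ii) $\lim_{\|y\|\to\infty} \hat f(y) = 0$, writing $\|y\| = \sqrt{y\cdot y}$ as usual;
(iii) if $\int \|x\||f(x)|\,dx < \infty$, then $\hat f$ is differentiable, and
$$\frac{\partial \hat f}{\partial \eta_j}(y) = -\frac{i}{(\sqrt{2\pi})^r} \int e^{-iy\cdot x} \xi_j f(x)\,dx$$
for $j \le r$, $y \in \mathbb{R}^r$, always taking $\xi_j$ to be the jth coordinate of $x \in \mathbb{R}^r$;
(iv) if $j \le r$ and $\frac{\partial f}{\partial \xi_j}$ is defined everywhere and is integrable, and if $\lim_{\|x\|\to\infty} f(x) = 0$, then
$(\frac{\partial f}{\partial \xi_j})^{\wedge}(y) = i\eta_j \hat f(y)$ for every $y \in \mathbb{R}^r$.

(b) Let f be an integrable complex-valued function on $\mathbb{R}^r$, and $\hat f$ its Fourier transform. If $c \le d$ in $\mathbb{R}^r$,
show that
$$\int_{[c,d]} f = \Bigl(\frac{i}{\sqrt{2\pi}}\Bigr)^r \lim_{\alpha_1,\ldots,\alpha_r\to\infty} \int_{[-a,a]} \prod_{j=1}^{r} \frac{e^{i\gamma_j\eta_j} - e^{i\delta_j\eta_j}}{\eta_j} \hat f(y)\,dy,$$
setting $a = (\alpha_1, \ldots)$, $c = (\gamma_1, \ldots)$, $d = (\delta_1, \ldots)$.

(c) Let f be an integrable complex-valued function on $\mathbb{R}^r$, and $\hat f$ its Fourier transform. Show that if we
write
$$B_\infty(0, a) = \{y : |\eta_j| \le a \text{ for every } j \le r\},$$
then
$$\frac{1}{(\sqrt{2\pi})^r} \int_{B_\infty(0,a)} e^{ix\cdot y} \hat f(y)\,dy = \int \phi_a(t) f(x-t)\,dt$$
for every $a \ge 0$, where
$$\phi_a(t) = \frac{1}{\pi^r} \prod_{j=1}^{r} \frac{\sin a\xi_j}{\xi_j}$$
for $t = (\xi_1, \ldots, \xi_r) \in \mathbb{R}^r$.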


(d) Let f and g be integrable complex-valued functions on $\mathbb{R}^r$. Show that $f * \check g = (\sqrt{2\pi})^r (\hat f \times g)^{\vee}$.

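(e) For $\sigma > 0$, define $\psi_\sigma : \mathbb{R}^r \to \mathbb{C}$ by setting
$$\psi_\sigma(x) = \frac{1}{(\sqrt{2\pi}\sigma)^r} e^{-x\cdot x/2\sigma^2}$$
for every $x \in \mathbb{R}^r$. Show that
$$\hat\psi_\sigma = \check\psi_\sigma = \frac{1}{\sigma^r}\psi_{1/\sigma}.$$

(f) Defining $\psi_\sigma$ as in (e), show that $\lim_{\sigma\downarrow 0} (f * \psi_\sigma)(x) = f(x)$ for every continuous integrable $f : \mathbb{R}^r \to \mathbb{C}$,
$x \in \mathbb{R}^r$.

(g) Show that if $f : \mathbb{R}^r \to \mathbb{C}$ is continuous and integrable, and $\hat f$ is also integrable, then $f = (\hat f)^{\vee}$. (Hint:
Show that both are equal at every point to
$$\lim_{\sigma\to\infty} (\sqrt{2\pi}\sigma)^r (\hat f \times \psi_\sigma)^{\vee} = \lim_{\sigma\to\infty} f * \psi_{1/\sigma}.)$$

(h) Show that
$$\int_{\mathbb{R}^r} \frac{1}{1+\|x\|^{r+1}}\,dx < \infty.$$

(i) Show that if $f : \mathbb{R}^r \to \mathbb{C}$ can be partially differentiated r + 1 times, and f and all its partial derivatives
$\frac{\partial^k f}{\partial\xi_{j_1}\partial\xi_{j_2}\ldots\partial\xi_{j_k}}$ are integrable for $k \le r + 1$, then $\hat f$ is integrable.

(j) Show that if f and g are integrable complex-valued functions on $\mathbb{R}^r$, then $(f * g)^{\wedge} = (\sqrt{2\pi})^r \hat f \times \hat g$.

(k) Show that if f and g are integrable complex-valued functions on $\mathbb{R}^r$, then $\int f \times \hat g = \int \hat f \times g$.

(l) Show that if $f_1, \ldots, f_r$ are integrable complex-valued functions on $\mathbb{R}$ with Fourier transforms $g_1, \ldots, g_r$,
and we write $f(x) = f_1(\xi_1) \ldots f_r(\xi_r)$ for $x = (\xi_1, \ldots, \xi_r) \in \mathbb{R}^r$, then the Fourier transform of f is
$y \mapsto g_1(\eta_1) \ldots g_r(\eta_r)$.

(m)(i) Show that $\int_{2k\pi}^{2(k+1)\pi} \frac{\sin t}{t\sqrt{t}}\,dt > 0$ for every $k \in \mathbb{N}$, and hence that $\int_0^{\infty} \frac{\sin t}{t\sqrt{t}}\,dt > 0$.
(ii) Set $f_1(\xi) = 1/\sqrt{|\xi|}$ for $0 < |\xi| \le 1$, 0 for other $\xi$. Show that $\lim_{a\to\infty} \frac{1}{\sqrt{a}} \int_{-a}^{a} \hat f_1(\eta)\,d\eta$
exists in $\mathbb{R}$ and is greater than 0.
(iii) Construct an integrable function $f_2$, zero on some neighbourhood of 0, such that there are infinitely
many $m \in \mathbb{N}$ for which $|\int_{-m}^{m} \hat f_2(\eta)\,d\eta| \ge \frac{1}{\sqrt{m}}$. (Hint: take $f_2(\xi) = 2^{-k} \sin m_k\xi$ for $k + 1 \le \xi < k + 2$, for a
sufficiently rapidly increasing sequence $\langle m_k\rangle_{k\in\mathbb{N}}$.)
(iv) Set $f(x) = f_1(\xi_1) f_2(\xi_2)$ for $x \in \mathbb{R}^2$. Show that f is integrable, that f is zero in a neighbourhood
of 0, but that
$$\limsup_{a\to\infty} \frac{1}{2\pi} \Bigl|\int_{B_\infty(0,a)} \hat f(y)\,dy\Bigr| > 0,$$
defining $B_\infty$ as in (c).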


283X Basic exercises (a) Confirm that the six alternative definitions of the transforms $\hat f$, $\check f$ offered
in 283B all lead to the same theory; find the constants involved in the new versions of 283Ch, 283Ci, 283L,
283M and 283N.

R
(b) If we redefined f (y) to be
eixy f (x)dx, what would f (y) be?

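(c) Show that nearly every $2\pi$ would disappear from the theorems of this section if we defined a measure
$\nu$ on $\mathbb{R}$ by saying that $\nu E = \frac{1}{\sqrt{2\pi}}\mu E$ for every Lebesgue measurable set E, where $\mu$ is Lebesgue measure,
and wrote
$$\hat f(y) = \int_{-\infty}^{\infty} e^{-iyx} f(x)\,\nu(dx), \qquad \check f(y) = \int_{-\infty}^{\infty} e^{iyx} f(x)\,\nu(dx),$$
$$(f * g)(x) = \int_{-\infty}^{\infty} f(t) g(x-t)\,\nu(dt).$$
What is $\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin t}{t}\,\nu(dt)$?

> (d) Let f be an integrable complex-valued function on $\mathbb{R}$, with Fourier transform $\hat f$. Show that (i) if
$g(x) = f(-x)$ whenever this is defined, then $\hat g(y) = \hat f(-y)$ for every $y \in \mathbb{R}$; (ii) if $g(x) = \overline{f(x)}$ whenever this
is defined, then $\hat g(y) = \overline{\hat f(-y)}$ for every y.

(e) Let f be an integrable complex-valued function on $\mathbb{R}$, with Fourier transform $\hat f$. Show that
$$\int_c^d \hat f(y)\,dy = \frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{e^{-idx} - e^{-icx}}{x} f(x)\,dx$$
whenever $c \le d$ in $\mathbb{R}$.

> (f) For an integrable complex-valued function f on $\mathbb{R}$, let its Fejér integrals be
$$\sigma_c(x) = \frac{1}{c\sqrt{2\pi}} \int_0^c \Bigl(\int_{-a}^{a} e^{ixy} \hat f(y)\,dy\Bigr) da$$
for $c > 0$. Show that
$$\sigma_c(x) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{1-\cos ct}{ct^2} f(x-t)\,dt.$$

(g) Show that $\int_{-\infty}^{\infty} \frac{1-\cos at}{at^2}\,dt = \pi$ for every $a > 0$. (Hint: integrate by parts and use 283Da.) Show that
$$\lim_{a\to\infty} \int_{\delta}^{\infty} \frac{1-\cos at}{at^2}\,dt = \lim_{a\to\infty} \sup_{t\ge\delta} \frac{1-\cos at}{at^2} = 0$$
for every $\delta > 0$.

(h) Let f be an integrable complex-valued function on $\mathbb{R}$, and define its Fejér integrals $\sigma_a$ as in 283Xf
above. Show that if $x \in \mathbb{R}$, $c \in \mathbb{C}$ are such that
$$\lim_{\delta\downarrow 0} \frac{1}{\delta} \int_0^{\delta} |f(x+t) + f(x-t) - 2c|\,dt = 0,$$
then $\lim_{a\to\infty} \sigma_a(x) = c$. (Hint: adapt the argument of 282H.)

> (i) Let f be an integrable complex-valued function on $\mathbb{R}$, and define its Fejér integrals $\sigma_a$ as in 283Xf
above. Show that $f(x) = \lim_{a\to\infty} \sigma_a(x)$ for almost every $x \in \mathbb{R}$.

(j) Let $f : \mathbb{R} \to \mathbb{C}$ be a continuous integrable complex-valued function of bounded variation, and define
its Fejér integrals $\sigma_a$ as in 283Xf above. Show that $f(x) = \lim_{a\to\infty} \sigma_a(x)$ uniformly for $x \in \mathbb{R}$.

> (k) Let f be an integrable complex-valued function of bounded variation on $\mathbb{R}$, and $\hat f$ its Fourier
transform. Show that $\sup_{y\in\mathbb{R}} |y\hat f(y)| < \infty$.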



(l) Let f and g be integrable complex-valued functions on $\mathbb{R}$. Show that $f * \check g = \sqrt{2\pi}(\hat f \times g)^{\vee}$.

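(m) Let f be an integrable complex-valued function on $\mathbb{R}$, and fix $x \in \mathbb{R}$. Set
$$f_x(y) = \int_{-\infty}^{\infty} f(t) \cos y(x-t)\,dt$$
for $y \in \mathbb{R}$. Show that
(i) if f is differentiable at x,
$$f(x) = \frac{1}{\pi} \lim_{a\to\infty} \int_0^a f_x(y)\,dy;$$
(ii) if there is a neighbourhood of x in which f has bounded variation, then
$$\frac{1}{\pi} \lim_{a\to\infty} \int_0^a f_x(y)\,dy = \frac{1}{2}\bigl(\lim_{t\in\operatorname{dom} f, t\uparrow x} f(t) + \lim_{t\in\operatorname{dom} f, t\downarrow x} f(t)\bigr);$$
(iii) if f is twice differentiable and $f'$, $f''$ are integrable then $f_x$ is integrable and $f(x) = \frac{1}{\pi} \int_0^{\infty} f_x$. (The
formula
$$f(x) = \frac{1}{\pi} \int_0^{\infty} \int_{-\infty}^{\infty} f(t) \cos y(x-t)\,dt\,dy,$$
valid for such functions f, is called Fourier's integral formula.)
(n) Show that if f is a complex-valued function of bounded variation, defined almost everywhere in $\mathbb{R}$,
and converging to 0 at $\pm\infty$, then
$$g(y) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f(x)\,dx$$
is defined in $\mathbb{C}$ for every $y \ne 0$, and that the limit is uniform in any region bounded away from 0.
(o) Let f be an integrable complex-valued function on $\mathbb{R}$. Set
$$\hat f_c(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos yx\, f(x)\,dx, \qquad \hat f_s(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \sin yx\, f(x)\,dx$$
for $y \in \mathbb{R}$. Show that
$$\frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \sqrt{\frac{2}{\pi}} \int_0^a \cos xy\, \hat f_c(y)\,dy + \sqrt{\frac{2}{\pi}} \int_0^a \sin xy\, \hat f_s(y)\,dy$$
for every $x \in \mathbb{R}$, $a \ge 0$.
(p) Use the fact that $\int_0^a \int_0^{\infty} e^{-xy} \sin y\,dx\,dy = \int_0^{\infty} \int_0^a e^{-xy} \sin y\,dy\,dx$ whenever $a \ge 0$ to show that
$\lim_{a\to\infty} \int_0^a \frac{\sin y}{y}\,dy = \int_0^{\infty} \frac{1}{1+x^2}\,dx$.

(q) Let $f : \mathbb{R} \to \mathbb{C}$ be an integrable function which is absolutely continuous on every bounded interval,
and suppose that its derivative $f'$ is of bounded variation on $\mathbb{R}$. Show that $\hat f$ is integrable and that $f = (\hat f)^{\vee}$.
(Hint: 283Ci, 283Xk.)

> (r) Show that if $f(x) = e^{-\sigma|x|}$, where $\sigma > 0$, then $\hat f(y) = \frac{2\sigma}{\sqrt{2\pi}(\sigma^2+y^2)}$. Hence, or otherwise, find the
Fourier transform of $y \mapsto \frac{1}{1+y^2}$.

(s) Find the inverse Fourier transform of the characteristic function of a bounded interval in $\mathbb{R}$. Show
that in a formal sense 283F can be regarded as a special case of 283O.

(t) Let f be a non-negative integrable function on $\mathbb{R}$, with Fourier transform $\hat f$. Show that
$$\sum_{j=0}^{n} \sum_{k=0}^{n} a_j \bar a_k \hat f(y_j - y_k) \ge 0$$
whenever $y_0, \ldots, y_n$ in $\mathbb{R}$ and $a_0, \ldots, a_n \in \mathbb{C}$.
(u) Let f be an integrable complex-valued function on $\mathbb{R}$. Show that $\tilde f(x) = \sum_{n=-\infty}^{\infty} f(x + 2\pi n)$ is defined
in $\mathbb{C}$ for almost every x. (Hint: $\sum_{n=-\infty}^{\infty} \int_{-\pi}^{\pi} |f(x + 2\pi n)|\,dx < \infty$.) Show that $\tilde f$ is periodic. Show that the
Fourier coefficients of $\tilde f\restriction\,]-\pi, \pi]$ are $\langle\frac{1}{\sqrt{2\pi}}\hat f(k)\rangle_{k\in\mathbb{Z}}$.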
2

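283Y Further exercises (a) Show that if $f : \mathbb{R} \to \mathbb{C}$ is absolutely continuous in every bounded
interval, $f'$ is of bounded variation on $\mathbb{R}$, and $\lim_{x\to\infty} f(x) = \lim_{x\to-\infty} f(x) = 0$, then
$$g(y) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f(x)\,dx = -\frac{i}{y\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f'(x)\,dx$$
is defined, with
$$y^2|g(y)| \le \frac{4}{\sqrt{2\pi}} \operatorname{Var}_{\mathbb{R}}(f'),$$
for every $y \ne 0$.
(b) Let $f : \mathbb{R} \to [0, \infty[$ be an even function such that f is convex on $[0, \infty[$ (see §233 for notes on convex
functions) and $\lim_{x\to\infty} f(x) = 0$.
(i) Show that, for any $y > 0$, $k \in \mathbb{N}$, $\int_{-2k\pi/y}^{2k\pi/y} e^{-iyx} f(x)\,dx \ge 0$.
(ii) Show that $g(y) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f(x)\,dx$ exists in $[0, \infty[$ for every $y \ne 0$.
(iii) For $n \in \mathbb{N}$, set $f_n(x) = e^{-|x|/(n+1)} f(x)$ for every x. Show that $f_n$ is integrable and convex on
$[0, \infty[$.
(iv) Show that $g(y) = \lim_{n\to\infty} \hat f_n(y)$ for every $y \ne 0$.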
(vi) Show that if f is integrable then
Ra
4 R sin at 4a R /a
a
f= 0
f (t)dt 0
f (t)dt 2 2f (1)
2 t 2

for every a 0. Hence show that whether f is integrable or not, g is integrable and fn = (f n ) for every n.
Ra
(vii) Show that lima0 supnN a f n = 0.

(viii) Show that if f 0 is bounded (on its domain) then {f n : n N} is uniformly integrable (hint: use


(vii) and 283Ya), so that limn kf n gk1 = 0 and f = g.
(ix) Show that if f 0 is unbounded then for every > 0 we can find hR1 , h2 : R [0, [, both even,
convex and converging to 0 at , such that f = h1 + h2 , h01 is bounded, h2 and h2 (1) . Hence

show that in this case also f = g.
(c) Suppose that $f : \mathbb{R} \to \mathbb{R}$ is even, twice differentiable and convergent to 0 at $\pm\infty$, that $f''$ is continuous
and that {x : f 00 (x) = 0} is bounded in R. Show that f is the Fourier transform of an integrable function.
(Hint: use 283Yb and 283Xq.)
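(d) Let $g : \mathbb{R} \to \mathbb{R}$ be an odd function of bounded variation such that $\int_1^{\infty} \frac{1}{x} g(x)\,dx = \infty$. Show that g
cannot be the Fourier transform of any integrable function f. (Hint: show that if $g = \hat f$ then
$$\int_0^1 f = \frac{2i}{\sqrt{2\pi}} \lim_{a\to\infty} \int_0^a \frac{1-\cos x}{x} g(x)\,dx = \infty.)$$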
2 x

283 Notes and comments I have tried in this section to give the elementary theory of Fourier transforms
of integrable functions on $\mathbb{R}$, with an eye to the extension of the concept which will be attempted in the next
section. Following §282, I have given prominence to two theorems (283I and 283L) describing conditions
for the inversion of the Fourier transform to return to the original function; we find ourselves looking at
improper integrals $\lim_{a\to\infty} \int_{-a}^{a}$, just as earlier we needed to look at symmetric sums $\lim_{n\to\infty} \sum_{k=-n}^{n}$. I
do not go quite so far as in §282, and in particular I leave the study of square-integrable functions for the
moment, since their Fourier transforms may not be describable by the simple formulae used here.
One of the most fundamental obstacles in the subject is the lack of any effective criteria for determining
which functions are the Fourier transforms of integrable functions. (Happily, things are better for
square-integrable functions; see 284O-284P.) In 283Yb-283Yc I sketch an argument showing that 'ordinary'
non-oscillating even functions which converge to 0 at $\pm\infty$ are Fourier transforms of integrable functions.
Strikingly, this is not true of odd functions; thus $y \mapsto \frac{1}{\ln(e+y^2)}$ is the Fourier transform of an integrable
function, but $y \mapsto \frac{\arctan y}{\ln(e+y^2)}$ is not (283Yd).

In 283W I sketch the corresponding theory of Fourier transforms in $\mathbb{R}^r$. There are few surprises. One
point to note is that where in the one-dimensional case we ask for a well-behaved second derivative, in
the r-dimensional case we may need to differentiate r + 1 times (283Wi). Another is that we lose the
'localization principle'. In the one-dimensional case, if f is integrable and zero on an interval ]c, d[, then
$\lim_{a\to\infty} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = 0$ for every $x \in\, ]c, d[$; this is immediate from either 283I or 283L. But in higher
dimensions the most natural formulation of a corresponding result is false (283Wm).

284 Fourier transforms II


The basic paradox of Fourier transforms is the fact that while for certain functions (see 283J-283K) we
have $(\hat f)^{\vee} = f$, 'ordinary' integrable functions f (for instance, the characteristic functions of non-trivial
intervals) give rise to non-integrable Fourier transforms $\hat f$ for which there is no direct definition available
for $(\hat f)^{\vee}$, making it a puzzle to decide in what sense the formula $f = (\hat f)^{\vee}$ might be true. What now seems
by far the most natural resolution of the problem lies in declaring the Fourier transform to be an operation
on distributions rather than on functions. I shall not attempt to describe this theory properly (almost any
book on 'Distributions' will cover the ground better than I can possibly do here), but will try to convey the
fundamental ideas, so far as they are relevant to the questions dealt with here, in language which will make
the transition to a fuller treatment straightforward. At the same time, these methods make it easy to prove
strong versions of the 'classical' theorems concerning Fourier transforms.

284A Test functions: Definition Throughout this section, a rapidly decreasing test function or
Schwartz function will be a function $h:\mathbb{R}\to\mathbb{C}$ such that $h$ is smooth, that is, differentiable everywhere
any finite number of times, and moreover
$$\sup_{x\in\mathbb{R}}|x|^k|h^{(m)}(x)|<\infty$$
for all $k,m\in\mathbb{N}$, writing $h^{(m)}$ for the $m$th derivative of $h$.

284B The following elementary facts will be useful.


Lemma (a) If $g$ and $h$ are rapidly decreasing test functions, so are $g+h$ and $ch$, for any $c\in\mathbb{C}$.
(b) If $h$ is a rapidly decreasing test function and $y\in\mathbb{R}$, then $x\mapsto h(y-x)$ is a rapidly decreasing test
function.
(c) If $h$ is any rapidly decreasing test function, then $h$ and $h^2$ are integrable.
(d) If $h$ is a rapidly decreasing test function, so is its derivative $h'$.
(e) If $h$ is a rapidly decreasing test function, so is the function $x\mapsto xh(x)$.
(f) For any $\epsilon>0$, the function $x\mapsto e^{-\epsilon x^2}$ is a rapidly decreasing test function.
proof (a) is trivial.
(b) Write $g(x)=h(y-x)$ for $x\in\mathbb{R}$. Then $g^{(m)}(x)=(-1)^mh^{(m)}(y-x)$ for every $m$, so $g$ is smooth. For
any $k\in\mathbb{N}$,
$$|x|^k\le 2^k(|y|^k+|y-x|^k)$$
for every $x$, so
$$\sup_{x\in\mathbb{R}}|x|^k|g^{(m)}(x)|=\sup_{x\in\mathbb{R}}|x|^k|h^{(m)}(y-x)|$$
$$\le 2^k|y|^k\sup_{x\in\mathbb{R}}|h^{(m)}(y-x)|+2^k\sup_{x\in\mathbb{R}}|y-x|^k|h^{(m)}(y-x)|$$
$$=2^k|y|^k\sup_{x\in\mathbb{R}}|h^{(m)}(x)|+2^k\sup_{x\in\mathbb{R}}|x|^k|h^{(m)}(x)|<\infty.$$

(c) Because
$$M=\sup_{x\in\mathbb{R}}\big(|h(x)|+x^2|h(x)|\big)$$
is finite, we have
$$\int_{-\infty}^{\infty}|h(x)|\,dx\le\int_{-\infty}^{\infty}\frac{M}{1+x^2}\,dx<\infty.$$
Of course we now have $|h^2|\le M|h|$, so $h^2$ is also integrable.


(d) This is immediate from the definition, as every derivative of $h'$ is a derivative of $h$.
(e) Setting $g(x)=xh(x)$, $g^{(m)}(x)=xh^{(m)}(x)+mh^{(m-1)}(x)$ for $m\ge 1$, so
$$\sup_{x\in\mathbb{R}}|x^kg^{(m)}(x)|\le\sup_{x\in\mathbb{R}}|x^{k+1}h^{(m)}(x)|+m\sup_{x\in\mathbb{R}}|x^kh^{(m-1)}(x)|$$
is finite, for all $k\in\mathbb{N}$, $m\ge 1$.
(f) If $h(x)=e^{-\epsilon x^2}$, then for each $m\in\mathbb{N}$ we have $h^{(m)}(x)=p_m(x)h(x)$, where $p_0(x)=1$ and
$p_{m+1}(x)=p_m'(x)-2\epsilon xp_m(x)$, so that $p_m$ is a polynomial. Because $e^{\epsilon x^2}\ge\epsilon^{k+1}x^{2k+2}/(k+1)!$ for all $x$, $k\ge 0$,
$$\lim_{|x|\to\infty}|x|^kh(x)=\lim_{x\to\infty}x^k/e^{\epsilon x^2}=0$$
for every $k$, and $\lim_{|x|\to\infty}p(x)h(x)=0$ for every polynomial $p$; consequently
$$\lim_{|x|\to\infty}x^kh^{(m)}(x)=\lim_{|x|\to\infty}x^kp_m(x)h(x)=0$$
for all $k$, $m$, and $h$ is a rapidly decreasing test function.
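
The recursion in part (f) can be checked numerically. The following is only a minimal sketch, not part of the original argument: the function names `derivative_polynomials` and `weighted_derivative` are hypothetical, and the choices $\epsilon=\frac12$, $k=3$, $m=2$ are arbitrary illustrations. It builds the polynomials $p_m$ from $p_{m+1}=p_m'-2\epsilon xp_m$ and evaluates $|x|^k|h^{(m)}(x)|$ at a few points, where the values should be seen to collapse towards 0.

```python
import math

def derivative_polynomials(eps, max_order):
    """Coefficient lists (constant term first) for p_0, ..., p_max_order, where the
    m-th derivative of exp(-eps*x^2) is p_m(x) * exp(-eps*x^2)."""
    polys = [[1.0]]                                     # p_0(x) = 1
    for _ in range(max_order):
        p = polys[-1]
        deriv = [i * p[i] for i in range(1, len(p))]    # coefficients of p_m'
        shifted = [0.0] + [-2.0 * eps * c for c in p]   # coefficients of -2*eps*x*p_m
        deriv += [0.0] * (len(shifted) - len(deriv))    # pad to a common length
        polys.append([deriv[i] + shifted[i] for i in range(len(shifted))])
    return polys

def weighted_derivative(eps, k, m, x):
    """Evaluate |x|^k * |h^(m)(x)| for h(x) = exp(-eps*x^2)."""
    coeffs = derivative_polynomials(eps, m)[m]
    value = 0.0
    for c in reversed(coeffs):                          # Horner's rule
        value = value * x + c
    return abs(x) ** k * abs(value) * math.exp(-eps * x * x)

if __name__ == "__main__":
    for x in (1.0, 5.0, 10.0, 20.0):
        print(x, weighted_derivative(0.5, 3, 2, x))
```

With $\epsilon=\frac12$ the recursion gives $p_1(x)=-x$ and $p_2(x)=x^2-1$, which is what the first two steps of the loop produce.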


284C Proposition Let $h:\mathbb{R}\to\mathbb{C}$ be a rapidly decreasing test function. Then $\hat h:\mathbb{R}\to\mathbb{C}$ and $\check h:\mathbb{R}\to\mathbb{C}$
are rapidly decreasing test functions, and $\hat h^{\vee}=\check h^{\wedge}=h$.
proof (a) Let $k,m\in\mathbb{N}$. Then $\sup_{x\in\mathbb{R}}(|x|^m+|x|^{m+2})|h^{(k)}(x)|<\infty$ and $\int_{-\infty}^{\infty}|x^mh^{(k)}(x)|\,dx<\infty$. We may
therefore use 283Ch-283Ci to see that $y\mapsto i^{k+m}y^k\hat h^{(m)}(y)$ is the Fourier transform of $x\mapsto x^mh^{(k)}(x)$, and
therefore that $\lim_{|y|\to\infty}y^k\hat h^{(m)}(y)=0$, by 283Cg, so that (because $\hat h^{(m)}$ is continuous) $\sup_{y\in\mathbb{R}}|y^k\hat h^{(m)}(y)|$ is
finite. As $k$ and $m$ are arbitrary, $\hat h$ is a rapidly decreasing test function.

(b) Since $\check h(y)=\hat h(-y)$ for every $y$, it follows at once that $\check h$ is a rapidly decreasing test function.

(c) By 283J, it follows from (a) and (b) that $\hat h^{\vee}=\check h^{\wedge}=h$.

284D Definition I will use the phrase tempered function on $\mathbb{R}$ to mean a measurable complex-valued
function $f$, defined almost everywhere on $\mathbb{R}$, such that
$$\int_{-\infty}^{\infty}\frac{1}{1+|x|^k}|f(x)|\,dx<\infty$$
for some $k\in\mathbb{N}$.

284E As in 284B I spell out some elementary facts.


Lemma (a) If $f$ and $g$ are tempered functions, so are $|f|$, $f+g$ and $cf$, for any $c\in\mathbb{C}$.
(b) If $f$ is a tempered function then it is integrable over any bounded interval.
(c) If $f$ is a tempered function and $x\in\mathbb{R}$, then $t\mapsto f(x+t)$ and $t\mapsto f(x-t)$ are both tempered functions.
proof (a) is elementary; if
$$\int_{-\infty}^{\infty}\frac{1}{1+|x|^j}|f(x)|\,dx<\infty,\quad\int_{-\infty}^{\infty}\frac{1}{1+|x|^k}|g(x)|\,dx<\infty,$$
then
$$\int_{-\infty}^{\infty}\frac{1}{1+|x|^{j+k}}|(f+g)(x)|\,dx<\infty$$
because
$$1+|x|^{j+k}\ge\max(1,|x|^{j+k})\ge\max(1,|x|^j,|x|^k)\ge\tfrac12\max(1+|x|^j,1+|x|^k)$$
for all $x$.
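(b) If
$$\int_{-\infty}^{\infty}\frac{1}{1+|x|^k}|f(x)|\,dx=M<\infty,$$
then for any $a\le b$, because $1+|x|^k\le 1+|a|^k+|b|^k$ for every $x\in[a,b]$,
$$\int_a^b|f(x)|\,dx\le M(1+|a|^k+|b|^k)<\infty.$$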

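(c) The idea is the same as in 284Bb. If $k\in\mathbb{N}$ is such that
$$\int_{-\infty}^{\infty}\frac{1}{1+|t|^k}|f(t)|\,dt=M<\infty,$$
then we have
$$1+|x+t|^k\le 2^k(1+|x|^k)(1+|t|^k)$$
so that
$$\frac{1}{1+|t|^k}\le 2^k(1+|x|^k)\frac{1}{1+|x+t|^k}$$
for every $t$, and
$$\int_{-\infty}^{\infty}\frac{|f(x+t)|}{1+|t|^k}\,dt\le 2^k(1+|x|^k)\int_{-\infty}^{\infty}\frac{|f(x+t)|}{1+|x+t|^k}\,dt\le 2^k(1+|x|^k)M<\infty.$$
Similarly,
$$\int_{-\infty}^{\infty}\frac{|f(x-t)|}{1+|t|^k}\,dt\le 2^k(1+|x|^k)M<\infty.$$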

284F Linking the two concepts, we have the following.


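Lemma Let $f$ be a tempered function on $\mathbb{R}$ and $h$ a rapidly decreasing test function. Then $f\times h$ is
integrable.
proof Of course $f\times h$ is measurable. Let $k\in\mathbb{N}$ be such that $\int_{-\infty}^{\infty}\frac{1}{1+|x|^k}|f(x)|\,dx<\infty$. There is an $M$ such
that $(1+|x|^k)|h(x)|\le M$ for every $x\in\mathbb{R}$, so that
$$\int_{-\infty}^{\infty}|f(x)h(x)|\,dx\le M\int_{-\infty}^{\infty}\frac{1}{1+|x|^k}|f(x)|\,dx<\infty.$$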

284G Lemma Suppose that $f_1$ and $f_2$ are tempered functions and that $\int f_1\times h=\int f_2\times h$ for every
rapidly decreasing test function $h$. Then $f_1=_{\text{a.e.}}f_2$.
proof (a) Set $g=f_1-f_2$; then $\int g\times h=0$ for every rapidly decreasing test function $h$. Of course $g$ is a
tempered function, so is integrable over any bounded interval. By 222D, it will be enough if I can show that
$\int_a^bg=0$ whenever $a<b$, since then we shall have $g=0$ a.e. on every bounded interval and $f_1=_{\text{a.e.}}f_2$.
(b) Consider the function $\phi_0(x)=e^{-1/x}$ for $x>0$. Then $\phi_0$ is differentiable arbitrarily often everywhere
in $]0,\infty[$, $0<\phi_0(x)<1$ for every $x>0$, and $\lim_{x\to\infty}\phi_0(x)=1$. Moreover, writing $\phi_0^{(m)}$ for the $m$th
derivative of $\phi_0$,
$$\lim_{x\downarrow 0}\phi_0^{(m)}(x)=\lim_{x\downarrow 0}\frac1x\phi_0^{(m)}(x)=0$$
for every $m\in\mathbb{N}$. P (Compare 284Bf.) We have $\phi_0^{(m)}(x)=p_m(\frac1x)\phi_0(x)$, where $p_0(t)=1$ and
$p_{m+1}(t)=t^2(p_m(t)-p_m'(t))$, so that $p_m$ is a polynomial for each $m\in\mathbb{N}$. Now for any $k\in\mathbb{N}$,
$$\lim_{t\to\infty}t^ke^{-t}\le\lim_{t\to\infty}\frac{(k+1)!\,t^k}{t^{k+1}}=0,$$
so
$$\lim_{x\downarrow 0}\phi_0^{(m)}(x)=\lim_{t\to\infty}p_m(t)e^{-t}=0,$$
$$\lim_{x\downarrow 0}\frac1x\phi_0^{(m)}(x)=\lim_{t\to\infty}tp_m(t)e^{-t}=0.\ \text{Q}$$

(c) Consequently, setting $\phi(x)=0$ for $x\le 0$, $e^{-1/x}$ for $x>0$, $\phi$ is smooth, with $m$th derivative
$$\phi^{(m)}(x)=0\text{ for }x\le 0,\quad\phi^{(m)}(x)=\phi_0^{(m)}(x)\text{ for }x>0.$$
(The proof is an easy induction on $m$.) Also $0\le\phi(x)\le 1$ for every $x\in\mathbb{R}$, and $\lim_{x\to\infty}\phi(x)=1$.
(d) Now take any $a<b$, and for $n\in\mathbb{N}$ set
$$\psi_n(x)=\phi(n(x-a))\phi(n(b-x)).$$
Then $\psi_n$ will be smooth and $\psi_n(x)=0$ if $x\notin\,]a,b[$, so surely $\psi_n$ is a rapidly decreasing test function, and
$$\int g\times\psi_n=0.$$
Next, $0\le\psi_n(x)\le 1$ for every $x$, $n$, and if $a<x<b$ then $\lim_{n\to\infty}\psi_n(x)=1$. So
$$\int_a^bg=\int g\times\chi(]a,b[)=\int g\times(\lim_{n\to\infty}\psi_n)=\lim_{n\to\infty}\int g\times\psi_n=0,$$
using Lebesgue's Dominated Convergence Theorem. As $a$ and $b$ are arbitrary, $g=_{\text{a.e.}}0$, as required.
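
The behaviour of the test functions $\psi_n$ built in part (d) can be illustrated numerically. This is a rough sketch under stated assumptions, not part of the proof: the names `phi`, `psi` and `integral_of_psi` are hypothetical, the interval $]0,1[$ is an arbitrary choice, and the integral is approximated by a plain midpoint rule. It shows $\psi_n$ vanishing outside $]a,b[$, staying in $[0,1]$, and having integral creeping up towards $b-a$ as $n$ grows.

```python
import math

def phi(x):
    """Smooth cutoff: 0 for x <= 0 and exp(-1/x) for x > 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def psi(n, a, b, x):
    """psi_n(x) = phi(n(x - a)) * phi(n(b - x)); it vanishes outside ]a, b[."""
    return phi(n * (x - a)) * phi(n * (b - x))

def integral_of_psi(n, a, b, steps=20000):
    """Midpoint-rule approximation to the integral of psi_n over [a, b]."""
    width = (b - a) / steps
    return width * sum(psi(n, a, b, a + (i + 0.5) * width) for i in range(steps))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, integral_of_psi(n, 0.0, 1.0))
```

The convergence of the integrals is slow (the shortfall behaves roughly like $(\ln n)/n$), which is harmless for the argument, since only pointwise convergence and the bound $0\le\psi_n\le 1$ are used.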

284H Definition Let $f$ and $g$ be tempered functions in the sense of 284D. Then I will say that $g$
represents the Fourier transform of $f$ if
$$\int g\times h=\int f\times\hat h$$
for every rapidly decreasing test function $h$.

284I Remarks (a) As usual, when shifting definitions in this way, we have some checking to do. If $f$ is
an integrable complex-valued function on $\mathbb{R}$, $\hat f$ its Fourier transform, then surely $\hat f$ is a tempered function,
being a bounded continuous function; and if $h$ is any rapidly decreasing test function, then $\int\hat f\times h=\int f\times\hat h$
by 283O. Thus $\hat f$ 'represents the Fourier transform of $f$' in the sense of 284H above.

(b) Note also that 284G assures us that if $g_1$, $g_2$ are two tempered functions both representing the Fourier
transform of $f$, then $g_1=_{\text{a.e.}}g_2$, since we must have
$$\int g_1\times h=\int f\times\hat h=\int g_2\times h$$
for every rapidly decreasing test function $h$.

(c) Of course the value of this indirect approach is that we can assign Fourier transforms, in a sense, to
many more functions. But we must note at once that if $g$ 'represents the Fourier transform of $f$' then so
will any function equal almost everywhere to $g$; we can no longer expect to be able to speak of 'the' Fourier
transform of $f$ as a function. We could say that 'the' Fourier transform of $f$ is a functional $\phi$ on the space
of rapidly decreasing test functions, defined by setting $\phi(h)=\int f\times\hat h$; alternatively, we could say that the
Fourier transform of $f$ is a member of $L^0_{\mathbb{C}}$, the space of equivalence classes of almost-everywhere-defined
measurable functions (§241).

(d) It is now natural to say that $g$ represents the inverse Fourier transform of $f$ just when $f$
represents the Fourier transform of $g$; that is, when $\int f\times h=\int g\times\hat h$ for every rapidly decreasing test
function $h$. Because $\hat h^{\vee}=\check h^{\wedge}=h$ for every such $h$, this is the same thing as saying that $\int f\times\check h=\int g\times h$
for every rapidly decreasing test function $h$, which is the other natural expression of what it might mean to
say that '$g$ represents the inverse Fourier transform of $f$'.

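(e) If $f$, $g$ are tempered functions and we write $\overset{\leftrightarrow}{g}(x)=g(-x)$ whenever this is defined, then $\overset{\leftrightarrow}{g}$ will also
be a tempered function, and we shall always have
$$\int\overset{\leftrightarrow}{g}\times\hat h=\int g(-x)\hat h(x)\,dx=\int g(x)\hat h(-x)\,dx=\int g\times\check h,$$
so that
$g$ represents the Fourier transform of $f$
$\iff\int g\times h=\int f\times\hat h$ for every test function $h$
$\iff\int g\times\check h=\int f\times\check h^{\wedge}$ for every $h$
$\iff\int\overset{\leftrightarrow}{g}\times\hat h=\int f\times h$ for every $h$
$\iff\overset{\leftrightarrow}{g}$ represents the inverse Fourier transform of $f$.
Combining this with (d), we get
$g$ represents the Fourier transform of $f$
$\iff f=\overset{\leftrightarrow}{f}{}^{\leftrightarrow}$ represents the inverse Fourier transform of $g$
$\iff\overset{\leftrightarrow}{f}$ represents the Fourier transform of $g$.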

(f) Yet again, I ought to spell out the check: if $f$ is integrable and $\check f$ is its inverse Fourier transform as
defined in 283Ab, then
$$\int\check f\times\hat h=\int f\times\hat h^{\vee}=\int f\times h$$
for every rapidly decreasing test function $h$, so $\check f$ 'represents the inverse Fourier transform of $f$' in the sense
given here.

284J Lemma Let $f$ be any tempered function and $h$ a rapidly decreasing test function. Then $f*h$,
defined by the formula
$$(f*h)(y)=\int_{-\infty}^{\infty}f(t)h(y-t)\,dt,$$
is defined everywhere.
proof Take any $y\in\mathbb{R}$. By 284Bb, $t\mapsto h(y-t)$ is a rapidly decreasing test function, so the integral is
always defined in $\mathbb{C}$, by 284F.

284K Proposition Let $f$ and $g$ be tempered functions such that $g$ represents the Fourier transform of
$f$, and $h$ a rapidly decreasing test function.

(a) The Fourier transform of the integrable function $f\times h$ is $\frac{1}{\sqrt{2\pi}}g*\hat h$, where $g*\hat h$ is the convolution of
$g$ and $\hat h$.

(b) The Fourier transform of the continuous function $f*h$ is represented by the product $\sqrt{2\pi}\,g\times\hat h$.

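proof (a) Of course $f\times h$ is integrable, by 284F, while $g*\hat h$ is defined everywhere, by 284C and 284J.
Fix $y\in\mathbb{R}$. Set $\phi(x)=\hat h(y-x)$ for $x\in\mathbb{R}$; then $\phi$ is a rapidly decreasing test function because $\hat h$ is
(284Bb). Now
$$\hat\phi(t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-itx}\hat h(y-x)\,dx=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-it(y-x)}\hat h(x)\,dx$$
$$=\frac{1}{\sqrt{2\pi}}e^{-ity}\int_{-\infty}^{\infty}e^{itx}\hat h(x)\,dx=e^{-ity}\hat h^{\vee}(t)=e^{-ity}h(t),$$
using 284C. Accordingly
$$(f\times h)^{\wedge}(y)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-ity}f(t)h(t)\,dt$$
$$=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(t)\hat\phi(t)\,dt=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g(t)\phi(t)\,dt$$
(because $g$ represents the Fourier transform of $f$)
$$=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g(t)\hat h(y-t)\,dt=\frac{1}{\sqrt{2\pi}}(g*\hat h)(y).$$
As $y$ is arbitrary, $\frac{1}{\sqrt{2\pi}}g*\hat h$ is the Fourier transform of $f\times h$.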

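(b) Write $\psi$ for the Fourier transform of $g\times\hat h$, $\overset{\leftrightarrow}{f}(x)=f(-x)$ when this is defined, and $\overset{\leftrightarrow}{h}(x)=h(-x)$
for every $x$, so that $\overset{\leftrightarrow}{f}$ represents the Fourier transform of $g$, by 284Ie, and $\overset{\leftrightarrow}{h}$ is the Fourier transform of
$\hat h$. By (a), we have $\psi=\frac{1}{\sqrt{2\pi}}\overset{\leftrightarrow}{f}*\overset{\leftrightarrow}{h}$. This means that the inverse Fourier transform of $\sqrt{2\pi}\,g\times\hat h$ must be
$\sqrt{2\pi}\,\overset{\leftrightarrow}{\psi}=(\overset{\leftrightarrow}{f}*\overset{\leftrightarrow}{h})^{\leftrightarrow}$; and as
$$(\overset{\leftrightarrow}{f}*\overset{\leftrightarrow}{h})^{\leftrightarrow}(y)=(\overset{\leftrightarrow}{f}*\overset{\leftrightarrow}{h})(-y)=\int_{-\infty}^{\infty}\overset{\leftrightarrow}{f}(t)\overset{\leftrightarrow}{h}(-y-t)\,dt$$
$$=\int_{-\infty}^{\infty}f(-t)h(y+t)\,dt=\int_{-\infty}^{\infty}f(t)h(y-t)\,dt=(f*h)(y),$$
the inverse Fourier transform of $\sqrt{2\pi}\,g\times\hat h$ is $f*h$ (which is therefore continuous), and $\sqrt{2\pi}\,g\times\hat h$ must
represent the Fourier transform of $f*h$.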
Remark Compare 283M. It is typical of the theory of Fourier transforms that we have formulae valid in a
wide variety of contexts, each requiring a different interpretation and a different proof.

284L We are now ready for a result corresponding to 282H. I use a different method, or at least a
different arrangement of the ideas, through the following fact, which is important in other ways.
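Proposition Let $f$ be any tempered function. Writing $\phi_\sigma(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}$ for $x\in\mathbb{R}$ and $\sigma>0$, then
$$\lim_{\sigma\downarrow 0}(f*\phi_\sigma)(x)=c$$
whenever $x\in\mathbb{R}$ and $c\in\mathbb{C}$ are such that
$$\lim_{\delta\downarrow 0}\frac1\delta\int_0^\delta|f(x+t)+f(x-t)-2c|\,dt=0.$$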

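proof (a) By 284Bf, every $\phi_\sigma$ is a rapidly decreasing test function, so that $f*\phi_\sigma$ is defined everywhere, by
284J. We need to know that $\int_{-\infty}^{\infty}\phi_\sigma=1$; this is because (substituting $u=x/\sigma$)
$$\int_{-\infty}^{\infty}\phi_\sigma(x)\,dx=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-u^2/2}\,du=1,$$
by 263G. The argument now follows the lines of 282H. Set
$$\phi(t)=|f(x+t)+f(x-t)-2c|$$
when this is defined, which is almost everywhere, and $\Phi(t)=\int_0^t\phi$, defined for all $t\ge 0$ because $f$ is integrable
over every bounded interval (284Eb). We have
$$|(f*\phi_\sigma)(x)-c|=\Big|\int_{-\infty}^{\infty}f(x-t)\phi_\sigma(t)\,dt-c\int_{-\infty}^{\infty}\phi_\sigma(t)\,dt\Big|$$
$$=\Big|\int_{-\infty}^{0}(f(x-t)-c)\phi_\sigma(t)\,dt+\int_0^{\infty}(f(x-t)-c)\phi_\sigma(t)\,dt\Big|$$
$$=\Big|\int_0^{\infty}(f(x+t)-c)\phi_\sigma(t)\,dt+\int_0^{\infty}(f(x-t)-c)\phi_\sigma(t)\,dt\Big|$$
(because $\phi_\sigma$ is an even function)
$$=\Big|\int_0^{\infty}(f(x+t)+f(x-t)-2c)\phi_\sigma(t)\,dt\Big|$$
$$\le\int_0^{\infty}|f(x+t)+f(x-t)-2c|\phi_\sigma(t)\,dt=\int_0^{\infty}\phi(t)\phi_\sigma(t)\,dt.$$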
0 0

(b) I should explain why this last integral is finite. Because $f$ is a tempered function, so are the
functions $t\mapsto f(x+t)$, $t\mapsto f(x-t)$ (284Ec); of course constant functions are tempered, so $t\mapsto\phi(t)=|f(x+t)+f(x-t)-2c|$
is tempered, and because $\phi_\sigma$ is a rapidly decreasing test function we may apply
284F to see that the product is integrable.
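(c) Let $\epsilon>0$. By hypothesis, $\lim_{t\downarrow 0}\Phi(t)/t=0$; let $\delta>0$ be such that $\Phi(t)\le\epsilon t$ for every $t\in[0,\delta]$. Take
any $\sigma\in\,]0,\delta]$. I break the integral $\int_0^{\infty}\phi(t)\phi_\sigma(t)\,dt$ up into three parts.
(i) For the integral from 0 to $\sigma$, we have
$$\int_0^\sigma\phi(t)\phi_\sigma(t)\,dt\le\frac{1}{\sigma\sqrt{2\pi}}\int_0^\sigma\phi(t)\,dt=\frac{1}{\sigma\sqrt{2\pi}}\Phi(\sigma)\le\frac{\epsilon}{\sqrt{2\pi}}\le\epsilon,$$
because $\phi_\sigma(t)\le\frac{1}{\sigma\sqrt{2\pi}}$ for every $t$.

(ii) For the integral from $\sigma$ to $\delta$, we have
$$\int_\sigma^\delta\phi(t)\phi_\sigma(t)\,dt\le\frac{1}{\sigma\sqrt{2\pi}}\int_\sigma^\delta\phi(t)\frac{2\sigma^2}{t^2}\,dt$$
(because $e^{-t^2/2\sigma^2}=1/e^{t^2/2\sigma^2}\le 1/(t^2/2\sigma^2)=2\sigma^2/t^2$ for every $t\ne 0$)
$$=\sigma\sqrt{\frac2\pi}\int_\sigma^\delta\frac{\phi(t)}{t^2}\,dt=\sigma\sqrt{\frac2\pi}\Big(\frac{\Phi(\delta)}{\delta^2}-\frac{\Phi(\sigma)}{\sigma^2}+\int_\sigma^\delta\frac{2\Phi(t)}{t^3}\,dt\Big)$$
(integrating by parts; see 225F)
$$\le\sigma\Big(\frac{\epsilon}{\delta}+\int_\sigma^\delta\frac{2\epsilon}{t^2}\,dt\Big)$$
(because $\Phi(t)\le\epsilon t$ for $0\le t\le\delta$ and $\sqrt{2/\pi}\le 1$)
$$\le\frac{\sigma\epsilon}{\delta}+2\epsilon\le 3\epsilon.$$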

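(iii) For the integral from $\delta$ to $\infty$, we have
$$\int_\delta^{\infty}\phi(t)\phi_\sigma(t)\,dt=\frac{1}{\sqrt{2\pi}}\int_\delta^{\infty}\phi(t)\frac{e^{-t^2/2\sigma^2}}{\sigma}\,dt.$$
Now for any $t\ge\delta$,
$$\sigma\mapsto\frac1\sigma e^{-t^2/2\sigma^2}:\,]0,\delta]\to\mathbb{R}$$
is monotonically increasing, because its derivative
$$\frac{d}{d\sigma}\frac1\sigma e^{-t^2/2\sigma^2}=\frac{1}{\sigma^2}\Big(\frac{t^2}{\sigma^2}-1\Big)e^{-t^2/2\sigma^2}$$
is positive, and
$$\lim_{\sigma\downarrow 0}\frac1\sigma e^{-t^2/2\sigma^2}=\lim_{a\to\infty}ae^{-a^2t^2/2}=0.$$
So we may apply Lebesgue's Dominated Convergence Theorem to see that
$$\lim_{n\to\infty}\int_\delta^{\infty}\phi(t)\frac{e^{-t^2/2\sigma_n^2}}{\sigma_n}\,dt=0$$
whenever $\langle\sigma_n\rangle_{n\in\mathbb{N}}$ is a sequence in $]0,\delta]$ converging to 0, so that
$$\lim_{\sigma\downarrow 0}\int_\delta^{\infty}\phi(t)\frac{e^{-t^2/2\sigma^2}}{\sigma}\,dt=0.$$
There must therefore be a $\sigma_0\in\,]0,\delta]$ such that
$$\int_\delta^{\infty}\phi(t)\phi_\sigma(t)\,dt\le\epsilon$$
for every $\sigma\le\sigma_0$.
Putting these together, we see that
$$|(f*\phi_\sigma)(x)-c|\le\int_0^{\infty}\phi(t)\phi_\sigma(t)\,dt\le\epsilon+3\epsilon+\epsilon=5\epsilon$$
whenever $0<\sigma\le\sigma_0$. As $\epsilon$ is arbitrary, $\lim_{\sigma\downarrow 0}(f*\phi_\sigma)(x)=c$, as claimed.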
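
A small numerical experiment can make the conclusion of 284L concrete. The following is a sketch under stated assumptions, not part of the proof: the names `phi_sigma`, `f` and `gaussian_average` are hypothetical, the function $f$ (equal to 1 on $]-\infty,0]$ and to $3+t$ on $]0,\infty[$) is an arbitrary example, and the convolution integral is truncated to $|t|\le 10\sigma$ and approximated by a midpoint rule. At the jump $x=0$ the symmetric-average hypothesis holds with $c=2$, the mean of the one-sided limits, and $(f*\phi_\sigma)(0)$ is seen to approach 2 as $\sigma$ shrinks.

```python
import math

def phi_sigma(sigma, t):
    """Gaussian kernel (1/(sigma*sqrt(2*pi))) * exp(-t^2 / (2*sigma^2))."""
    return math.exp(-t * t / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def f(t):
    """Illustrative function with a jump at 0: left limit 1, right limit 3."""
    return 3.0 + t if t > 0 else 1.0

def gaussian_average(func, x, sigma, half_width=10.0, steps=4000):
    """Midpoint-rule approximation to (func * phi_sigma)(x) = int func(x - t) phi_sigma(t) dt,
    truncated to |t| <= half_width * sigma (the neglected Gaussian tail is tiny)."""
    limit = half_width * sigma
    h = 2.0 * limit / steps
    total = 0.0
    for i in range(steps):
        t = -limit + (i + 0.5) * h       # with an even step count, t never equals 0
        total += func(x - t) * phi_sigma(sigma, t)
    return total * h

if __name__ == "__main__":
    for sigma in (1.0, 0.1, 0.01):
        print(sigma, gaussian_average(f, 0.0, sigma))
```

For this particular $f$ a direct calculation gives $(f*\phi_\sigma)(0)=2+\sigma/\sqrt{2\pi}$, so the error should shrink in proportion to $\sigma$.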

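284M Theorem Let $f$ and $g$ be tempered functions such that $g$ represents the Fourier transform of $f$.
Then
(a)(i) $g(y)=\lim_{\epsilon\downarrow 0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}f(x)\,dx$ for almost every $y\in\mathbb{R}$.
(ii) If $y\in\mathbb{R}$ is such that $a=\lim_{t\in\operatorname{dom}g,\,t\uparrow y}g(t)$ and $b=\lim_{t\in\operatorname{dom}g,\,t\downarrow y}g(t)$ are both defined in $\mathbb{C}$, then
$$\lim_{\epsilon\downarrow 0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}f(x)\,dx=\tfrac12(a+b).$$
(b)(i) $f(x)=\lim_{\epsilon\downarrow 0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{ixy}e^{-\epsilon y^2}g(y)\,dy$ for almost every $x\in\mathbb{R}$.
(ii) If $x\in\mathbb{R}$ is such that $a=\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)$ and $b=\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)$ are both defined in $\mathbb{C}$, then
$$\lim_{\epsilon\downarrow 0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{ixy}e^{-\epsilon y^2}g(y)\,dy=\tfrac12(a+b).$$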

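proof (a)(i) By 223D,
$$\lim_{\delta\downarrow 0}\frac{1}{2\delta}\int_{-\delta}^{\delta}|g(y+t)-g(y)|\,dt=0$$
for almost every $y\in\mathbb{R}$, because $g$ is integrable over any bounded interval. Fix any such $y$. Set $\phi(t)=|g(y+t)+g(y-t)-2g(y)|$
whenever this is defined. Then, as in 282Ia,
$$\int_0^\delta\phi\le\int_{-\delta}^{\delta}|g(y+t)-g(y)|\,dt,$$
so $\lim_{\delta\downarrow 0}\frac1\delta\int_0^\delta\phi=0$. Consequently, by 284L,
$$g(y)=\lim_{\sigma\to\infty}(g*\phi_{1/\sigma})(y).$$
We know from 283N that the Fourier transform of $\phi_\sigma$ is $\frac1\sigma\phi_{1/\sigma}$ for any $\sigma>0$. Accordingly, by 284K, $g*\phi_{1/\sigma}$
is the Fourier transform of $\sigma\sqrt{2\pi}\,\phi_\sigma\times f$, that is,
$$(g*\phi_{1/\sigma})(y)=\int_{-\infty}^{\infty}e^{-iyx}\sigma\phi_\sigma(x)f(x)\,dx.$$
So
$$g(y)=\lim_{\sigma\to\infty}\int_{-\infty}^{\infty}e^{-iyx}\sigma\phi_\sigma(x)f(x)\,dx$$
$$=\lim_{\sigma\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-x^2/2\sigma^2}f(x)\,dx$$
$$=\lim_{\epsilon\downarrow 0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}f(x)\,dx.$$
And this is true for almost every $y$.

(ii) Again, setting $c=\frac12(a+b)$, $\phi(t)=|g(y+t)+g(y-t)-2c|$ whenever this is defined, we have
$\lim_{t\in\operatorname{dom}\phi,\,t\downarrow 0}\phi(t)=0$, so of course $\lim_{\delta\downarrow 0}\frac1\delta\int_0^\delta\phi=0$, and
$$c=\lim_{\sigma\to\infty}(g*\phi_{1/\sigma})(y)=\lim_{\epsilon\downarrow 0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}f(x)\,dx$$
as before.
(b) This can be shown by similar arguments; or it may be actually deduced from (a), by observing that
$x\mapsto\overset{\leftrightarrow}{f}(x)=f(-x)$ represents the Fourier transform of $g$ (see 284Id), and applying (a) to $g$ and $\overset{\leftrightarrow}{f}$.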

284N $L^2$ spaces We are now ready for results corresponding to 282J-282K.

Lemma Let $\mathcal{L}^2_{\mathbb{C}}$ be the space of square-integrable complex-valued functions on $\mathbb{R}$, and $S$ the space of rapidly
decreasing test functions. Then for every $f\in\mathcal{L}^2_{\mathbb{C}}$ and $\epsilon>0$ there is an $h\in S$ such that $\|f-h\|_2\le\epsilon$.
proof Set $\phi(x)=e^{-1/x}$ for $x>0$, zero for $x\le 0$; recall from the proof of 284G that $\phi$ is smooth. For any
$a<b$, the functions
$$x\mapsto\psi_n(x)=\phi(n(x-a))\phi(n(b-x))$$
provide a sequence of test functions converging to $\chi\,]a,b[$ from below, so (as in 284G)
$$\inf_{h\in S}\|\chi\,]a,b[\,-h\|_2^2\le\lim_{n\to\infty}\int_a^b|1-\psi_n(x)|^2\,dx=0.$$
Because $S$ is a linear space (284Ba), it follows that for every step-function $g$ with bounded support and
every $\epsilon>0$ there is an $h\in S$ such that $\|g-h\|_2\le\frac12\epsilon$. But we know from 244H that for every $f\in\mathcal{L}^2_{\mathbb{C}}$ and
$\epsilon>0$ there is a step-function $g$ with bounded support such that $\|f-g\|_2\le\frac12\epsilon$; so there must be an $h\in S$
such that
$$\|f-h\|_2\le\|f-g\|_2+\|g-h\|_2\le\epsilon.$$
As $f$ and $\epsilon$ are arbitrary, we have the result.

284O Theorem (a) Let $f$ be any complex-valued function which is square-integrable over $\mathbb{R}$. Then $f$
is a tempered function and its Fourier transform is represented by another square-integrable function $g$, and
$\|g\|_2=\|f\|_2$.
(b) If $f_1$ and $f_2$ are complex-valued functions, square-integrable over $\mathbb{R}$, with Fourier transforms represented
by functions $g_1$, $g_2$, then
$$\int_{-\infty}^{\infty}f_1(x)\overline{f_2(x)}\,dx=\int_{-\infty}^{\infty}g_1(y)\overline{g_2(y)}\,dy.$$
(c) If $f_1$ and $f_2$ are complex-valued functions, square-integrable over $\mathbb{R}$, with Fourier transforms represented
by functions $g_1$, $g_2$, then the integrable function $f_1\times f_2$ has Fourier transform $\frac{1}{\sqrt{2\pi}}g_1*g_2$.
(d) If $f_1$ and $f_2$ are complex-valued functions, square-integrable over $\mathbb{R}$, with Fourier transforms represented
by functions $g_1$, $g_2$, then $\sqrt{2\pi}\,g_1\times g_2$ represents the Fourier transform of the continuous function
$f_1*f_2$.
proof (a)(i) Consider first the case in which $f$ is a rapidly decreasing test function and $g$ is its Fourier
transform; we know that $g$ is also a rapidly decreasing test function, and that $f$ is the inverse Fourier
transform of $g$ (284C). Now the complex conjugate $\bar g$ of $g$ is given by the formula
$$\overline{g(y)}=\frac{1}{\sqrt{2\pi}}\overline{\int_{-\infty}^{\infty}e^{-iyx}f(x)\,dx}=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{iyx}\overline{f(x)}\,dx,$$
so that $\bar g$ is the inverse Fourier transform of $\bar f$. Accordingly
$$\int f\times\bar f=\int\check g\times\bar f=\int g\times(\bar f)^{\vee}=\int g\times\bar g,$$
using 283O for the middle equality.
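(ii) Now suppose that $f\in\mathcal{L}^2_{\mathbb{C}}$. I said that $f$ is a tempered function; this is simply because
$$\int_{-\infty}^{\infty}\Big(\frac{1}{1+|x|}\Big)^2dx<\infty,$$
so
$$\int_{-\infty}^{\infty}\frac{|f(x)|}{1+|x|}\,dx<\infty$$
(244Eb). By 284N, there is a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ of rapidly decreasing test functions such that $\lim_{n\to\infty}\|f-f_n\|_2=0$. By (i),
$$\lim_{m,n\to\infty}\|\hat f_m-\hat f_n\|_2=\lim_{m,n\to\infty}\|f_m-f_n\|_2=0,$$
and the sequence $\langle\hat f_n^{\bullet}\rangle_{n\in\mathbb{N}}$ of equivalence classes is a Cauchy sequence in $L^2_{\mathbb{C}}$. Because $L^2_{\mathbb{C}}$ is complete (244G),
$\langle\hat f_n^{\bullet}\rangle_{n\in\mathbb{N}}$ has a limit in $L^2_{\mathbb{C}}$, which is representable as $g^{\bullet}$ for some $g\in\mathcal{L}^2_{\mathbb{C}}$. Like $f$, $g$ must be a tempered
function. Of course
$$\|g\|_2=\lim_{n\to\infty}\|\hat f_n\|_2=\lim_{n\to\infty}\|f_n\|_2=\|f\|_2.$$
Now if $h$ is any rapidly decreasing test function, $h\in\mathcal{L}^2_{\mathbb{C}}$ (284Bc), so we shall have
$$\int g\times h=\lim_{n\to\infty}\int\hat f_n\times h=\lim_{n\to\infty}\int f_n\times\hat h=\int f\times\hat h.$$
So $g$ represents the Fourier transform of $f$.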
(b) Of course any functions representing the Fourier transforms of $f_1$ and $f_2$ must be equal almost
everywhere to square-integrable functions, and therefore square-integrable, with the right norms. It follows
as in 282K (part (d) of the proof) that if $g_1$, $g_2$ represent the Fourier transforms of $f_1$, $f_2$, so that $ag_1+bg_2$
represents the Fourier transform of $af_1+bf_2$ and $\|ag_1+bg_2\|_2=\|af_1+bf_2\|_2$ for all $a,b\in\mathbb{C}$, we must have
$$\int f_1\times\bar f_2=(f_1|f_2)=(g_1|g_2)=\int g_1\times\bar g_2.$$

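(c) Of course $f_1\times f_2$ is integrable because it is the product of two square-integrable functions (244E).
(i) Let $y\in\mathbb{R}$ and set $f(x)=\overline{f_2(x)}e^{iyx}$ for $x\in\mathbb{R}$. Then $f\in\mathcal{L}^2_{\mathbb{C}}$. We need to know that the Fourier
transform of $f$ is represented by $g$, where $g(u)=\overline{g_2(y-u)}$. P Let $h$ be a rapidly decreasing test function.
Then
$$\int g\times h=\int\overline{g_2(y-u)}h(u)\,du=\int\overline{g_2(u)}h(y-u)\,du=\overline{\int g_2\times\bar h_1}=\overline{\int f_2\times(\bar h_1)^{\wedge}},$$
where $h_1(u)=h(y-u)$. To compute $(\bar h_1)^{\wedge}$, we have
$$(\bar h_1)^{\wedge}(v)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-ivu}\overline{h_1(u)}\,du=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-ivu}\overline{h(y-u)}\,du$$
$$=\frac{1}{\sqrt{2\pi}}\overline{\int_{-\infty}^{\infty}e^{ivu}h(y-u)\,du}=\frac{1}{\sqrt{2\pi}}\overline{\int_{-\infty}^{\infty}e^{iv(y-u)}h(u)\,du}=e^{-ivy}\overline{\hat h(v)}.$$
So
$$\int g\times h=\overline{\int f_2\times(\bar h_1)^{\wedge}}=\int\overline{f_2(v)}\,\overline{(\bar h_1)^{\wedge}(v)}\,dv=\int\overline{f_2(v)}e^{ivy}\hat h(v)\,dv=\int f\times\hat h:$$
as $h$ is arbitrary, $g$ represents the Fourier transform of $f$. Q
(ii) We now have
$$(f_1\times f_2)^{\wedge}(y)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}f_1(x)f_2(x)\,dx$$
$$=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f_1(x)\overline{f(x)}\,dx=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g_1(u)\overline{g(u)}\,du$$
(using part (b))
$$=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g_1(u)g_2(y-u)\,du=\frac{1}{\sqrt{2\pi}}(g_1*g_2)(y).$$
As $y$ is arbitrary, $(f_1\times f_2)^{\wedge}=\frac{1}{\sqrt{2\pi}}g_1*g_2$, as claimed.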

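(d) By (c), the Fourier transform of $\sqrt{2\pi}\,g_1\times g_2$ is $\overset{\leftrightarrow}{f_1}*\overset{\leftrightarrow}{f_2}$, writing $\overset{\leftrightarrow}{f_1}(x)=f_1(-x)$, so that $\overset{\leftrightarrow}{f_1}$ represents
the Fourier transform of $g_1$. So the inverse Fourier transform of $\sqrt{2\pi}\,g_1\times g_2$ is $(\overset{\leftrightarrow}{f_1}*\overset{\leftrightarrow}{f_2})^{\leftrightarrow}$. But, just as in
the proof of 284Kb, $(\overset{\leftrightarrow}{f_1}*\overset{\leftrightarrow}{f_2})^{\leftrightarrow}=f_1*f_2$, so $f_1*f_2$ is the inverse Fourier transform of $\sqrt{2\pi}\,g_1\times g_2$, and
$\sqrt{2\pi}\,g_1\times g_2$ represents the Fourier transform of $f_1*f_2$, as claimed. Also $f_1*f_2$, being the Fourier transform
of an integrable function, is continuous (283Cf; see also 255K).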

284P Corollary Writing $L^2_{\mathbb{C}}$ for the Hilbert space of equivalence classes of square-integrable complex-valued
functions on $\mathbb{R}$, we have a linear isometry $T:L^2_{\mathbb{C}}\to L^2_{\mathbb{C}}$ given by saying that $T(f^{\bullet})=g^{\bullet}$ whenever $f$,
$g\in\mathcal{L}^2_{\mathbb{C}}$ and $g$ represents the Fourier transform of $f$.
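
The norm identity behind this isometry can be tried out on a concrete function. The sketch below is an illustration under stated assumptions rather than anything taken from the text: the names `fourier_transform` and `squared_norm` are hypothetical, the example $f(x)=e^{-|x|}$ is an arbitrary choice (its Fourier transform in the present normalization is $y\mapsto\sqrt{2/\pi}\,/(1+y^2)$), and all integrals are truncated and approximated by midpoint rules, so agreement is only approximate.

```python
import cmath
import math

def fourier_transform(func, y, limit=20.0, steps=2000):
    """Midpoint-rule approximation to (1/sqrt(2*pi)) * int_{-limit}^{limit} exp(-i*y*x) func(x) dx."""
    h = 2.0 * limit / steps
    total = 0j
    for i in range(steps):
        x = -limit + (i + 0.5) * h
        total += cmath.exp(-1j * y * x) * func(x)
    return total * h / math.sqrt(2.0 * math.pi)

def squared_norm(func, limit, steps):
    """Midpoint-rule approximation to int_{-limit}^{limit} |func|^2."""
    h = 2.0 * limit / steps
    return h * sum(abs(func(-limit + (i + 0.5) * h)) ** 2 for i in range(steps))

def f(x):
    """Illustrative square-integrable function; the exact value of ||f||_2^2 is 1."""
    return math.exp(-abs(x))

norm_f = squared_norm(f, 20.0, 2000)
norm_g = squared_norm(lambda y: fourier_transform(f, y), 30.0, 300)

if __name__ == "__main__":
    print(norm_f, norm_g)   # both should be close to 1
```

Here the transform is computed directly because $f$ happens to be integrable; for a general square-integrable $f$ the representing function $g$ is only obtained as an $L^2$ limit, as in the proof of 284O.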

284Q Remarks (a) 284P corresponds, of course, to 282K, where the similar isometry between $\ell^2_{\mathbb{C}}(\mathbb{Z})$
and $L^2_{\mathbb{C}}(]-\pi,\pi])$ is described. In that case there was a marked asymmetry which is absent from the present
situation; because the relevant measure on $\mathbb{Z}$, counting measure, gives non-zero mass to every point, members
of $\ell^2_{\mathbb{C}}$ are true functions, and it is not surprising that we have a straightforward formula for $S(f^{\bullet})\in\ell^2_{\mathbb{C}}$ for
every $f\in\mathcal{L}^2_{\mathbb{C}}(]-\pi,\pi])$. The difficulty of describing $S^{-1}:\ell^2_{\mathbb{C}}(\mathbb{Z})\to L^2_{\mathbb{C}}(]-\pi,\pi])$ is very similar to the difficulty
of describing $T:L^2_{\mathbb{C}}(\mathbb{R})\to L^2_{\mathbb{C}}(\mathbb{R})$ and its inverse. 284Yg and 286U-286V show just how close this similarity
is.

(b) I have spelt out parts (c) and (d) of 284O in detail, perhaps in unnecessary detail, because they give
me an opportunity to insist on the difference between '$\sqrt{2\pi}\,g_1\times g_2$ represents the Fourier transform of $f_1*f_2$'
and '$\frac{1}{\sqrt{2\pi}}g_1*g_2$ is the Fourier transform of $f_1\times f_2$'. The actual functions $g_1$ and $g_2$ are not well-defined by
the hypothesis that they represent the Fourier transforms of $f_1$ and $f_2$, though their equivalence classes $g_1^{\bullet}$,
$g_2^{\bullet}\in L^2_{\mathbb{C}}$ are. So the product $g_1\times g_2$ is also not uniquely defined as a function, though its equivalence class
$(g_1\times g_2)^{\bullet}=g_1^{\bullet}\times g_2^{\bullet}$ is well-defined as a member of $L^1_{\mathbb{C}}$. However the continuous function $g_1*g_2$ is unaffected
by changes to $g_1$ and $g_2$ on negligible sets, so is well defined as a function; and since $f_1\times f_2$ is integrable,
and has a true Fourier transform, it is to be expected that $(f_1\times f_2)^{\wedge}$ should be exactly equal to $\frac{1}{\sqrt{2\pi}}g_1*g_2$.

(c) Of course 284Oc-284Od also exhibit a characteristic feature of arguments involving Fourier transforms,
the extension by continuity of relations valid for test functions.

(d) 284Oa is a version of Plancherel's theorem. The formula $\|\hat f\|_2=\|f\|_2$ is Parseval's identity.

284R Dirac's delta function Consider the tempered function $\mathbf{1}$ with constant value 1. In what sense,
if any, can we assign a Fourier transform to $\mathbf{1}$?
If we examine $\int\mathbf{1}\times\hat h$, as suggested in 284H, we get
$$\int\mathbf{1}\times\hat h=\int_{-\infty}^{\infty}\hat h=\sqrt{2\pi}\,\hat h^{\vee}(0)=\sqrt{2\pi}\,h(0)$$
for every rapidly decreasing test function $h$. Of course there is no function $g$ such that $\int g\times h=\sqrt{2\pi}\,h(0)$
for every rapidly decreasing test function $h$, since (using the arguments of 284G) we should have to have
$\int_a^bg=\sqrt{2\pi}$ whenever $a<0<b$, so that the indefinite integral of $g$ could not be continuous at 0. However
there is a measure on $\mathbb{R}$ with exactly the right property. If we set $\delta_0E=1$ when $0\in E\subseteq\mathbb{R}$, 0 when
$0\notin E\subseteq\mathbb{R}$, then $\delta_0$ is a (Radon) probability measure; it is indeed a 'distribution' in the sense of 271C, and
$\int h\,d\delta_0=h(0)$ for every function $h$ defined at 0. So we shall have
$$\int\mathbf{1}\times\hat h=\sqrt{2\pi}\int h\,d\delta_0$$
for every rapidly decreasing test function $h$, and we can reasonably say that the measure $\nu=\sqrt{2\pi}\,\delta_0$
'represents the Fourier transform of $\mathbf{1}$'.
We note with pleasure at this point that
$$\frac{1}{\sqrt{2\pi}}\int e^{ixy}\,\nu(dy)=1$$
for every $x\in\mathbb{R}$, so that $\mathbf{1}$ can be called the inverse Fourier transform of $\nu$.
If we look at the formulae of Theorem 284M, we get ideas consistent with this pairing of $\mathbf{1}$ with $\nu$. We
have
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}\mathbf{1}(x)\,dx=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}\,dx=\frac{1}{\sqrt{2\epsilon}}e^{-y^2/4\epsilon}$$
for every $y\in\mathbb{R}$, using 283N with $\sigma=1/\sqrt{2\epsilon}$. So
$$\lim_{\epsilon\downarrow 0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}\mathbf{1}(x)\,dx=0$$
for every $y\ne 0$, and the Fourier transform of $\mathbf{1}$ should be zero everywhere except at 0. On the other
hand, the functions $y\mapsto\frac{1}{\sqrt{2\epsilon}}e^{-y^2/4\epsilon}$ all have integral $\sqrt{2\pi}$, concentrated more and more closely about 0 as $\epsilon$
decreases to 0, so also point us directly to $\nu$, the measure which gives mass $\sqrt{2\pi}$ to 0.
Thus allowing measures, as well as functions, enables us to extend the notion of Fourier transform. Of
course we can go very much farther than this. If $h$ is any rapidly decreasing test function, then
$$\int_{-\infty}^{\infty}x\hat h(x)\,dx=-i\sqrt{2\pi}\,h'(0),$$
so that the identity function $x\mapsto x$ can be assigned, as a Fourier transform, the operator $h\mapsto-i\sqrt{2\pi}\,h'(0)$.
At this point we are entering the true theory of (Schwartzian) distributions or 'generalized functions',
and I had better stop. The 'Dirac delta function' is most naturally regarded as the measure $\delta_0$ above;
alternatively, as $\frac{1}{\sqrt{2\pi}}\hat{\mathbf{1}}$.

284W The multidimensional case As in §283, I give exercises designed to point the way to the
$r$-dimensional generalization.
(a) A rapidly decreasing test function on $\mathbb{R}^r$ is a function $h:\mathbb{R}^r\to\mathbb{C}$ such that (i) $h$ is smooth,
that is, all repeated partial derivatives
$$\frac{\partial^mh}{\partial\xi_{j_1}\ldots\partial\xi_{j_m}}$$
are defined and continuous everywhere in $\mathbb{R}^r$ (ii)
$$\sup_{x\in\mathbb{R}^r}\|x\|^k|h(x)|<\infty,\quad\sup_{x\in\mathbb{R}^r}\|x\|^k\Big|\frac{\partial^mh}{\partial\xi_{j_1}\ldots\partial\xi_{j_m}}(x)\Big|<\infty$$
for every $k\in\mathbb{N}$, $j_1,\ldots,j_m\le r$. A tempered function on $\mathbb{R}^r$ is a measurable complex-valued function $f$,
defined almost everywhere on $\mathbb{R}^r$, such that, for some $k\in\mathbb{N}$,
$$\int_{\mathbb{R}^r}\frac{1}{1+\|x\|^k}|f(x)|\,dx<\infty.$$
Show that if $f$ is a tempered function on $\mathbb{R}^r$ and $h$ is a rapidly decreasing test function on $\mathbb{R}^r$ then $f\times h$ is
integrable.

(b) Show that if $h$ is a rapidly decreasing test function on $\mathbb{R}^r$ so is $\hat h$, and that in this case $\hat h^{\vee}=h$.
(c) Show that if $f$ is a tempered function on $\mathbb{R}^r$ and $\int f\times h=0$ for every rapidly decreasing test function
$h$ on $\mathbb{R}^r$, then $f=_{\text{a.e.}}0$.
(d) If $f$ and $g$ are tempered functions on $\mathbb{R}^r$, I say that $g$ represents the Fourier transform of $f$ if
$\int g\times h=\int f\times\hat h$ for every rapidly decreasing test function $h$ on $\mathbb{R}^r$. Show that if $f$ is integrable then $\hat f$
represents the Fourier transform of $f$ in this sense.
(e) Let $f$ be any tempered function on $\mathbb{R}^r$. Writing $\phi_\sigma(x)=\frac{1}{(\sigma\sqrt{2\pi})^r}e^{-x\cdot x/2\sigma^2}$ for $x\in\mathbb{R}^r$, show that
$\lim_{\sigma\downarrow 0}(f*\phi_\sigma)(x)=c$ whenever $x\in\mathbb{R}^r$, $c\in\mathbb{C}$ are such that $\lim_{\delta\downarrow 0}\frac{1}{\delta^r}\int_{B(x,\delta)}|f(t)-c|\,dt=0$, writing
$B(x,\delta)=\{t:\|t-x\|\le\delta\}$.

(f) Let $f$ and $g$ be tempered functions on $\mathbb{R}^r$ such that $g$ represents the Fourier transform of $f$, and $h$ a
rapidly decreasing test function. Show that (i) the Fourier transform of $f\times h$ is $\frac{1}{(\sqrt{2\pi})^r}g*\hat h$ (ii) $(\sqrt{2\pi})^rg\times\hat h$
represents the Fourier transform of $f*h$.

(g) Let $f$ and $g$ be tempered functions on $\mathbb{R}^r$ such that $g$ represents the Fourier transform of $f$. Show
that
$$g(y)=\lim_{\epsilon\downarrow 0}\frac{1}{(\sqrt{2\pi})^r}\int_{\mathbb{R}^r}e^{-iy\cdot x}e^{-\epsilon x\cdot x}f(x)\,dx$$
for almost every $y\in\mathbb{R}^r$.

(h) Show that for any square-integrable complex-valued function $f$ on $\mathbb{R}^r$ and any $\epsilon>0$ there is a rapidly
decreasing test function $h$ such that $\|f-h\|_2\le\epsilon$.

(i) Let $\mathcal{L}^2_{\mathbb{C}}$ be the space of square-integrable complex-valued functions on $\mathbb{R}^r$. Show that
(i) for every $f\in\mathcal{L}^2_{\mathbb{C}}$ there is a $g\in\mathcal{L}^2_{\mathbb{C}}$ which represents the Fourier transform of $f$, and in this case
$\|g\|_2=\|f\|_2$;
(ii) if $g_1$, $g_2\in\mathcal{L}^2_{\mathbb{C}}$ represent the Fourier transforms of $f_1$, $f_2\in\mathcal{L}^2_{\mathbb{C}}$, then $\frac{1}{(\sqrt{2\pi})^r}g_1*g_2$ is the Fourier
transform of $f_1\times f_2$, and $(\sqrt{2\pi})^rg_1\times g_2$ represents the Fourier transform of $f_1*f_2$.

284X Basic exercises (a) Show that if $g$ and $h$ are rapidly decreasing test functions, so is $g\times h$.

(b) Show that there are non-zero continuous integrable functions $f$, $g:\mathbb{R}\to\mathbb{C}$ such that $f*g=0$
everywhere. (Hint: take them to be Fourier transforms of suitable test functions.)

(c) Suppose that $f:\mathbb{R}\to\mathbb{C}$ is a differentiable function such that its derivative $f'$ is a tempered function
and, for some $k\in\mathbb{N}$,
$$\lim_{x\to\infty}x^{-k}f(x)=\lim_{x\to-\infty}x^{-k}f(x)=0.$$
(i) Show that $\int f\times h'=-\int f'\times h$ for every rapidly decreasing test function $h$. (ii) Show that if $g$ is a
tempered function representing the Fourier transform of $f$, then $y\mapsto iyg(y)$ represents the Fourier transform
of $f'$.

(d) Show that if $h$ is a rapidly decreasing test function and $f$ is any measurable complex-valued function,
defined almost everywhere on $\mathbb{R}$, such that $\int_{-\infty}^{\infty}|x|^k|f(x)|\,dx<\infty$ for every $k\in\mathbb{N}$, then the convolution $f*h$
is a rapidly decreasing test function. (Hint: show that the Fourier transform of $f*h$ is a test function.)
> (e) Let $f$ be a tempered function such that $\lim_{a\to\infty}\int_{-a}^{a}f(x)\,dx$ exists in $\mathbb{C}$. Show that this limit
is also equal to $\lim_{\epsilon\downarrow 0}\int_{-\infty}^{\infty}e^{-\epsilon x^2}f(x)\,dx$. (Hint: set $g(x)=f(x)+f(-x)$. Use 224J to show that if
$0\le a\le b$ then $|\int_a^bg(x)e^{-\epsilon x^2}dx|\le\sup_{c\in[a,b]}|\int_a^cg|$, so that $\lim_{a\to\infty}\int_0^ag(x)e^{-\epsilon x^2}dx$ exists uniformly in $\epsilon$,
while $\lim_{\epsilon\downarrow 0}\int_0^ag(x)e^{-\epsilon x^2}dx=\int_0^ag(x)\,dx$ for every $a\ge 0$.)

> (f) Let $f$ and $g$ be tempered functions on $\mathbb{R}$ such that $g$ represents the Fourier transform of $f$. Show
that
$$g(y)=\lim_{a\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-a}^{a}e^{-iyx}f(x)\,dx$$
at almost all points $y$ for which the limit exists. (Hint: 284Xe, 284M.)

> (g) Let $f$ be an integrable complex-valued function on $\mathbb{R}$ such that $\hat f$ is also integrable. Show that
$\hat f^{\vee}=f$ at any point at which $f$ is continuous.

(h) Show that for every $p\in[1,\infty[$, $f\in\mathcal{L}^p_{\mathbb{C}}$ and $\epsilon>0$ there is a rapidly decreasing test function $h$ such
that $\|f-h\|_p\le\epsilon$.

> (i) Let $f$ and $g$ be square-integrable complex-valued functions on $\mathbb{R}$ such that $g$ represents the Fourier
transform of $f$. Show that
$$\int_c^df(x)\,dx=\frac{i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{e^{icy}-e^{idy}}{y}g(y)\,dy$$
whenever $c<d$ in $\mathbb{R}$.
(j) Let $f$ be a measurable complex-valued function, defined almost everywhere in $\mathbb{R}$, such that $\int|f|^p<\infty$,
where $1<p\le 2$. Show that $f$ is a tempered function and that there is a tempered function $g$ representing
the Fourier transform of $f$. (Hint: express $f$ as $f_1+f_2$, where $f_1$ is integrable and $f_2$ is square-integrable.)
(Remark Defining $\|f\|_p$, $\|g\|_q$ as in 244D, where $q=p/(p-1)$, we have $\|g\|_q\le(2\pi)^{(p-2)/2p}\|f\|_p$; see
Zygmund 59, XVI.3.2.)

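(k) Let $f$, $g$ be square-integrable complex-valued functions on $\mathbb{R}$ such that $g$ represents the Fourier
transform of $f$.
(i) Show that
$$\frac{1}{\sqrt{2\pi}}\int_{-a}^{a}e^{ixy}g(y)\,dy=\frac1\pi\int_{-\infty}^{\infty}\frac{\sin at}{t}f(x-t)\,dt$$
for every $x\in\mathbb{R}$, $a>0$. (Hint: find the inverse Fourier transform of $y\mapsto e^{-ixy}\chi[-a,a](y)$, and use 284Ob.)
(ii) Show that if $f(x)=0$ for $x\in\,]c,d[$ then
$$\lim_{a\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-a}^{a}e^{ixy}g(y)\,dy=0$$
for $x\in\,]c,d[$.
(iii) Show that if $f$ is differentiable at $x\in\mathbb{R}$, then
$$\lim_{a\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-a}^{a}e^{ixy}g(y)\,dy=f(x).$$
(iv) Show that if $f$ has bounded variation over some interval properly containing $x$, then
$$\lim_{a\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-a}^{a}e^{ixy}g(y)\,dy=\tfrac12\big(\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)\big).$$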


(l) Let $f$ be an integrable complex function on $\mathbb{R}$. Show that if $\hat f$ is square-integrable, so is $f$.

(m) Let $f_1$, $f_2$ be square-integrable complex-valued functions on $\mathbb{R}$ with Fourier transforms represented
by $g_1$, $g_2$. Show that $\int_{-\infty}^{\infty}f_1(t)f_2(-t)\,dt=\int_{-\infty}^{\infty}g_1(t)g_2(t)\,dt$.

(n) Write $\delta_x$ for the measure on $\mathbb{R}$ which assigns a mass of 1 to the point $x$ and a mass of 0 to the rest
of $\mathbb{R}$. Describe a sense in which $\sqrt{2\pi}\,\delta_x$ can be regarded as the Fourier transform of the function $t\mapsto e^{ixt}$.

(o) For any tempered function $f$ and $x\in\mathbb{R}$, define $\delta_x$ as in 284Xn, and set
$$(\delta_x*f)(u)=\int f(u-t)\,\delta_x(dt)=f(u-x)$$
for every $u$ for which $u-x\in\operatorname{dom}f$ (cf. 257Xe). If $g$ represents the Fourier transform of $f$, find a corresponding
representation of the Fourier transform of $\delta_x*f$, and relate it to the product of $g$ with the Fourier
transform of $\delta_x$.

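(p)(i) Show that
$$\lim_{\delta\downarrow 0,\,a\to\infty}\Big(\int_{-a}^{-\delta}\frac1xe^{-iyx}\,dx+\int_\delta^a\frac1xe^{-iyx}\,dx\Big)=-\pi i\,\operatorname{sgn}y$$
for every $y\in\mathbb{R}$, writing $\operatorname{sgn}y=y/|y|$ if $y\ne 0$ and $\operatorname{sgn}0=0$. (Hint: 283Da.)
(ii) Show that
$$\lim_{c\to\infty}\frac1c\int_0^c\int_{-a}^{a}e^{ixy}\operatorname{sgn}y\,dy\,da=\frac{2i}{x}$$
for every $x\ne 0$.
(iii) Show that for any rapidly decreasing test function $h$,
$$\int_0^{\infty}\frac1x(h(x)-h(-x))\,dx=\lim_{\delta\downarrow 0,\,a\to\infty}\Big(\int_{-a}^{-\delta}\frac1xh(x)\,dx+\int_\delta^a\frac1xh(x)\,dx\Big)$$
$$=\frac{i\sqrt{\pi}}{\sqrt2}\int_{-\infty}^{\infty}\hat h(y)\operatorname{sgn}y\,dy.$$
(iv) Show that for any rapidly decreasing test function $h$,
$$-\frac{i\sqrt{\pi}}{\sqrt2}\int_{-\infty}^{\infty}h(x)\operatorname{sgn}x\,dx=\int_0^{\infty}\frac1y(\hat h(y)-\hat h(-y))\,dy.$$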

(q) Let $\langle h_n\rangle_{n\in\mathbb{N}}$ be a sequence of rapidly decreasing test functions such that $\phi(f)=\lim_{n\to\infty}\int h_n\times f$
is defined for every rapidly decreasing test function $f$. Show that $\lim_{n\to\infty}\int h_n'\times f$, $\lim_{n\to\infty}\int\hat h_n\times f$
and $\lim_{n\to\infty}\int(h_n*g)\times f$ are defined for all rapidly decreasing test functions $f$ and $g$, and are zero if
$\phi$ is identically zero. (Hint: 255G will help with the last.)

284Y Further exercises (a) Let f be an integrable complex-valued function on ], ], and f its
periodic extension, as in 282Ae. Show that f is a tempered function. Show that for any rapidly decreasing
R P
test function h, f h = 2 ck h(k), where hck ikN is the sequence of Fourier coefficients of f .
k=
(Hint: begin with the case f (x) = einx . Next show that
P P
M = k= |h(k)| + k= supx[(2k1),(2k+1)] |h(x)| < ,
and that
R P
| f h 2 k= ck h(k)| M kf k1 .
Finally apply 282Ib.)
(b) Let f be a complex-valued function, defined almost everywhere on R, such that f h is integrable
for every rapidly decreasing test function h. Show that f is tempered.
(c) Let f and g be tempered functions on R such that g represents the Fourier transform of f . Show that
Z d Z
i eicy eidy y 2 /2 2
f (x)dx = lim e g(y)dy
c
2 y
R
whenever c d in R. (Hint: set = [c, d]. Show that both sides are lim f ( 1/ ), defining
as in 283N.)
R
(d) Show that if g : R R is an odd function of bounded variation such that 1 x1 g(x)dx = , then g
does not represent the Fourier transform of any tempered function. (Hint: 283Yd, 284Yc.)
(e) Let S be the space of rapidly decreasing test functions. For k, m N set km (h) = supxR |x|k |h(m) (x)|
for every h S, writing h(m) for the mth derivative of h as usual. (i) Show that each km is a seminorm
(2A5D) and that S is complete and separable for the metrizable linear space topology T they define (2A5B).

(ii) Show
R that h 7 h : S S is continuous for T. (iii) Show that if f is any tempered
R function, then
h 7 f h is T-continuous. (iv) Show that if f is an integrable function such that |xk f (x)|dx < for
every k N, then h 7 f h : S S is T-continuous.
(f ) Show that if f is a tempered function on R and
1 RcRa
= limc 0 a f (x)dxda
c
is defined in C, then is also
R
lim0
f (x)e|x| dx.

(g) Let f , g be square-integrable complex-valued functions on R such that g represents the Fourier
transform of f . Suppose that m Z and that (2m 1) < x < (2m + 1). Set f(t) = f (t + 2m) for those
t ], ] such that t + 2m dom f . Let hck ikZ be the sequence of Fourier coefficients of f. Show that
1 Ra Pn
lima a eixy g(y)dy = limn k=n ck eikx
2
in the sense that if one limit exists in C so does the other, and they are then equal. (Hint: 284Xk(i), 282Da.)

(h) Show that if f is integrable over R and there is some M 0 such that f (x) = f (x) = 0 for |x| M ,
then f =a.e. 0. (Hint: reduce to the case M = . Looking at the Fourier series of f ], ], show that f
Pm
is expressible in the form f (x) = k=m ck eikx for almost every x ], ]. Now compute f (2n + 12 ) for
large n.)
R 1
(i) Let be a Radon measure on R (definition: 256A) which is tempered in the sense that 1+|x|k
(dx)
is finite for some k N. (i) Show that every rapidly decreasing test function is -integrable. (ii) Show that
if has bounded support (definition: 256Xf), and hR is a rapidly decreasing test function, then h is a

rapidly decreasing test function, where ( h)(x) = h(x y)(dy) for x R. (iii) Show that there is
R R
a sequence hhn inN of rapidly decreasing test functions such that limn hn f = f d for every
rapidly decreasing test function f .

(j) Let : S R be a functional defined by the formula of 284Xq. Show that is continuous for the
topology of 284Ye. (Note: it helps to know a little more about metrizable linear topological spaces than is
covered in 2A5.)

284 Notes and comments Yet again I must warn you that the material above gives a very restricted
view of the subject. I have tried to indicate how the theory of Fourier transforms of good functions
here taken to be the rapidly decreasing test functions may be extended, through a kind of duality, to a
very much wider class of functions, the tempered functions. Evidently, writing S for the linear space of
rapidly decreasing test functions, we can seek to investigate a Fourier transform of any linear functional
$\phi : S \to \mathbb{C}$, writing $\hat\phi(h) = \phi(\hat h)$ for any $h \in S$. (It is actually commoner at this point to restrict attention
to functionals which are continuous for the standard topology on S, described in 284Ye; these are called
tempered distributions.) By 284F-284G, we can identify some of these functionals with equivalence classes
of tempered functions, and then set out to investigate those tempered functions whose Fourier transforms
can again be represented by tempered functions.
I suppose the structure of the theory of Fourier transforms is best laid out through the formulae involved.


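Our aim is to set up pairs $(f, g) = (f, \hat f) = (\check g, g)$ in such a way that we have

Inversion: $(\hat h)^{\vee} = (\check h)^{\wedge} = h$;

Reversal: $\check h(y) = \hat h(-y)$;

Linearity: $(h_1 + h_2)^{\wedge} = \hat h_1 + \hat h_2$, $(ch)^{\wedge} = c\hat h$;

Differentiation: $(h')^{\wedge}(y) = iy\,\hat h(y)$;

Shift: if $h_1(x) = h(x + c)$ then $\hat h_1(y) = e^{iyc}\hat h(y)$;

Modulation: if $h_1(x) = e^{icx}h(x)$ then $\hat h_1(y) = \hat h(y - c)$;

Symmetry: if $h_1(x) = h(-x)$ then $\hat h_1(y) = \hat h(-y)$;

Complex Conjugate: $(\bar h)^{\wedge}(y) = \overline{\hat h(-y)}$;

Dilation: if $h_1(x) = h(cx)$, where $c > 0$, then $\hat h_1(y) = \frac{1}{c}\hat h(\frac{y}{c})$;

Convolution: $(h_1 * h_2)^{\wedge} = \sqrt{2\pi}\,\hat h_1 \times \hat h_2$, $(h_1 \times h_2)^{\wedge} = \frac{1}{\sqrt{2\pi}}\hat h_1 * \hat h_2$;

Duality: $\int_{-\infty}^{\infty} h_1 \times \hat h_2 = \int_{-\infty}^{\infty} \hat h_1 \times h_2$;

Parseval: $\int_{-\infty}^{\infty} h_1 \times \bar h_2 = \int_{-\infty}^{\infty} \hat h_1 \times \overline{\hat h_2}$;

and, of course,

$$\hat h(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-iyx}h(x)\,dx,$$

$$\int_c^d h(y)\,dy = \frac{i}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \frac{e^{icy} - e^{idy}}{y}\,\hat h(y)\,dy.$$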

(I have used the letter h in the list above to suggest what is in fact the case, that all the formulae here are

valid for rapidly decreasing test functions.) On top of all this, it is often important that the operation h 7 h
should be continuous in some sense.
The challenge of the pure theory of Fourier transforms is to find the widest possible variety R of objects h
for which the formulae above will be valid, subject to appropriate interpretations of , and . I must of
course remark here that from the very beginnings, the subject has been enriched by its applications in other
parts of mathematics, the physical sciences and the social sciences, and that again and again these have

suggested further possible pairs (f, f ), making new demands on our power to interpret the rules we seek to
follow. Even the theory of distributions does not seem to give a full canonical account of what can be done.
First, there are great difficulties in interpreting the product of two arbitrary distributions, making several
of the formulae above problematic; and second, it is not obvious that only one kind of distribution need be
considered. In this section I have looked at just one space of test functions, the space S of rapidly decreasing
test functions; but at least two others are significant, the space D of smooth functions with bounded support
and the space Z of Fourier transforms of functions in D. The advantage of starting with S is that it gives
2
a symmetric theory, since h S for every h S; but it is easy to find objects (e.g., the function x 7 ex ,
or the function x 7 1/|x|) which cannot be interpreted as functionals on S, so that their Fourier transforms
must be investigated by other methods, if at all. In 284Xp I sketch some of the arguments which can be
used to justify the assertion
p that the Fourier transform of the function x 7 1/x is, or can be represented
by, the function y 7 i 2 sgn y; the general principle in this case being that we approach both 0 and
symmetrically. For a variety of such matching pairs, established by arguments based on the idea in 284Xq,
see Lighthill 59, chap. 3.
Accordingly it seems that, after nearly two centuries, we must still proceed by carefully examining particular classes of function, and checking appropriate interpretations of the formulae. In the work above I have repeatedly used the concepts
$$\lim_{a\to\infty}\int_{-a}^{a} f(x)\,dx,\qquad \lim_{\epsilon\downarrow 0}\int_{-\infty}^{\infty} e^{-\epsilon x^2} f(x)\,dx$$
as alternative interpretations of $\int_{-\infty}^{\infty} f$. (Of course they are closely related; see 284Xe.) The reasons for using the particular kernel $e^{-\epsilon x^2}$ are that it belongs to $S$, it is an even function, its Fourier transform is calculable and easy to manipulate, and it is associated with the normal probability density function $\frac{1}{\sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}$, so that any miscellaneous facts we gather have a chance of being valuable elsewhere. But there are applications in which alternative kernels are more manageable, e.g., $e^{-\epsilon|x|}$ (283Xr, 283Yb, 284Yf).
One of the guiding principles here is that purely formal manipulations, along the lines of those in the list
above, and (especially) changes in the order of integration, with other exchanges of limit, again and again
give rise to formulae which, suitably interpreted, are valid. First courses in analysis are often inhibitory;
students are taught to distrust any manipulation which they cannot justify. To my own eye, the delight
of this subject lies chiefly in the variety of the arguments demanded by a rigorous approach, the ground
constantly shifting with the context; but there is no doubt that cheerful sanguinity is often the best guide
to the manipulations which it will be right to try to justify.
This being a book on measure theory, I am of course particularly interested in the possibility of a measure
appearing as a Fourier transform. This is what happens if we seek the Fourier transform of the constant
function 1 (284R). More generally, any periodic tempered function f with period 2 can be assigned a Fourier
transform which is a signed measure (for our present purposes, a complex linear combination of measures)
concentrated on Z, the mass at each k Z being determined by the corresponding Fourier coefficient of
f ], ] (284Xn, 284Ya). In the next section I will go farther in this direction, with particular reference
to probability distributions on R r . But the reason why positive measures have not forced themselves on our
attention so far is that we do not expect to get a positive function as a Fourier transform unless some very

special conditions are satisfied, as in 283Yb.


As in 282, I have used the Hilbert space structure of L2C as the basis of the discussion of Fourier transforms
of functions in L2C (284O-284P). But as with Fourier series, Carlesons theorem (286U) provides a more direct
description.

285 Characteristic functions


I come now to one of the most effective applications of Fourier transforms, the use of characteristic functions to analyse probability distributions. It turns out not only that the Fourier transform of a probability distribution determines the distribution (285M) but that many of the things we want to know about a distribution are easily calculated from its transform (285G, 285Xf). Even more strikingly, pointwise convergence of Fourier transforms corresponds (for sequences) to convergence for the vague topology in the space of distributions, so they provide a new and extremely powerful method for proving such results as the Central Limit Theorem and Poisson's theorem (285Q).
As the applications of the ideas here mostly belong to probability theory, I return to probabilists' terminology, as in Chapter 27.

285A Definition (a) Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$ (256A). Then the characteristic function of $\nu$ is the function $\phi_\nu : \mathbb{R}^r \to \mathbb{C}$ given by the formula
$$\phi_\nu(y) = \int e^{iy\cdot x}\,\nu(dx)$$
for every $y \in \mathbb{R}^r$, writing $y\cdot x = \eta_1\xi_1 + \ldots + \eta_r\xi_r$ if $y = (\eta_1, \ldots, \eta_r)$, $x = (\xi_1, \ldots, \xi_r)$.

(b) Let $X_1, \ldots, X_r$ be real-valued random variables on the same probability space. The characteristic function of $X = (X_1, \ldots, X_r)$ is the characteristic function $\phi_X = \phi_{\nu_X}$ of their joint probability distribution $\nu_X$ as defined in 271C.

285B Remarks (a) By one of the ordinary accidents of history, the definitions of characteristic function
and Fourier transform have evolved independently. In 283Ba I remarked that the definition of the Fourier
transform remains unfixed, and that the formulae
R
f (y) =
eiyx f (x)dx,


1 R
f (y) =
eiyx f (x)dx
2
are sometimes used. On the other hand, I think that nearly all authors agree on the definition of the
characteristic function as given above. You may feel therefore that I should have followed their lead, and
chosen the definition of Fourier transform which best matches the definition of characteristic function. I did
not do so largely because I wished to emphasise the symmetry between the Fourier transform and the inverse
Fourier transform, and the correspondence between Fourier transforms and Fourier series. The principal
advantage of matching the definitions up would be to make the constants in such theorems as 283F, 285Xh

the same, and would be balanced by the need to remember different constants for f , f in such results as
283M.

(b) A secondary reason for not trying too hard to make the formulae of this section match directly those
of 283-284 is that the r-dimensional case is at the heart of some of the most important applications of
characteristic functions, so that it seems right to introduce it from the beginning; and consequently the
formulae of this section will necessarily have new features compared with those in the body of the work so
far.

285C Of course there is a direct way to describe the characteristic function of a family (X1 , . . . , Xr )
of random variables, as follows.

Proposition Let $X_1, \ldots, X_r$ be real-valued random variables on the same probability space $(\Omega, \Sigma, \mu)$, and $\nu_X$ their joint distribution. Then their characteristic function $\phi_{\nu_X}$ is given by
$$\phi_{\nu_X}(y) = E(e^{iy\cdot X}) = E(e^{i\eta_1X_1}e^{i\eta_2X_2}\cdots e^{i\eta_rX_r})$$
for every $y = (\eta_1, \ldots, \eta_r) \in \mathbb{R}^r$.
proof Apply 271E to the functions $h_1, h_2 : \mathbb{R}^r \to \mathbb{R}$ defined by
$$h_1(x) = \cos(y\cdot x),\quad h_2(x) = \sin(y\cdot x),$$
to see that
$$\phi_{\nu_X}(y) = \int h_1(x)\,\nu_X(dx) + i\int h_2(x)\,\nu_X(dx) = E(h_1(X)) + iE(h_2(X)) = E(e^{iy\cdot X}).$$

285D I ought to spell out the correspondence between Fourier transforms, as defined in 283A, and
characteristic functions.
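Proposition Let $\nu$ be a Radon probability measure on $\mathbb{R}$. Write
$$\hat\nu(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-iyx}\,\nu(dx)$$
for every $y \in \mathbb{R}$, and $\phi_\nu$ for the characteristic function of $\nu$.
(a) $\hat\nu(y) = \frac{1}{\sqrt{2\pi}}\phi_\nu(-y)$ for every $y \in \mathbb{R}$.
(b) For any Lebesgue integrable complex-valued function $h$ defined almost everywhere in $\mathbb{R}$,
$$\int_{-\infty}^{\infty} \hat\nu(y)h(y)\,dy = \int_{-\infty}^{\infty} \hat h(x)\,\nu(dx).$$
(c) For any rapidly decreasing test function $h$ on $\mathbb{R}$ (see 284),
$$\int_{-\infty}^{\infty} h(x)\,\nu(dx) = \int_{-\infty}^{\infty} \check h(y)\hat\nu(y)\,dy.$$
(d) If $\nu$ is an indefinite-integral measure over Lebesgue measure, with Radon-Nikodym derivative $f$, then $\hat\nu$ is the Fourier transform of $f$.

proof (a) This is immediate from the definitions of $\phi_\nu$ and $\hat\nu$.
(b) Because
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |h(y)|\,\nu(dx)\,dy = \int_{-\infty}^{\infty} |h(y)|\,dy < \infty,$$
we may change the order of integration to see that
$$\int_{-\infty}^{\infty} \hat\nu(y)h(y)\,dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-iyx}h(y)\,\nu(dx)\,dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-iyx}h(y)\,dy\,\nu(dx) = \int_{-\infty}^{\infty} \hat h(x)\,\nu(dx).$$
(c) This follows immediately from (b), because $\check h$ is integrable and $(\check h)^{\wedge} = h$ (284C).
(d) The point is just that
$$\int h\,d\nu = \int h(x)f(x)\,dx$$
for every bounded Borel measurable $h : \mathbb{R} \to \mathbb{R}$ (235M), and therefore for the functions $x \mapsto e^{-iyx} : \mathbb{R} \to \mathbb{C}$. Now
$$\hat\nu(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-iyx}\,\nu(dx) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-iyx}f(x)\,dx = \hat f(y)$$
for every $y$.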

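285E Lemma Let $X$ be a normal random variable with expectation $a$ and variance $\sigma^2$, where $\sigma > 0$. Then the characteristic function of $X$ is given by
$$\phi(y) = e^{iya}e^{-\sigma^2y^2/2}.$$

proof This is just 283N with the constants changed. We have
$$\phi(y) = E(e^{iyX}) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{iyx}e^{-(x-a)^2/2\sigma^2}\,dx$$
(taking the density function for $X$ given in 274Ad, and applying 271Ic)
$$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{iy(\sigma t+a)}e^{-t^2/2}\,dt$$
(substituting $x = \sigma t + a$)
$$= e^{iya}\sqrt{2\pi}\,\hat\psi_1(-y\sigma)$$
(setting $\psi_1(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, as in 283N)
$$= e^{iya}e^{-\sigma^2y^2/2}.$$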
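The closed form $\phi(y) = e^{iya}e^{-\sigma^2y^2/2}$ for a normal variable can be checked numerically. The following is only a sketch, not part of the proof: the function names, the midpoint rule and the truncation of the integral to twelve standard deviations either side of $a$ are all choices made for this illustration.

```python
import cmath
import math

def normal_cf_numeric(y, a, sigma, half_width=12.0, steps=20000):
    """Approximate E(exp(iyX)) for X ~ N(a, sigma^2) by the midpoint rule.

    The integral is truncated to [a - half_width*sigma, a + half_width*sigma];
    this truncation is an assumption of the sketch, harmless for a Gaussian.
    """
    lo = a - half_width * sigma
    dx = 2.0 * half_width * sigma / steps
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0j
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        density = norm * math.exp(-((x - a) ** 2) / (2.0 * sigma ** 2))
        total += cmath.exp(1j * y * x) * density * dx
    return total

def normal_cf_closed(y, a, sigma):
    """The formula exp(iya) * exp(-sigma^2 y^2 / 2)."""
    return cmath.exp(1j * y * a) * math.exp(-(sigma ** 2) * (y ** 2) / 2.0)

if __name__ == "__main__":
    for y in (-1.3, 0.0, 0.7):
        print(y, normal_cf_numeric(y, 1.5, 2.0), normal_cf_closed(y, 1.5, 2.0))
```

The two columns printed should agree to many decimal places for moderate values of $y$.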

285F I now give results corresponding to parts of 283C, with an extra refinement concerning independent
random variables (285I).
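Proposition Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$, and $\phi$ its characteristic function.
(a) $\phi(0) = 1$.
(b) $\phi : \mathbb{R}^r \to \mathbb{C}$ is uniformly continuous.
(c) $\phi(-y) = \overline{\phi(y)}$, $|\phi(y)| \le 1$ for every $y \in \mathbb{R}^r$.
(d) If $r = 1$ and $\int |x|\,\nu(dx) < \infty$, then $\phi'(y)$ exists and is equal to $i\int xe^{ixy}\,\nu(dx)$ for every $y \in \mathbb{R}$.
(e) If $r = 1$ and $\int x^2\,\nu(dx) < \infty$, then $\phi''(y)$ exists and is equal to $-\int x^2e^{ixy}\,\nu(dx)$ for every $y \in \mathbb{R}$.
proof (a) $\phi(0) = \int 1\,\nu(dx) = \nu(\mathbb{R}^r) = 1$.
(b) Let $\epsilon > 0$. Let $M > 0$ be such that
$$\nu\{x : \|x\| \ge M\} \le \epsilon,$$
writing $\|x\| = \sqrt{x\cdot x}$ as usual. Let $\delta > 0$ be such that $|e^{ia} - 1| \le \epsilon$ whenever $|a| \le \delta$. Now suppose that $y$, $y' \in \mathbb{R}^r$ are such that $\|y - y'\| \le \delta/M$. Then whenever $\|x\| \le M$,
$$|e^{iy\cdot x} - e^{iy'\cdot x}| = |e^{iy'\cdot x}||e^{i(y-y')\cdot x} - 1| = |e^{i(y-y')\cdot x} - 1| \le \epsilon$$
because
$$|(y - y')\cdot x| \le \|y - y'\|\|x\| \le \delta.$$
Consequently
$$|\phi(y) - \phi(y')| \le \int_{\|x\|\le M} |e^{iy\cdot x} - e^{iy'\cdot x}|\,\nu(dx) + \int_{\|x\|>M} |e^{iy\cdot x}|\,\nu(dx) + \int_{\|x\|>M} |e^{iy'\cdot x}|\,\nu(dx) \le \epsilon + \epsilon + \epsilon = 3\epsilon.$$
As $\epsilon$ is arbitrary, $\phi$ is uniformly continuous.
(c) This is elementary;
$$\phi(-y) = \int e^{-iy\cdot x}\,\nu(dx) = \int \overline{e^{iy\cdot x}}\,\nu(dx) = \overline{\phi(y)},$$
$$|\phi(y)| = \Bigl|\int e^{iy\cdot x}\,\nu(dx)\Bigr| \le \int |e^{iy\cdot x}|\,\nu(dx) = \int 1\,\nu(dx) = 1.$$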

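(d) The point is that $|\frac{\partial}{\partial y}e^{iyx}| = |x|$ for every $x$, $y \in \mathbb{R}$. So by 123D (applied, strictly speaking, to the real and imaginary parts of the function)
$$\phi'(y) = \frac{d}{dy}\int e^{iyx}\,\nu(dx) = \int \frac{\partial}{\partial y}e^{iyx}\,\nu(dx) = \int ixe^{iyx}\,\nu(dx).$$

(e) Since we now have $|\frac{\partial}{\partial y}xe^{iyx}| = x^2$ for every $x$, $y$, we can repeat the argument to get
$$\phi''(y) = i\frac{d}{dy}\int xe^{iyx}\,\nu(dx) = i\int \frac{\partial}{\partial y}xe^{iyx}\,\nu(dx) = -\int x^2e^{iyx}\,\nu(dx).$$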

285G Corollary (a) Let $X$ be a real-valued random variable with finite expectation, and $\phi$ its characteristic function. Then $\phi'(0) = iE(X)$.
(b) Let $X$ be a real-valued random variable with finite variance, and $\phi$ its characteristic function. Then $\phi''(0) = -E(X^2)$.
proof We have only to match $X$ to its distribution $\nu$, and say that
'$X$ has finite expectation'
corresponds to
$$\int |x|\,\nu(dx) = E(|X|) < \infty,$$
so that
$$\phi'(0) = i\int x\,\nu(dx) = iE(X),$$
and that
'$X$ has finite variance'
corresponds to
$$\int x^2\,\nu(dx) = E(X^2) < \infty,$$
so that
$$\phi''(0) = -\int x^2\,\nu(dx) = -E(X^2),$$
as in 271E.
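As a small illustration of the identities $\phi'(0) = iE(X)$ and $\phi''(0) = -E(X^2)$, one can take a finitely supported distribution and approximate the derivatives by central differences. This is a sketch only; the helper names, the step size `h` and the sample distribution are hypothetical choices, not anything taken from the text.

```python
import cmath

def char_fn(dist, y):
    """phi(y) = sum of p * exp(iyx) over the atoms (x, p) of a finite distribution."""
    return sum(p * cmath.exp(1j * y * x) for x, p in dist.items())

def first_derivative_at_zero(dist, h=1e-4):
    """Central-difference approximation to phi'(0)."""
    return (char_fn(dist, h) - char_fn(dist, -h)) / (2.0 * h)

def second_derivative_at_zero(dist, h=1e-4):
    """Central-difference approximation to phi''(0)."""
    return (char_fn(dist, h) - 2.0 * char_fn(dist, 0.0) + char_fn(dist, -h)) / (h * h)

if __name__ == "__main__":
    # Illustrative distribution: value -> probability.
    dist = {-1.0: 0.1, 0.0: 0.4, 2.0: 0.3, 5.0: 0.2}
    mean = sum(x * p for x, p in dist.items())
    second_moment = sum(x * x * p for x, p in dist.items())
    print(first_derivative_at_zero(dist), 1j * mean)
    print(second_derivative_at_zero(dist), -second_moment)
```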


285H Remark Observe that there is no result corresponding to 283Cg ($\lim_{|y|\to\infty} \hat f(y) = 0$). If $\nu$ is the Radon probability measure giving mass 1 to the point 0 (that is, the 'Dirac delta function' of 284R, that is, the distribution of a random variable which is zero almost everywhere), then $\phi(y) = 1$ for every $y$.

285I Proposition Let $X_1, \ldots, X_n$ be independent real-valued random variables, with characteristic functions $\phi_1, \ldots, \phi_n$. Let $\phi$ be the characteristic function of their sum $X = X_1 + \ldots + X_n$. Then
$$\phi(y) = \prod_{j=1}^n \phi_j(y)$$
for every $y \in \mathbb{R}$.
proof Let $y \in \mathbb{R}$. By 272E, the variables
$$Y_j = e^{iyX_j}$$
are independent, so by 272Q
$$\phi(y) = E(e^{iyX}) = E(e^{iy(X_1+\ldots+X_n)}) = E\Bigl(\prod_{j=1}^n Y_j\Bigr) = \prod_{j=1}^n E(Y_j) = \prod_{j=1}^n \phi_j(y),$$
as required.
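The product rule for sums of independent variables is easy to see in the finitely supported case, where the distribution of the sum is an explicit convolution. The sketch below assumes two independent variables given as dictionaries mapping values to probabilities; `char_fn` and `convolve` are names invented for the illustration.

```python
import cmath

def char_fn(dist, y):
    """Characteristic function of a finite distribution {value: probability}."""
    return sum(p * cmath.exp(1j * y * x) for x, p in dist.items())

def convolve(dist_x, dist_z):
    """Distribution of X + Z when X and Z are independent."""
    out = {}
    for x, p in dist_x.items():
        for z, q in dist_z.items():
            out[x + z] = out.get(x + z, 0.0) + p * q
    return out

if __name__ == "__main__":
    dist_x = {0.0: 0.2, 1.0: 0.5, 2.0: 0.3}
    dist_z = {-1.5: 0.4, 3.0: 0.6}
    dist_sum = convolve(dist_x, dist_z)
    for y in (-2.0, 0.3, 1.7):
        # The two printed values should agree up to rounding error.
        print(char_fn(dist_sum, y), char_fn(dist_x, y) * char_fn(dist_z, y))
```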
Remark See also 285R below.

285J There is an inversion theorem for characteristic functions, corresponding to 283F; I give it in
285Xh, with an r-dimensional version in 285Yb. However, this does not seem to be as useful as the following
group of results.
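Lemma Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$, and $\phi$ its characteristic function. Then for any $j \le r$, $a > 0$
$$\nu\{x : |\xi_j| \ge a\} \le 7a\int_0^{1/a} (1 - \mathrm{Re}\,\phi(te_j))\,dt,$$
where $e_j \in \mathbb{R}^r$ is the $j$th unit vector.
proof We have
$$7a\int_0^{1/a} (1 - \mathrm{Re}\,\phi(te_j))\,dt = 7a\int_0^{1/a}\Bigl(1 - \mathrm{Re}\int_{\mathbb{R}^r} e^{it\xi_j}\,\nu(dx)\Bigr)dt$$
$$= 7a\int_0^{1/a}\int_{\mathbb{R}^r} 1 - \cos(t\xi_j)\,\nu(dx)\,dt$$
$$= 7a\int_{\mathbb{R}^r}\int_0^{1/a} 1 - \cos(t\xi_j)\,dt\,\nu(dx)$$
(because $(x, t) \mapsto 1 - \cos(t\xi_j)$ is bounded and $\nu\mathbb{R}^r\cdot\frac{1}{a}$ is finite)
$$= 7a\int_{\mathbb{R}^r}\Bigl(\frac{1}{a} - \frac{1}{\xi_j}\sin\frac{\xi_j}{a}\Bigr)\nu(dx)$$
$$\ge 7a\int_{|\xi_j|\ge a}\Bigl(\frac{1}{a} - \frac{1}{\xi_j}\sin\frac{\xi_j}{a}\Bigr)\nu(dx)$$
(because $\frac{1}{\xi}\sin\frac{\xi}{a} \le \frac{1}{a}$ for every $\xi \ne 0$)
$$\ge \nu\{x : |\xi_j| \ge a\},$$
because
$$\frac{\sin\eta}{\eta} \le \frac{\sin 1}{1} \le \frac{6}{7} \text{ if } \eta \ge 1,$$
so
$$a\Bigl(\frac{1}{a} - \frac{1}{\xi_j}\sin\frac{\xi_j}{a}\Bigr) \ge \frac{1}{7}$$
if $|\xi_j| \ge a$.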
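To see the tail estimate of 285J at work, here is a hedged numerical sketch for $r = 1$ and a finitely supported measure: the left-hand side is the mass outside $]-a, a[$, and the right-hand side is $7a\int_0^{1/a}(1 - \mathrm{Re}\,\phi(t))\,dt$ evaluated by a midpoint rule. The function names, the quadrature and the sample measure are assumptions of the illustration.

```python
import math

def tail_mass(points, weights, a):
    """Mass of {x : |x| >= a} for a finitely supported probability measure."""
    return sum(w for x, w in zip(points, weights) if abs(x) >= a)

def re_phi(points, weights, t):
    """Real part of the characteristic function at t."""
    return sum(w * math.cos(t * x) for x, w in zip(points, weights))

def lemma_bound(points, weights, a, steps=2000):
    """Midpoint-rule value of 7a * integral over [0, 1/a] of (1 - Re phi(t)) dt."""
    dt = (1.0 / a) / steps
    integral = sum(
        (1.0 - re_phi(points, weights, (k + 0.5) * dt)) * dt for k in range(steps)
    )
    return 7.0 * a * integral

if __name__ == "__main__":
    points = [-3.0, -0.5, 0.0, 1.0, 4.0]
    weights = [0.1, 0.3, 0.2, 0.3, 0.1]
    for a in (0.5, 1.0, 2.0, 3.5):
        print(a, tail_mass(points, weights, a), lemma_bound(points, weights, a))
```

In each printed row the second number should not exceed the third; the constant 7 is far from sharp for most measures.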

285K Characteristic functions and the vague topology The time has come to return to ideas mentioned briefly in 274L. Fix $r \ge 1$ and let $P$ be the set of all Radon probability measures on $\mathbb{R}^r$. For any bounded continuous function $h : \mathbb{R}^r \to \mathbb{R}$, define $\rho_h : P \times P \to \mathbb{R}$ by setting
$$\rho_h(\nu, \nu') = \Bigl|\int h\,d\nu - \int h\,d\nu'\Bigr|$$
for $\nu$, $\nu' \in P$. Then the vague topology on $P$ is the topology generated by the pseudometrics $\rho_h$ (274Ld).

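285L Theorem Let $\nu$, $\langle\nu_n\rangle_{n\in\mathbb{N}}$ be Radon probability measures on $\mathbb{R}^r$, with characteristic functions $\phi$, $\langle\phi_n\rangle_{n\in\mathbb{N}}$. Then the following are equiveridical:
(i) $\nu = \lim_{n\to\infty} \nu_n$ for the vague topology;
(ii) $\int h\,d\nu = \lim_{n\to\infty} \int h\,d\nu_n$ for every bounded continuous $h : \mathbb{R}^r \to \mathbb{R}$;
(iii) $\lim_{n\to\infty} \phi_n(y) = \phi(y)$ for every $y \in \mathbb{R}^r$.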
proof (a) The equivalence of (i) and (ii) is virtually the definition of the vague topology; we have

$\lim_{n\to\infty} \nu_n = \nu$ for the vague topology
$\iff \lim_{n\to\infty} \rho_h(\nu_n, \nu) = 0$ for every bounded continuous $h$
(2A3Mc)
$\iff \lim_{n\to\infty} |\int h\,d\nu_n - \int h\,d\nu| = 0$ for every bounded continuous $h$.

(b) Next, (ii) obviously implies (iii), because
$$\mathrm{Re}\,\phi(y) = \int h_y\,d\nu = \lim_{n\to\infty} \int h_y\,d\nu_n = \lim_{n\to\infty} \mathrm{Re}\,\phi_n(y),$$
setting $h_y(x) = \cos x\cdot y$ for each $x$, and similarly
$$\mathrm{Im}\,\phi(y) = \lim_{n\to\infty} \mathrm{Im}\,\phi_n(y)$$
for every $y \in \mathbb{R}^r$.
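(c) So we are left to prove that (iii)$\Rightarrow$(ii). I start by showing that, given $\epsilon > 0$, there is a closed bounded set $K$ such that
$$\nu_n(\mathbb{R}^r \setminus K) \le \epsilon \text{ for every } n \in \mathbb{N}.$$
P We know that $\phi(0) = 1$ and that $\phi$ is continuous at 0 (285Fb). Let $a > 0$ be so large that for every $j \le r$, $|t| \le 1/a$ we have
$$1 - \mathrm{Re}\,\phi(te_j) \le \frac{\epsilon}{14r},$$
writing $e_j$ for the $j$th unit vector, as in 285J. Then
$$7a\int_0^{1/a} (1 - \mathrm{Re}\,\phi(te_j))\,dt \le \frac{\epsilon}{2r}$$
for each $j \le r$. By Lebesgue's Dominated Convergence Theorem (since of course the functions $t \mapsto 1 - \mathrm{Re}\,\phi_n(te_j)$ are uniformly bounded on $[0, \frac{1}{a}]$), there is an $n_0 \in \mathbb{N}$ such that
$$7a\int_0^{1/a} (1 - \mathrm{Re}\,\phi_n(te_j))\,dt \le \frac{\epsilon}{r}$$
for every $j \le r$, $n \ge n_0$. But 285J tells us that now
$$\nu_n\{x : |\xi_j| \ge a\} \le \frac{\epsilon}{r}$$
for every $j \le r$, $n \ge n_0$. On the other hand, there is surely a $b \ge a$ such that
$$\nu_n\{x : |\xi_j| \ge b\} \le \frac{\epsilon}{r}$$
for every $j \le r$, $n < n_0$. So, setting $K = \{x : |\xi_j| \le b \text{ for every } j \le r\}$,
$$\nu_n(\mathbb{R}^r \setminus K) \le \epsilon$$
for every $n \in \mathbb{N}$, as required. Q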
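(d) Now take any bounded continuous $h : \mathbb{R}^r \to \mathbb{R}$ and $\epsilon > 0$. Set $M = 1 + \sup_{x\in\mathbb{R}^r} |h(x)|$, and let $K$ be a bounded closed set such that
$$\nu_n(\mathbb{R}^r \setminus K) \le \frac{\epsilon}{M} \text{ for every } n \in \mathbb{N},\quad \nu(\mathbb{R}^r \setminus K) \le \frac{\epsilon}{M},$$
using (c) just above. By the Stone-Weierstrass theorem (281K) there are $y_0, \ldots, y_m \in \mathbb{Q}^r$ and $c_0, \ldots, c_m \in \mathbb{C}$ such that
$$|h(x) - g(x)| \le \epsilon \text{ for every } x \in K,$$
$$|g(x)| \le M \text{ for every } x \in \mathbb{R}^r,$$
writing $g(x) = \sum_{k=0}^m c_ke^{iy_k\cdot x}$ for $x \in \mathbb{R}^r$. Now
$$\lim_{n\to\infty} \int g\,d\nu_n = \lim_{n\to\infty} \sum_{k=0}^m c_k\phi_n(y_k) = \sum_{k=0}^m c_k\phi(y_k) = \int g\,d\nu.$$
On the other hand, for every $n \in \mathbb{N}$,
$$\Bigl|\int g\,d\nu_n - \int h\,d\nu_n\Bigr| \le \int_K |g - h|\,d\nu_n + 2M\nu_n(\mathbb{R}^r \setminus K) \le 3\epsilon,$$
and similarly $|\int g\,d\nu - \int h\,d\nu| \le 3\epsilon$. Consequently
$$\limsup_{n\to\infty} \Bigl|\int h\,d\nu_n - \int h\,d\nu\Bigr| \le 6\epsilon.$$
As $\epsilon$ is arbitrary,
$$\lim_{n\to\infty} \int h\,d\nu_n = \int h\,d\nu,$$
and (ii) is true.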

285M Corollary (a) Let $\nu$, $\nu'$ be two Radon probability measures on $\mathbb{R}^r$ with the same characteristic functions. Then they are equal.
(b) Let $(X_1, \ldots, X_r)$ and $(Y_1, \ldots, Y_r)$ be two families of real-valued random variables. If
$$E(e^{i\eta_1X_1+\ldots+i\eta_rX_r}) = E(e^{i\eta_1Y_1+\ldots+i\eta_rY_r})$$
for all $\eta_1, \ldots, \eta_r \in \mathbb{R}$, then $(X_1, \ldots, X_r)$ has the same joint distribution as $(Y_1, \ldots, Y_r)$.
proof (a) Applying 285L with $\nu_n = \nu'$ for every $n$, we see that $\int h\,d\nu' = \int h\,d\nu$ for every bounded continuous $h : \mathbb{R}^r \to \mathbb{R}$. By 256D(iv), $\nu = \nu'$.
(b) Apply (a) with $\nu$, $\nu'$ the two joint distributions.

285N Remarks Probably the most important application of this theorem is to the standard proof of the Central Limit Theorem. I sketch the ideas in 285Xn and 285Yj-285Ym; details may be found in most serious probability texts; two on my shelf are Shiryayev 84, III.4, and Feller 66, XV.6. However, to get the full strength of Lindeberg's version of the Central Limit Theorem we have to work quite hard, and I therefore propose to illustrate the method with a version of Poisson's theorem (285Q) instead. I begin with two lemmas which are very frequently used in results of this kind.

285O Lemma Let $c_0, \ldots, c_n$, $d_0, \ldots, d_n$ be complex numbers of modulus at most 1. Then
$$\Bigl|\prod_{k=0}^n c_k - \prod_{k=0}^n d_k\Bigr| \le \sum_{k=0}^n |c_k - d_k|.$$

proof Induce on $n$. The case $n = 0$ is trivial. For the case $n = 1$ we have
$$|c_0c_1 - d_0d_1| = |c_0(c_1 - d_1) + (c_0 - d_0)d_1| \le |c_0||c_1 - d_1| + |c_0 - d_0||d_1| \le |c_1 - d_1| + |c_0 - d_0|,$$
which is what we need. For the inductive step to $n + 1$, we have
$$\Bigl|\prod_{k=0}^{n+1} c_k - \prod_{k=0}^{n+1} d_k\Bigr| \le \Bigl|\prod_{k=0}^n c_k - \prod_{k=0}^n d_k\Bigr| + |c_{n+1} - d_{n+1}|$$
(by the case just done, because $c_{n+1}$, $d_{n+1}$, $\prod_{k=0}^n c_k$ and $\prod_{k=0}^n d_k$ all have modulus at most 1)
$$\le \sum_{k=0}^n |c_k - d_k| + |c_{n+1} - d_{n+1}|$$
(by the inductive hypothesis)
$$= \sum_{k=0}^{n+1} |c_k - d_k|,$$
so the induction continues.
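The inequality $|\prod c_k - \prod d_k| \le \sum |c_k - d_k|$ for complex numbers of modulus at most 1 can be spot-checked on random data. The sketch below is an illustration only; the sampling scheme and the function names are assumptions, and a random test is of course no substitute for the induction.

```python
import cmath
import math
import random

def random_in_unit_disc(rng):
    """A complex number of modulus at most 1, sampled from the closed unit disc."""
    return cmath.rect(math.sqrt(rng.random()), 2.0 * math.pi * rng.random())

def product_gap(cs, ds):
    """Return (|prod cs - prod ds|, sum of |c_k - d_k|)."""
    prod_c = 1 + 0j
    prod_d = 1 + 0j
    for c, d in zip(cs, ds):
        prod_c *= c
        prod_d *= d
    return abs(prod_c - prod_d), sum(abs(c - d) for c, d in zip(cs, ds))

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        cs = [random_in_unit_disc(rng) for _ in range(6)]
        ds = [random_in_unit_disc(rng) for _ in range(6)]
        print(product_gap(cs, ds))
```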



285P Lemma Let $M$, $\epsilon > 0$. Then there are $\eta > 0$ and $y_0, \ldots, y_n \in \mathbb{R}$ such that whenever $X$, $Z$ are two real-valued random variables with $E(|X|) \le M$, $E(|Z|) \le M$ and $|\phi_X(y_j) - \phi_Z(y_j)| \le \eta$ for every $j \le n$, then $F_X(a) \le F_Z(a + \epsilon) + \epsilon$ for every $a \in \mathbb{R}$, where I write $\phi_X$ for the characteristic function of $X$ and $F_X$ for the distribution function of $X$.

proof Set = > 0, b = M/.
7

(a) Define h0 : R [0, 1] by setting


h0 (x) = 1 if x 0, h0 (x) = 1 x/ if 0 x , h0 (x) = 0 if x .
Then h0 is continuous. Let m be the integral part of b/, and for m k m + 1 set hk (x) = h0 (x k).
By the
PStone-Weierstrass theorem (281K), there are y0 , . . . , yn R and c0 , . . . , cn C such that, writing
n
g0 (x) = j=0 cj eiyj x ,
|h0 (x) g0 (x)| for every x [b (m + 1), b + m],

|g0 (x)| 1 for every x R.


For m k m + 1, set
Pn iyj k iyj x
gk (x) = g0 (x k) = j=0 cj e e .
Pn
Set = /(1 + j=0 |cj |) > 0.
(b) Now suppose that X, Z are random variables such that E(|X|) M , E(|Z|) M and |X (yj )
Z (yj )| for every j n. Then for any k we have
Pn Pn
E(gk (X)) = E( j=0 cj eiyj k eiyj X ) = j=0 cj eiyj k X (yj ),
and similarly
Pn iyj k
E(gk (Z)) = j=0 cj e Z (yj ),
so
Pn Pn
|E(gk (X)) E(gk (Z))| j=0 |cj ||X (yj ) Z (yj )| j=0 |cj | .
Next,
|hk (x) gk (x)| for every x [b (m + 1) + k, b + m + k] [b, b],

|hk (x) gk (x)| 2 for every x,

M
Pr(|X| b) = ,
b

so E(|hk (X) gk (X)|) 3; and similarly E(|hk (Z) gk (Z)|) 3. Putting these together,
|E(hk (X)) E(hk (Z))| 7 =
whenever m k m + 1.
(c) Now suppose that b a b. Then there is a k such that m k m + 1 and a k a + .
Since
], a] ], k] hk ], (k + 1)] ], a + 2],
we must have
Pr(X a) E(hk (X)),

E(hk (Z)) Pr(Z a + 2) Pr(Z a + ).


But this means that
Pr(X a) E(hk (X)) E(hk (Z)) + Pr(Z a + ) +
whenever a [b, b].

(d) As for the cases a b, a b, we surely have


b(1 FZ (b)) = b Pr(Z b) E(|Z|) M ,
so if a b then
M
FX (a) 1 FZ (a) + 1 FZ (b) FZ (a) + = FZ (a) + FZ (a + ) + .
b
Similarly,
bFX (b) E(|X|) M ,
so
FX (a) FZ (a + ) +
for every a b. This completes the proof.

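285Q Law of Rare Events: Theorem For any $M \ge 0$, $\epsilon > 0$ there is a $\delta > 0$ such that whenever $X_0, \ldots, X_n$ are independent $\{0, 1\}$-valued random variables with $\Pr(X_k = 1) = p_k \le \delta$ for every $k \le n$, and $\sum_{k=0}^n p_k = \lambda \le M$, and $X = X_0 + \ldots + X_n$, then
$$\Bigl|\Pr(X = m) - \frac{\lambda^m}{m!}e^{-\lambda}\Bigr| \le \epsilon$$
for every $m \in \mathbb{N}$.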
proof (a) We should begin by calculating some characteristic functions. First, the characteristic function $\phi_k$ of $X_k$ will be given by
$$\phi_k(y) = (1 - p_k)e^{iy0} + p_ke^{iy1} = 1 + p_k(e^{iy} - 1).$$
Next, if $Z$ is a Poisson random variable with parameter $\lambda$ (that is, if $\Pr(Z = m) = \lambda^me^{-\lambda}/m!$ for every $m \in \mathbb{N}$; all you need to know at this point about the Poisson distribution is that $\sum_{m=0}^{\infty} \lambda^me^{-\lambda}/m! = 1$), then its characteristic function $\phi_Z$ is given by
$$\phi_Z(y) = \sum_{m=0}^{\infty} \frac{\lambda^m}{m!}e^{-\lambda}e^{iym} = e^{-\lambda}\sum_{m=0}^{\infty} \frac{(\lambda e^{iy})^m}{m!} = e^{-\lambda}e^{\lambda e^{iy}} = e^{\lambda(e^{iy}-1)}.$$

(b) Before getting down to $\delta$s and $\eta$s, I show how to estimate $\phi_X(y) - \phi_Z(y)$. We know that
$$\phi_X(y) = \prod_{k=0}^n \phi_k(y)$$
(using 285I), while
$$\phi_Z(y) = \prod_{k=0}^n e^{p_k(e^{iy}-1)}.$$
Because $\phi_k(y)$, $e^{p_k(e^{iy}-1)}$ all have modulus at most 1 (we have
$$|e^{p_k(e^{iy}-1)}| = e^{-p_k(1-\cos y)} \le 1,)$$
285O tells us that
$$|\phi_X(y) - \phi_Z(y)| \le \sum_{k=0}^n |\phi_k(y) - e^{p_k(e^{iy}-1)}| = \sum_{k=0}^n |e^{p_k(e^{iy}-1)} - 1 - p_k(e^{iy} - 1)|.$$

(c) So we have a little bit of analysis to do. To estimate $|e^z - 1 - z|$ where $\mathrm{Re}\,z \le 0$, consider the function
$$g(t) = \mathrm{Re}(c(e^{tz} - 1 - tz))$$
where $|c| = 1$. We have $g(0) = g'(0) = 0$ and
$$|g''(t)| = |\mathrm{Re}(c(z^2e^{tz}))| \le |c||z^2||e^{tz}| \le |z|^2$$
for every $t \ge 0$, so that
$$|g(1)| \le \frac{1}{2}|z|^2$$
by the (real-valued) Taylor theorem with remainder, or otherwise. As $c$ is arbitrary,
$$|e^z - 1 - z| \le \frac{1}{2}|z|^2$$
whenever $\mathrm{Re}\,z \le 0$. In particular,
$$|e^{p_k(e^{iy}-1)} - 1 - p_k(e^{iy} - 1)| \le \frac{1}{2}p_k^2|e^{iy} - 1|^2 \le 2p_k^2$$
for each $k$, and
$$|\phi_X(y) - \phi_Z(y)| \le \sum_{k=0}^n |e^{p_k(e^{iy}-1)} - 1 - p_k(e^{iy} - 1)| \le 2\sum_{k=0}^n p_k^2$$
for each $y \in \mathbb{R}$.
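(d) Now for the detailed estimates. Given $M \ge 0$ and $\epsilon > 0$, let $\eta > 0$ and $y_0, \ldots, y_l \in \mathbb{R}$ be such that
$$\Pr(X \le a) \le \Pr(Z \le a + \tfrac{1}{2}) + \frac{\epsilon}{2}$$
whenever $X$, $Z$ are real-valued random variables, $E(|X|) \le M$, $E(|Z|) \le M$ and $|\phi_X(y_j) - \phi_Z(y_j)| \le \eta$ for every $j \le l$ (285P). Take $\delta = \eta/(2M + 1)$ and suppose that $X_0, \ldots, X_n$ are independent $\{0, 1\}$-valued random variables with $\Pr(X_k = 1) = p_k \le \delta$ for every $k \le n$, $\lambda = \sum_{k=0}^n p_k \le M$. Set $X = X_0 + \ldots + X_n$ and let $Z$ be a Poisson random variable with parameter $\lambda$; then by the arguments of (a)-(c),
$$|\phi_X(y) - \phi_Z(y)| \le 2\sum_{k=0}^n p_k^2 \le 2\delta\sum_{k=0}^n p_k = 2\delta\lambda \le \eta$$
for every $y \in \mathbb{R}$. Also
$$E(|X|) = E(X) = \sum_{k=0}^n p_k = \lambda \le M,$$
$$E(|Z|) = E(Z) = \sum_{m=0}^{\infty} m\frac{\lambda^m}{m!}e^{-\lambda} = e^{-\lambda}\sum_{m=1}^{\infty} \frac{\lambda^m}{(m-1)!} = e^{-\lambda}\sum_{m=0}^{\infty} \frac{\lambda^{m+1}}{m!} = \lambda \le M.$$
So
$$\Pr(X \le a) \le \Pr(Z \le a + \tfrac{1}{2}) + \frac{\epsilon}{2},$$
$$\Pr(Z \le a) \le \Pr(X \le a + \tfrac{1}{2}) + \frac{\epsilon}{2}$$
for every $a$. But as both $X$ and $Z$ take all their values in $\mathbb{N}$,
$$|\Pr(X \le m) - \Pr(Z \le m)| \le \frac{\epsilon}{2}$$
for every $m \in \mathbb{N}$, and
$$\Bigl|\Pr(X = m) - \frac{\lambda^m}{m!}e^{-\lambda}\Bigr| = |\Pr(X = m) - \Pr(Z = m)| \le \epsilon$$
for every $m \in \mathbb{N}$, as required.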
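The Law of Rare Events can be watched numerically: for independent $\{0,1\}$-valued variables with success probabilities $p_k$, the exact distribution of the sum is computed below by a simple recursion and compared pointwise with the Poisson probabilities $\lambda^me^{-\lambda}/m!$, $\lambda = \sum p_k$. This is a sketch under my own naming; the comparison only runs over the values $m$ that the sum can actually take, and the sample parameters are illustrative.

```python
import math

def bernoulli_sum_pmf(ps):
    """Exact pmf of X_0 + ... + X_n for independent {0,1}-valued X_k with Pr(X_k = 1) = ps[k]."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for m, q in enumerate(pmf):
            new[m] += q * (1.0 - p)
            new[m + 1] += q * p
        pmf = new
    return pmf

def poisson_pmf_list(lam, size):
    """Poisson probabilities for m = 0, ..., size-1, built iteratively to avoid huge factorials."""
    terms = [math.exp(-lam)]
    for m in range(1, size):
        terms.append(terms[-1] * lam / m)
    return terms

def max_pointwise_gap(ps):
    """Largest |Pr(X = m) - Poisson(m)| over the support of the Bernoulli sum."""
    pmf = bernoulli_sum_pmf(ps)
    poisson = poisson_pmf_list(sum(ps), len(pmf))
    return max(abs(q - r) for q, r in zip(pmf, poisson))

if __name__ == "__main__":
    # Same lambda = 2 in both cases; smaller individual p_k should give a smaller gap.
    print(max_pointwise_gap([0.2] * 10))
    print(max_pointwise_gap([0.002] * 1000))
```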

285R Convolutions Recall from 257A that if $\nu$, $\tilde\nu$ are Radon probability measures on $\mathbb{R}^r$ then they have a convolution $\nu * \tilde\nu$ defined by writing
$$(\nu * \tilde\nu)(E) = (\nu \times \tilde\nu)\{(x, y) : x + y \in E\}$$
for every Borel set $E \subseteq \mathbb{R}^r$, which is also a Radon probability measure. We can readily compute the characteristic function $\phi_{\nu*\tilde\nu}$ from 257B: we have
$$\phi_{\nu*\tilde\nu}(y) = \int e^{iy\cdot x}\,(\nu * \tilde\nu)(dx) = \int\!\!\int e^{iy\cdot(x+x')}\,\nu(dx)\,\tilde\nu(dx')$$
$$= \int\!\!\int e^{iy\cdot x}e^{iy\cdot x'}\,\nu(dx)\,\tilde\nu(dx') = \int e^{iy\cdot x}\,\nu(dx)\int e^{iy\cdot x'}\,\tilde\nu(dx') = \phi_\nu(y)\phi_{\tilde\nu}(y).$$

(Thus convolution of measures corresponds to pointwise multiplication of characteristic functions, just as convolution of functions corresponds to pointwise multiplication of Fourier transforms.) Recalling that the sum of independent random variables corresponds to convolution of their distributions (272S), this gives another way of looking at 285I. Remember also that if $\nu$, $\tilde\nu$ have Radon-Nikodym derivatives $f$, $\tilde f$ with respect to Lebesgue measure then $f * \tilde f$ is a Radon-Nikodym derivative of $\nu * \tilde\nu$ (257F).

285S The vague topology and pointwise convergence of characteristic functions In 285L we saw that a sequence $\langle\nu_n\rangle_{n\in\mathbb{N}}$ of Radon probability measures on $\mathbb{R}^r$ converges in the vague topology to a Radon probability measure $\nu$ if and only if
$$\lim_{n\to\infty} \int e^{iy\cdot x}\,\nu_n(dx) = \int e^{iy\cdot x}\,\nu(dx)$$
for every $y \in \mathbb{R}^r$; that is, iff
$$\lim_{n\to\infty} \rho'_y(\nu_n, \nu) = 0 \text{ for every } y \in \mathbb{R}^r,$$
writing
$$\rho'_y(\nu, \nu') = \Bigl|\int e^{iy\cdot x}\,\nu(dx) - \int e^{iy\cdot x}\,\nu'(dx)\Bigr|$$
for Radon probability measures $\nu$, $\nu'$ on $\mathbb{R}^r$ and $y \in \mathbb{R}^r$. It is natural to ask whether the pseudometrics $\rho'_y$ actually define the vague topology. Writing $T$ for the vague topology, $S$ for the topology defined by $\{\rho'_y : y \in \mathbb{R}^r\}$, we surely have $S \subseteq T$, just because every $\rho'_y$ is one of the pseudometrics used in the definition of $T$. Also we know that $S$ and $T$ give the same convergent sequences, and incidentally that $T$ is metrizable (see 285Xq). But all this does not quite amount to saying that the two topologies are the same, and indeed they are not, as the next result shows.

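285T Proposition Let $y_0, \ldots, y_n \in \mathbb{R}$ and $\epsilon > 0$. Then there are infinitely many $m \in \mathbb{N}$ such that $|1 - e^{iy_km}| \le \epsilon$ for every $k \le n$.
proof Let $\eta_1, \ldots, \eta_r \in \mathbb{R}$ be such that $1 = \eta_0, \eta_1, \ldots, \eta_r$ are linearly independent over $\mathbb{Q}$ and every $y_k/2\pi$ is a linear combination of the $\eta_j$ over $\mathbb{Q}$; say $y_k = 2\pi\sum_{j=0}^r q_{kj}\eta_j$ where every $q_{kj} \in \mathbb{Q}$. Express the $q_{kj}$ as $p_{kj}/p$ where each $p_{kj} \in \mathbb{Z}$ and $p \in \mathbb{N} \setminus \{0\}$. Set $M = \max_{k\le n} \sum_{j=0}^r |p_{kj}|$.
Take any $m_0 \in \mathbb{N}$ and let $\delta > 0$ be such that $|1 - e^{2\pi ix}| \le \epsilon$ whenever $|x| \le 2\pi M\delta$. By Weyl's Equidistribution Theorem (281N), there are infinitely many $m$ such that $<m\eta_j> \le \delta$ whenever $1 \le j \le r$; in particular, there is such an $m \ge m_0$. Let $m_j$ be the integral part of $m\eta_j$, so that $|m\eta_j - m_j| \le \delta$ for $0 \le j \le r$. Then
$$\Bigl|mpy_k - 2\pi\sum_{j=0}^r p_{kj}m_j\Bigr| \le 2\pi\sum_{j=0}^r |p_{kj}||m\eta_j - m_j| \le 2\pi M\delta,$$
so that
$$|1 - e^{iy_kmp}| = \Bigl|1 - \exp\Bigl(i\Bigl(mpy_k - 2\pi\sum_{j=0}^r p_{kj}m_j\Bigr)\Bigr)\Bigr| \le \epsilon$$
for every $k \le n$. As $mp \ge m_0$ and $m_0$ is arbitrary, this proves the result.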
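The recurrence asserted in 285T, that integers $m$ with every $e^{iy_km}$ close to 1 keep turning up, can be explored by brute force. The sketch below simply scans a range of $m$; the function name, the tolerance and the search limit are hypothetical parameters for the illustration, and a finite search naturally says nothing about 'infinitely many'.

```python
import cmath
import math

def near_return_times(ys, eps, limit):
    """All m in 1..limit with |1 - exp(i*y*m)| <= eps for every y in ys."""
    return [
        m
        for m in range(1, limit + 1)
        if all(abs(1.0 - cmath.exp(1j * y * m)) <= eps for y in ys)
    ]

if __name__ == "__main__":
    # 44 is close to 14*pi and 88 to 28*pi, so both appear for the single frequency 1.
    print(near_return_times([1.0], 0.1, 200))
    # Two frequencies at once: 88 still works, since 0.5 * 88 = 44.
    print(near_return_times([1.0, 0.5], 0.1, 200))
    # An incommensurable pair; the list may well be sparse within this range.
    print(near_return_times([1.0, math.sqrt(2.0)], 0.3, 5000))
```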

285U Corollary The topologies $S$ and $T$ on the space of Radon probability measures on $\mathbb{R}$, as described in 285S, are different.
proof Let $\delta_x$ be the Radon probability measure on $\mathbb{R}$ which gives mass 1 to the singleton $x$, so that $\delta_x(E) = 1$ if $x \in E \subseteq \mathbb{R}$. By 285T, every member of $S$ which contains $\delta_0$ also contains $\delta_m$ for infinitely many $m \in \mathbb{N}$. On the other hand, the set
$$G = \Bigl\{\nu : \int e^{-x^2}\,\nu(dx) > \frac{1}{2}\Bigr\}$$
is a member of $T$, containing $\delta_0$, which does not contain $\delta_m$ for any integer $m \ne 0$. So $G \in T \setminus S$ and $T \ne S$.

285X
R Basic exercises (a) Let be a Radon probability measure on R r , where r 1, and suppose
that kxk(dx) < . Show that the characteristic function of is differentiable (in the full sense of

R
262Fa) and that j
(y) = i j eiy . x (dx) for every j r, y R r , using j , j to represent the coordinates
of x and y as usual.

>(b) Let X = (X1 , . . . , Xr ) be a family of real-valued random variables, with characteristic function X .
Show that the characteristic function Xj of Xj is given by

Xj (y) = X (yej ) for every y R,


r
where ej is the jth unit vector of R .

> (c) Let X be a real-valued random variable and X its characteristic function. Show that
aX+b (y) = eiyb X (ay)
for any a, b, y R.

(d) Let $X$ be a real-valued random variable and $\phi$ its characteristic function.
(i) Show that for any integrable complex-valued function $h$ on $\mathbb{R}$,
$$E(\hat h(X))=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\phi(-y)h(y)\,dy,$$
writing $\hat h$ for the Fourier transform of $h$.
(ii) Show that for any rapidly decreasing test function $h$,
$$E(h(X))=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\phi(y)\hat h(y)\,dy.$$

(e) Let $\nu$ be a Radon probability measure on $\mathbb{R}$, and suppose that its characteristic function $\phi$ is square-integrable. Show that $\nu$ is an indefinite-integral measure over Lebesgue measure and that its Radon-Nikodým derivatives are also square-integrable. (Hint: use 284O to find a square-integrable $f$ such that $\int h\times f=\frac{1}{\sqrt{2\pi}}\int\phi\times\hat h$ for every rapidly decreasing test function $h$, and ideas from the proof of 284G to show that $\int_a^bf=\nu\,]a,b[$ whenever $a<b$ in $\mathbb{R}$.)

> (f) Let $X=(X_1,\ldots,X_r)$ be a family of real-valued random variables with characteristic function $\phi_X$. Suppose that $\phi_X$ is expressible in the form
$$\phi_X(y)=\prod_{j=1}^r\phi_j(\eta_j)$$
for some functions $\phi_1,\ldots,\phi_r$, writing $y=(\eta_1,\ldots,\eta_r)$ as usual. Show that $X_1,\ldots,X_r$ are independent. (Hint: show that the $\phi_j$ must be the characteristic functions of the $X_j$; now show that the distribution of $X$ has the same characteristic function as the product of the distributions of the $X_j$.)

(g) Let $X_1$, $X_2$ be independent real-valued random variables with the same distribution, and $\phi$ the characteristic function of $X_1-X_2$. Show that $\phi(t)=\phi(-t)\ge0$ for every $t\in\mathbb{R}$.

(h) Let $\nu$ be a Radon probability measure on $\mathbb{R}$, with characteristic function $\phi$. Show that
$$\frac12(\nu[c,d]+\nu\,]c,d[)=\frac{i}{2\pi}\lim_{a\to\infty}\int_{-a}^{a}\frac{e^{-idy}-e^{-icy}}{y}\phi(y)\,dy$$
whenever $c\le d$ in $\mathbb{R}$. (Hint: use part (a) of the proof of 283F.)

(i) Let $X$ be a real-valued random variable and $\phi_X$ its characteristic function. Show that
$$\Pr(|X|\ge a)\le7a\int_0^{1/a}(1-\mathcal{R}e\,\phi_X(y))\,dy$$
for every $a>0$.

(j) We say that a set $Q$ of Radon probability measures on $\mathbb{R}$ is uniformly tight if for every $\epsilon>0$ there is an $M\ge0$ such that $\nu(\mathbb{R}\setminus[-M,M])\le\epsilon$ for every $\nu\in Q$. Show that if $Q$ is any uniformly tight family of Radon probability measures on $\mathbb{R}$, and $\epsilon>0$, then there are $\eta>0$ and $y_0,\ldots,y_n\in\mathbb{R}$ such that
$$\nu\,]-\infty,a]\le\nu'\,]-\infty,a+\epsilon]+\epsilon$$
whenever $\nu$, $\nu'\in Q$ and $|\phi_\nu(y_j)-\phi_{\nu'}(y_j)|\le\eta$ for every $j\le n$, writing $\phi_\nu$ for the characteristic function of $\nu$.

(k) Let $\langle\nu_n\rangle_{n\in\mathbb{N}}$ be a sequence of Radon probability measures on $\mathbb{R}$, and suppose that it converges for the vague topology to a Radon probability measure $\nu$. Show that $\{\nu\}\cup\{\nu_n:n\in\mathbb{N}\}$ is uniformly tight in the sense of 285Xj.

> (l) Let $\nu$, $\nu'$ be two totally finite Radon measures on $\mathbb{R}^r$ which agree on all closed half-spaces, that is, sets of the form $\{x:x\cdot y\ge c\}$ for $c\in\mathbb{R}$, $y\in\mathbb{R}^r$. Show that $\nu=\nu'$. (Hint: reduce to the case $\nu\mathbb{R}^r=\nu'\mathbb{R}^r=1$ and use 285M.)

> (m) For $\sigma>0$, the Cauchy distribution with centre 0 and scale parameter $\sigma$ is the Radon probability measure $\nu_\sigma$ defined by the formula
$$\nu_\sigma(E)=\frac{\sigma}{\pi}\int_E\frac{1}{\sigma^2+t^2}\,dt.$$
(i) Show that if $X$ is a random variable with distribution $\nu_\sigma$ then $\Pr(X\ge0)=\Pr(|X|\ge\sigma)=\frac12$. (ii) Show that the characteristic function of $\nu_\sigma$ is $y\mapsto e^{-\sigma|y|}$. (Hint: 283Xr.) (iii) Show that if $X$ and $Y$ are independent random variables with Cauchy distributions, both centred at 0 and with scale parameters $\sigma$, $\tau$ respectively, and $\alpha$, $\beta$ are not both 0, then $\alpha X+\beta Y$ has a Cauchy distribution centred at 0 and with scale parameter $|\alpha|\sigma+|\beta|\tau$. (iv) Show that if $X$ and $Y$ are independent normally distributed random variables with expectation 0 then $X/Y$ has a Cauchy distribution.
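
Parts (i) and (ii) of the exercise on the Cauchy distribution can be checked numerically before they are proved. The sketch below assumes the density $\sigma/\pi(\sigma^2+t^2)$; the scale $1.5$, the truncation point and the grid sizes are arbitrary illustrative choices, and the midpoint rule on a truncated range only approximates the integrals.

```python
import math

def cauchy_density(t, sigma):
    """Density of the Cauchy distribution with centre 0 and scale sigma."""
    return sigma / (math.pi * (sigma * sigma + t * t))

def midpoint_integral(g, a, b, n):
    """Midpoint-rule approximation of the integral of g over [a, b] with n cells."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def char_fn(y, sigma, cutoff=2000.0, n=400_000):
    """Approximate the integral of cos(y t) * density(t) over [-cutoff, cutoff].
    The sine part of exp(iyt) integrates to zero by symmetry."""
    return midpoint_integral(lambda t: math.cos(y * t) * cauchy_density(t, sigma),
                             -cutoff, cutoff, n)

sigma = 1.5  # illustrative scale parameter
prob_inside = midpoint_integral(lambda t: cauchy_density(t, sigma), -sigma, sigma, 10_000)
phi_at_1 = char_fn(1.0, sigma)
print(prob_inside, phi_at_1, math.exp(-sigma * 1.0))
```

The first printed value approximates $\Pr(|X|\le\sigma)$, and the last two compare the numerical characteristic function at $y=1$ with $e^{-\sigma}$.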

> (n) Let X1 , X2 , . . . be an independent identically distributed sequence of random variables, all of zero
expectation and variance 1; let $\phi$ be their common characteristic function. For each $n\ge1$, set $S_n=\frac{1}{\sqrt n}(X_1+\ldots+X_n)$.
(i) Show that the characteristic function $\phi_n$ of $S_n$ is given by the formula $\phi_n(y)=(\phi(\frac{y}{\sqrt n}))^n$ for each $n$.
(ii) Show that $|\phi_n(y)-e^{-y^2/2}|\le n|\phi(\frac{y}{\sqrt n})-e^{-y^2/2n}|$.
(iii) Setting $h(y)=\phi(y)-e^{-y^2/2}$, show that $h(0)=h'(0)=h''(0)=0$ and therefore that $\lim_{n\to\infty}nh(y/\sqrt n)=0$, so that $\lim_{n\to\infty}\phi_n(y)=e^{-y^2/2}$ for every $y\in\mathbb{R}$.
(iv) Show that $\lim_{n\to\infty}\Pr(S_n\le a)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^ae^{-x^2/2}dx$ for every $a\in\mathbb{R}$.
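
The convergence of characteristic functions in this exercise can be watched in a concrete case. The sketch below assumes, purely for illustration, that each $X_j$ is a fair $\pm1$ coin flip, so that the common characteristic function is $\cos y$; it compares the two sides of the inequality in part (ii).

```python
import math

def phi(y):
    """Characteristic function of a fair +/-1 coin flip (mean 0, variance 1)."""
    return math.cos(y)

def phi_n(y, n):
    """Characteristic function of S_n = (X_1 + ... + X_n) / sqrt(n)."""
    return phi(y / math.sqrt(n)) ** n

def gaussian_cf(y):
    """Characteristic function of the standard normal distribution."""
    return math.exp(-y * y / 2)

y = 1.0
for n in (1, 10, 100, 10_000):
    lhs = abs(phi_n(y, n) - gaussian_cf(y))
    rhs = n * abs(phi(y / math.sqrt(n)) - math.exp(-y * y / (2 * n)))
    print(n, lhs, rhs)   # part (ii) predicts lhs <= rhs
```

Both columns should shrink as $n$ grows, which is the pointwise convergence asserted in part (iii).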

> (o) A random variable $X$ has a Poisson distribution with parameter $\lambda>0$ if $\Pr(X=n)=e^{-\lambda}\lambda^n/n!$ for every $n\in\mathbb{N}$. (i) Show that in this case $E(X)=\mathrm{Var}(X)=\lambda$. (ii) Show that if $X$ and $Y$ are independent random variables with Poisson distributions then $X+Y$ has a Poisson distribution. (iii) Find a proof of (ii) based on 285Q.
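
Part (ii) can be sanity-checked by direct convolution of probability mass functions. This is a minimal sketch with arbitrarily chosen parameters $0.7$ and $2.1$; it compares only the first thirty values and is no substitute for the proof.

```python
import math

def poisson_pmf(lam, n):
    """Pr(X = n) for a Poisson variable with parameter lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

def pmf_of_sum(lam1, lam2, n):
    """Pr(X + Y = n) for independent Poisson X, Y, by discrete convolution."""
    return sum(poisson_pmf(lam1, k) * poisson_pmf(lam2, n - k) for k in range(n + 1))

lam1, lam2 = 0.7, 2.1   # arbitrary sample parameters
worst = max(abs(pmf_of_sum(lam1, lam2, n) - poisson_pmf(lam1 + lam2, n))
            for n in range(30))
print(worst)
```

The printed discrepancy should be at the level of floating-point rounding, consistent with $X+Y$ being Poisson with parameter the sum of the two parameters.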

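> (p) For $x\in\mathbb{R}^r$, let $\delta_x$ be the Radon probability measure on $\mathbb{R}^r$ which gives mass 1 to the singleton $\{x\}$, so that $\delta_x(E)=1$ whenever $x\in E\subseteq\mathbb{R}^r$. Show that $\delta_x*\delta_y=\delta_{x+y}$ for all $x$, $y\in\mathbb{R}^r$.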

(q) Let $P$ be the set of Radon probability measures on $\mathbb{R}^r$. For $y\in\mathbb{R}^r$, set $\rho'_y(\nu,\nu')=|\phi_\nu(y)-\phi_{\nu'}(y)|$ for all $\nu$, $\nu'\in P$, writing $\phi_\nu$ for the characteristic function of $\nu$. Set $\psi(x)=\frac{1}{(\sqrt{2\pi})^r}e^{-x\cdot x/2}$ for $x\in\mathbb{R}^r$. Show that the vague topology on $P$ is defined by the family $\{\rho_\psi\}\cup\{\rho'_y:y\in\mathbb{Q}^r\}$, defining $\rho_\psi$ as in 285K, and is therefore metrizable. (Hint: 281K; cf. 285Xj.)

> (r) Let $\phi:\mathbb{R}^r\to\mathbb{C}$ be the characteristic function of a Radon probability measure on $\mathbb{R}^r$. Show that $\phi(0)=1$ and that $\sum_{j=0}^n\sum_{k=0}^nc_j\bar c_k\phi(a_j-a_k)\ge0$ whenever $a_0,\ldots,a_n\in\mathbb{R}^r$ and $c_0,\ldots,c_n\in\mathbb{C}$. ('Bochner's theorem' states that these conditions are sufficient, as well as necessary, for $\phi$ to be a characteristic function; see 445N in Volume 4.)

(s) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables and set $S_n=\sum_{j=0}^nX_j$ for each $n\in\mathbb{N}$. Suppose that the sequence $\langle\nu_{S_n}\rangle_{n\in\mathbb{N}}$ of distributions is convergent for the vague topology to a distribution. Show that $\langle S_n\rangle_{n\in\mathbb{N}}$ converges in measure, therefore a.e. (Hint: 285J, 273B.)

285Y Further exercises (a) Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$. Write
$$\hat\nu(y)=\frac{1}{(\sqrt{2\pi})^r}\int e^{-iy\cdot x}\nu(dx)$$
for every $y\in\mathbb{R}^r$.
(i) Writing $\phi_\nu$ for the characteristic function of $\nu$, show that $\hat\nu(y)=\frac{1}{(\sqrt{2\pi})^r}\phi_\nu(-y)$ for every $y\in\mathbb{R}^r$.
(ii) Show that $\int\hat\nu(y)h(y)dy=\int\hat h(x)\nu(dx)$ for any Lebesgue integrable complex-valued function $h$ on $\mathbb{R}^r$, defining the Fourier transform $\hat h$ as in 283Wa.
(iii) Show that $\int h(x)\nu(dx)=\int\check h(y)\hat\nu(y)dy$ for any rapidly decreasing test function $h$ on $\mathbb{R}^r$.
(iv) Show that if $\nu$ is an indefinite-integral measure over Lebesgue measure, with Radon-Nikodým derivative $f$, then $\hat\nu$ is the Fourier transform of $f$.

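(b) Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$, with characteristic function $\phi$. Show that whenever $c\le d$ in $\mathbb{R}^r$ then
$$\Bigl(\frac{i}{2\pi}\Bigr)^r\lim_{\alpha_1,\ldots,\alpha_r\to\infty}\int_{-\alpha_1}^{\alpha_1}\ldots\int_{-\alpha_r}^{\alpha_r}\Bigl(\prod_{j=1}^r\frac{e^{-i\delta_j\eta_j}-e^{-i\gamma_j\eta_j}}{\eta_j}\Bigr)\phi(y)\,dy$$
exists and lies between $\nu\,]c,d[$ and $\nu[c,d]$, writing $]c,d[\,=\prod_{j\le r}]\gamma_j,\delta_j[$ if $c=(\gamma_1,\ldots,\gamma_r)$ and $d=(\delta_1,\ldots,\delta_r)$.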

(c) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent identically distributed sequence of (not-essentially-constant) random variables, and set $S_n=\sum_{k=0}^nX_{2k+1}-X_{2k}$ for each $n\in\mathbb{N}$. Show that $\lim_{n\to\infty}\Pr(|S_n|\ge\alpha)=1$ for every $\alpha\in\mathbb{R}$. (Hint: 285Xg, proof of 285J.) Hence, or otherwise, show that $\lim_{n\to\infty}\Pr(|\sum_{k=0}^nX_k|\ge\alpha)=1$ for every $\alpha\in\mathbb{R}$.

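(d) For Radon probability measures $\nu$, $\nu'$ on $\mathbb{R}^r$ set
$$\rho(\nu,\nu')=\inf\{\epsilon:\epsilon\ge0,\ \nu\,]-\infty,a]\le\nu'\,]-\infty,a+\epsilon\mathbf{1}]+\epsilon\le\nu\,]-\infty,a+2\epsilon\mathbf{1}]+2\epsilon\text{ for every }a\in\mathbb{R}^r\},$$
writing $]-\infty,a]=\{(\xi_1,\ldots,\xi_r):\xi_j\le\alpha_j\text{ for every }j\le r\}$ when $a=(\alpha_1,\ldots,\alpha_r)$, and $\mathbf{1}=(1,\ldots,1)\in\mathbb{R}^r$. Show that $\rho$ is a metric on the set of Radon probability measures on $\mathbb{R}^r$, and that the topology it defines is the vague topology. (Cf. 274Ya.)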

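(e) Let $r\ge1$. We say that a set $Q$ of Radon probability measures on $\mathbb{R}^r$ is uniformly tight if for every $\epsilon>0$ there is a compact set $K\subseteq\mathbb{R}^r$ such that $\nu(\mathbb{R}^r\setminus K)\le\epsilon$ for every $\nu\in Q$. Show that if $Q$ is any uniformly tight family of Radon probability measures on $\mathbb{R}^r$, and $\epsilon>0$, then there are $\eta>0$, $y_0,\ldots,y_n\in\mathbb{R}^r$ such that $\nu\,]-\infty,a]\le\nu'\,]-\infty,a+\epsilon\mathbf{1}]+\epsilon$ whenever $\nu$, $\nu'\in Q$ and $a\in\mathbb{R}^r$ and $|\phi_\nu(y_j)-\phi_{\nu'}(y_j)|\le\eta$ for every $j\le n$, writing $\phi_\nu$ for the characteristic function of $\nu$.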

(f) Show that for any $M\ge0$ the set of Radon probability measures $\nu$ on $\mathbb{R}^r$ such that $\int\|x\|\nu(dx)\le M$ is uniformly tight in the sense of 285Ye.

(g) Let $C_b(\mathbb{R}^r)$ be the Banach space of bounded continuous real-valued functions on $\mathbb{R}^r$.
(i) Show that any Radon probability measure $\nu$ on $\mathbb{R}^r$ corresponds to a continuous linear functional $h_\nu:C_b(\mathbb{R}^r)\to\mathbb{R}$, writing $h_\nu(f)=\int f\,d\nu$ for $f\in C_b(\mathbb{R}^r)$.
(ii) Show that if $h_\nu=h_{\nu'}$ then $\nu=\nu'$.
(iii) Show that the vague topology on the set of Radon probability measures corresponds to the weak* topology on the dual $(C_b(\mathbb{R}^r))^*$ of $C_b(\mathbb{R}^r)$ (2A5Ig).

(h) Let $r\ge1$ and let $P$ be the set of Radon probability measures on $\mathbb{R}^r$. For $m\in\mathbb{N}$ let $\rho_m$ be the pseudometric on $P$ defined by setting $\rho_m(\nu,\nu')=\sup_{\|y\|\le m}|\phi_\nu(y)-\phi_{\nu'}(y)|$ for $\nu$, $\nu'\in P$, writing $\phi_\nu$ for the characteristic function of $\nu$. Show that $\{\rho_m:m\in\mathbb{N}\}$ defines the vague topology on $P$.

(i) Let $r\ge1$ and let $P$ be the set of Radon probability measures on $\mathbb{R}^r$. For $m\in\mathbb{N}$ let $\rho_m$ be the pseudometric on $P$ defined by setting
$$\rho_m(\nu,\nu')=\int_{\{y:\|y\|\le m\}}|\phi_\nu(y)-\phi_{\nu'}(y)|\,dy$$
for $\nu$, $\nu'\in P$, writing $\phi_\nu$ for the characteristic function of $\nu$. Show that $\{\rho_m:m\in\mathbb{N}\}$ defines the vague topology on $P$.

(j) Let X be a real-valued random variable with finite variance. Show that for any 0,
1 1
|(y) 1 iyE(X) + y 2 E(X 2 )| |y 3 |E(X 2 ) + y 2 E( (X)),
2 6

writing for the characteristic function of X and (x) = 0 for |x| , x2 for |x| > .

(k) Suppose that > 0 and that X0 , . . . , Xn are independent real-valued random variables such that
Pn Pn
E(Xk ) = 0 for every k n, k=0 Var(Xk ) = 1, k=0 E( (Xk ))
2

2
(writing (x) = 0 if |x| , x if |x| > ). Set = / + , and let Z be a standard normal random
variable. Show that
2 1
|(y) ey /2
| |y|3 + y 2 ( + E( (Z)))
3
Pn
for every y R, writing for the characteristic function of X = k=0 Xk . (Hint: p
write k for the
characteristic function of Xk and k for the characteristic function of k Z, where k = Var(Xk ). Show
that
1
|k (y) k (y)| |y 3 |k2 + y 2 E( (Xk )) + k2 E( (Z)) .)
3

(l) Show that for every > 0 there is a > 0 such that whenever X0 , . . . , Xn are independent real-valued
random variables such that
Pn Pn
E(Xk ) = 0 for every k n, k=0 Var(Xk ) = 1, k=0 E( (Xk ))
2
(writing (x) = 0 if |x| , x2 if |x| > ), then |(y) ey /2
| (y 2 + |y 3 |) for every y R, writing for
the characteristic function of X = X0 + . . . + Xn .

(m) Use 285Yl to prove Lindeberg's theorem (274F).

(n) Let $r\ge1$ and let $P$ be the set of Radon probability measures on $\mathbb{R}^r$. Show that convolution, regarded as a map from $P\times P$ to $P$, is continuous when $P$ is given the vague topology. (Hint: 281Xa and 257B will help.)

(o) Let $\mathfrak{S}$ be the topology on $\mathbb{R}$ defined by $\{\rho'_y:y\in\mathbb{R}\}$, where $\rho'_y(x,x')=|e^{iyx}-e^{iyx'}|$ (compare 285S). Show that addition and subtraction are continuous for $\mathfrak{S}$ in the sense of 2A5A.

285 Notes and comments Just as with Fourier transforms, the power of methods which use the characteristic functions of distributions is based on three points: (i) the characteristic function of a distribution determines the distribution (285M); (ii) the properties of interest in a distribution are reflected in accessible properties of its characteristic function (285G, 285I, 285J); (iii) these properties of the characteristic function are actually different from the corresponding properties of the distribution, and are amenable to different kinds of investigation. Above all, the fact that (for sequences!) convergence in the vague topology of distributions corresponds to pointwise convergence for characteristic functions (285L) provides us with a path to the classic limit theorems, as in 285Q and 285Xn. In 285S-285U I show that this result for sequences does not correspond immediately to any alternative characterization of the vague topology, though it can be adapted in more than one way to give such a characterization (see 285Yh-285Yi).
Concerning the Central Limit Theorem there is one conspicuous difference between the method suggested here and that of §274. The previous approach offered at least a theoretical possibility of giving an explicit formula for $\delta$ in 274F as a function of $\epsilon$, and hence an estimate of the rate of convergence to be expected in the Central Limit Theorem. The arguments in the present chapter, involving as they do an entirely non-constructive compactness argument in 281A, leave us with no way of achieving such an estimate. But in fact the method of characteristic functions, suitably refined, is the basis of the best estimates known, such as the Berry-Esséen theorem (274Hc).
In 285D I try to show how the characteristic function $\phi_\nu$ of a Radon probability measure $\nu$ can be related to a 'Fourier transform' $\hat\nu$ of $\nu$ which corresponds directly to the Fourier transforms of functions discussed in §§283-284. If $f$ is a non-negative Lebesgue integrable function and we take $\nu$ to be the corresponding indefinite-integral measure, then $\hat\nu=\hat f$. Thus the concept of 'Fourier transform of a measure' is a natural extension of the Fourier transform of an integrable function. Looking at it from the other side, the formula of 285Dc shows that $\nu$ can be thought of as representing the inverse Fourier transform of $\hat\nu$ in the sense of 284H-284I. Taking $\nu$ to be the measure which assigns a mass 1 to the point 0, we get the Dirac delta function, with Fourier transform the constant function 1. These ideas can be extended without difficulty to handle convolutions of measures (285R).
It is a striking fact that while there is no satisfactory characterization of the functions which are Fourier transforms of integrable functions, there is a characterization of the characteristic functions of probability distributions. This is 'Bochner's theorem'. I give the condition in 285Xr, asking you to prove its necessity as an exercise; we already have three-quarters of the machinery to prove its sufficiency, but the last step will have to wait for Volume 4.

286 Carleson's theorem

Carleson's theorem (Carleson 66) was the (unexpected) solution to a long-standing problem. Remarkably, it can be proved by 'elementary' arguments. The hardest part of the work below, in 286I-286L, involves only the laborious verification of inequalities. How the inequalities were chosen is a different matter; for once, some of the ideas of the proof lie in the statements of the lemmas. The argument here is a greatly expanded version of Lacey & Thiele 00.
The Hardy-Littlewood Maximal Theorem (286A) is important, and worth learning even if you leave the rest of the section as an unexamined monument. I bring 286B-286D forward to the beginning of the section, even though they are little more than worked exercises, because they also have potential uses in other contexts.
The complexity of the argument is such that it is useful to introduce a substantial number of special notations. Rather than include these in the general index, I give a list in 286W. Among them are ten constants $C_1,\ldots,C_{10}$. The values of these numbers are of no significance. The method of proof here is quite inappropriate if we want to estimate rates of convergence. I give recipes for the calculation of the $C_n$ only for the sake of the linear logic in which this treatise is written, and because they occasionally offer clues concerning the tactics being used.
In this section all integrals are with respect to Lebesgue measure $\mu$ on $\mathbb{R}$ unless otherwise stated.

286A The Maximal Theorem Suppose that $1<p<\infty$ and that $f\in\mathcal{L}^p_{\mathbb{C}}(\mu)$ (definition: 244O). Set
$$f^*(x)=\sup\{\frac{1}{b-a}\int_a^b|f|:a\le x\le b,\ a<b\}$$
for $x\in\mathbb{R}$. Then $\|f^*\|_p\le\frac{2^{1/p}p}{p-1}\|f\|_p$.

proof (a) It is enough to consider the case $f\ge0$. Note that if $E\subseteq\mathbb{R}$ has finite measure, then
$$\int_Ef=\int(f\times\chi E)\times\chi E\le\|f\times\chi E\|_p(\mu E)^{1/q}\le\|f\|_p(\mu E)^{1/q}$$
is finite, where $q=\frac{p}{p-1}$, by Hölder's inequality (244Eb). Consequently, if $t>0$ and $\int_Ef\ge t\mu E$, we must have $t\mu E\le\|f\times\chi E\|_p(\mu E)^{1/q}$ and
$$\mu E=(\mu E)^{p-p/q}\le\frac{1}{t^p}\|f\times\chi E\|_p^p=\frac{1}{t^p}\int_Ef^p.$$

(b) For $t>0$, set
$$G_t=\{x:\int_x^af>(a-x)t\text{ for some }a>x\}.$$
(i) $G_t$ is an open set. PP For any $a\in\mathbb{R}$,
$$G_{ta}=\{x:x<a,\ \int_x^af>(a-x)t\}$$
is open, because $x\mapsto\int_x^af$ and $x\mapsto(a-x)t$ are continuous (225A); so $G_t=\bigcup_{a\in\mathbb{R}}G_{ta}$ is open. QQ
(ii) By 2A2I, there is a partition $\mathcal{C}$ of $G_t$ into open intervals. Now $C$ is bounded and $t\mu C\le\int_Cf$ for every $C\in\mathcal{C}$. PP Express $C$ as $]a,b[$ (for the moment, we have to allow for the possibility that one or both of $a$, $b$ is infinite).
($\alpha$) If $x\in C$, there is some (finite) $c>x$ such that $\int_x^cf>(c-x)t$. Set $d=\min(b,c)>x$. If $d=c$, then of course $\int_x^df>(d-x)t$. If $d=b<c$, then (because $b\notin G_t$) $\int_b^cf\le(c-b)t$, so again
$$\int_x^df=\int_x^bf=\int_x^cf-\int_b^cf>(c-x)t-(c-b)t=(b-x)t=(d-x)t.$$
Thus we always have some $d\in\,]x,b]$ such that $\int_x^df>(d-x)t$.
($\beta$) Now take any $z\in C$, and consider
$$A_z=\{x:z\le x\le b,\ \int_z^xf\ge(x-z)t\}.$$
Then $z\in A_z$, and $A_z$ is closed, again because the functions $x\mapsto\int_z^xf$ and $x\mapsto(x-z)t$ are continuous. Moreover, $A_z$ is bounded, because $x-z\le\frac{1}{t^p}\|f\|_p^p$ for every $x\in A_z$, by (a). ?? If $\sup A_z=x_0<b$, then $x_0\in A_z$, and there is a $d\in\,]x_0,b]$ such that $\int_{x_0}^df\ge t(d-x_0)$, by ($\alpha$); but in this case $d\in A_z$, which is impossible. XX Thus $b=\sup A_z\in A_z$ (in particular, $b<\infty$), and $\int_z^bf\ge(b-z)t$.
($\gamma$) Letting $z$ decrease to $a$, we see that $b-a\le\frac{1}{t^p}\|f\|_p^p$, so $a$ is finite, and also
$$\int_a^bf=\lim_{z\downarrow a}\int_z^bf\ge\lim_{z\downarrow a}(b-z)t=(b-a)t,$$
as required. QQ
(iii) Accordingly, because $\mathcal{C}$ is countable and $f$ is non-negative,
$$\mu G_t=\sum_{C\in\mathcal{C}}\mu C\le\sum_{C\in\mathcal{C}}\frac{1}{t^p}\int_Cf^p\le\frac{1}{t^p}\int f^p$$
is finite, and
$$\int_{G_t}f=\sum_{C\in\mathcal{C}}\int_Cf\ge\sum_{C\in\mathcal{C}}t\mu C=t\mu G_t.$$

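(c) All this is true for every $t>0$. Now if we set
$$f_1^*(x)=\sup_{a>x}\frac{1}{a-x}\int_x^af$$
for $x\in\mathbb{R}$, we have $\{x:f_1^*(x)>t\}=G_t$ for every $t>0$. For any $t>0$,
$$\frac1pt\mu G_t=(1-\frac1q)t\mu G_t\le\int_{G_t}f-\frac1qt\mathbf{1}\le\int(f-\frac1qt\mathbf{1})^+,$$
writing $\mathbf{1}$ for the function with constant value 1. So
$$\begin{aligned}
\int(f_1^*)^p&=\int_0^\infty\mu\{x:f_1^*(x)^p>t\}\,dt\\
&\qquad\text{(see 252O)}\\
&=p\int_0^\infty u^{p-1}\mu\{x:f_1^*(x)>u\}\,du\\
&\qquad\text{(substituting }t=u^p\text{)}\\
&=p\int_0^\infty u^{p-1}\mu G_u\,du\le p^2\int_0^\infty u^{p-2}\int(f-\frac1qu\mathbf{1})^+\,du\\
&=p^2\int_{-\infty}^\infty\int_0^\infty\max(0,f(x)-\frac1qu)u^{p-2}\,du\,dx\\
&\qquad\text{(by Fubini's theorem, 252B, because }(x,u)\mapsto u^{p-2}\max(0,f(x)-\frac1qu)\text{ is measurable and non-negative)}\\
&=p^2\int_{-\infty}^\infty\int_0^{qf(x)}u^{p-2}(f(x)-\frac1qu)\,du\,dx\\
&=\frac{p^2q^{p-1}}{p(p-1)}\int_{-\infty}^\infty f(x)^p\,dx=\Bigl(\frac{p}{p-1}\Bigr)^p\|f\|_p^p.
\end{aligned}$$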

(d) Similarly, setting $f_2^*(x)=\sup_{a<x}\frac{1}{x-a}\int_a^xf$ for $x\in\mathbb{R}$, $\int(f_2^*)^p\le(\frac{p}{p-1})^p\|f\|_p^p$. But $f^*=\max(f_1^*,f_2^*)$. PP Of course $f_1^*\le f^*$ and $f_2^*\le f^*$. But also, if $f^*(x)>t$, there must be a non-trivial interval $I$ containing $x$ such that $\int_If>t\mu I$; if $a=\inf I$ and $b=\sup I$, then either $\int_a^xf>(x-a)t$ and $f_2^*(x)>t$, or $\int_x^bf>(b-x)t$ and $f_1^*(x)>t$. As $x$ and $t$ are arbitrary, $f^*=\max(f_1^*,f_2^*)$. QQ
Accordingly
$$\|f^*\|_p^p=\int(f^*)^p=\int\max((f_1^*)^p,(f_2^*)^p)\le\int(f_1^*)^p+\int(f_2^*)^p\le2\Bigl(\frac{p}{p-1}\Bigr)^p\|f\|_p^p.$$
Taking $p$th roots, we have the inequality we seek.
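
The inequality of 286A can be tested on a step function. The following is only a sketch under stated assumptions: $f$ is taken constant on unit cells, the sample values are arbitrary, and the supremum is taken only over intervals with integer endpoints, so `grid_maximal` is a lower bound for $f^*$ rather than $f^*$ itself (which is enough for the inequality still to apply).

```python
def grid_maximal(values):
    """Lower bound for f* on each unit cell: the largest average of `values`
    over a run of consecutive cells containing that cell."""
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    best = [0.0] * n
    for a in range(n):
        for b in range(a + 1, n + 1):
            avg = (prefix[b] - prefix[a]) / (b - a)
            for j in range(a, b):
                if avg > best[j]:
                    best[j] = avg
    return best

def p_norm(values, p):
    """L^p norm of the step function taking values[j] on the cell [j, j+1[."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

p = 2.0
f = [0.0] * 20 + [3.0, 1.0, 4.0, 1.0, 5.0] + [0.0] * 20   # arbitrary sample data
f_star = grid_maximal(f)
bound = 2 ** (1 / p) * p / (p - 1) * p_norm(f, p)
print(p_norm(f_star, p), bound)
```

The first printed number should not exceed the second, whatever non-negative sample values are used.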

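286B Lemma Let $g:\mathbb{R}\to[0,\infty[$ be a function which is non-decreasing on $]-\infty,\alpha]$, non-increasing on $[\beta,\infty[$ and constant on $[\alpha,\beta]$, where $\alpha\le\beta$. Then for any measurable function $f:\mathbb{R}\to[0,\infty]$,
$$\int f\times g\le\int g\cdot\sup_{a\le\alpha,b\ge\beta,a<b}\frac{1}{b-a}\int_a^bf.$$

proof Set $\gamma=\sup_{a\le\alpha,b\ge\beta,a<b}\frac{1}{b-a}\int_a^bf$. For $n$, $k\in\mathbb{N}$ set $E_{nk}=\{x:\alpha-2^n\le x\le\beta+2^n,\ g(x)\ge2^{-n}(k+1)\}$, so that $E_{nk}$ is either empty or a bounded interval including $[\alpha,\beta]$, and $\int_{E_{nk}}f\le\gamma\mu E_{nk}$. For $n\in\mathbb{N}$, set $g_n=2^{-n}\sum_{k=0}^{4^n-1}\chi E_{nk}$; then $\langle g_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of functions with supremum $g$, and
$$\begin{aligned}
\int f\times g&=\sup_{n\in\mathbb{N}}\int f\times g_n=\sup_{n\in\mathbb{N}}2^{-n}\sum_{k=0}^{4^n-1}\int_{E_{nk}}f\\
&\le\sup_{n\in\mathbb{N}}2^{-n}\sum_{k=0}^{4^n-1}\gamma\mu E_{nk}=\sup_{n\in\mathbb{N}}\gamma\int g_n=\gamma\int g,
\end{aligned}$$
as claimed.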

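286C Shift, modulation and dilation Some of the calculations below will be easier if we use the following formalism. For any function $f$ with domain included in $\mathbb{R}$, and $\alpha\in\mathbb{R}$, we can define
$$(S_\alpha f)(x)=f(x+\alpha),\quad(M_\alpha f)(x)=e^{i\alpha x}f(x),\quad(D_\alpha f)(x)=f(\alpha x)$$
whenever the right-hand sides are defined. In the case of $S_\alpha f$ and $D_\alpha f$ it is sometimes convenient to allow $\infty$ as a value of the function. We have the following elementary facts.

(a) $S_{-\alpha}S_\alpha f=f$, $D_{1/\alpha}D_\alpha f=f$ if $\alpha\ne0$.
(b) $S_\alpha(f\times g)=S_\alpha f\times S_\alpha g$, $D_\alpha(f\times g)=D_\alpha f\times D_\alpha g$.
(c) $D_\alpha|f|=|D_\alpha f|$.
(d) If $f$ is integrable, then
$$(M_\alpha f)^\wedge=S_{-\alpha}\hat f,\quad(S_\alpha f)^\wedge=M_\alpha\hat f,\quad(S_\alpha f)^\vee=M_{-\alpha}\check f;$$
if moreover $\alpha>0$, then
$$\alpha(D_\alpha f)^\wedge=D_{1/\alpha}\hat f,\quad\alpha(D_\alpha f)^\vee=D_{1/\alpha}\check f$$
(283Cc-283Ce).
(e) If $f$ belongs to $\mathcal{L}^1_{\mathbb{C}}=\mathcal{L}^1_{\mathbb{C}}(\mu)$, so do $S_\alpha f$, $M_\alpha f$ and (if $\alpha\ne0$) $D_\alpha f$, and in this case
$$\|S_\alpha f\|_1=\|M_\alpha f\|_1=\|f\|_1,\quad\|D_\alpha f\|_1=\frac{1}{|\alpha|}\|f\|_1.$$
(f) If $f$ belongs to $\mathcal{L}^2_{\mathbb{C}}$ so do $S_\alpha f$, $M_\alpha f$ and (if $\alpha\ne0$) $D_\alpha f$, and in this case
$$\|S_\alpha f\|_2=\|M_\alpha f\|_2=\|f\|_2,\quad\|D_\alpha f\|_2=\frac{1}{\sqrt{|\alpha|}}\|f\|_2.$$
(g) If $f$ is a rapidly decreasing test function (284A), so are $M_\alpha f$ and $S_\alpha f$ and (if $\alpha\ne0$) $D_\alpha f$.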
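
The three operators of 286C translate directly into higher-order functions. This is a minimal sketch: the names `shift`, `modulate`, `dilate` and `integral`, the Gaussian sample function and the value $2.5$ are all illustrative choices, and the $L^1$ norms are only approximated by a midpoint rule on $[-20,20]$.

```python
import cmath

def shift(alpha, f):
    """S_alpha f : x -> f(x + alpha)."""
    return lambda x: f(x + alpha)

def modulate(alpha, f):
    """M_alpha f : x -> exp(i alpha x) f(x)."""
    return lambda x: cmath.exp(1j * alpha * x) * f(x)

def dilate(alpha, f):
    """D_alpha f : x -> f(alpha x)."""
    return lambda x: f(alpha * x)

def integral(g, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

f = lambda x: cmath.exp(-x * x)          # sample function, chosen for illustration
alpha = 2.5
round_trip = shift(-alpha, shift(alpha, f))
norm_f = integral(lambda x: abs(f(x)), -20, 20)
norm_dilated = integral(lambda x: abs(dilate(alpha, f)(x)), -20, 20)
print(abs(round_trip(0.3) - f(0.3)), norm_dilated, norm_f / abs(alpha))
```

The first printed value illustrates (a), and the last two illustrate the dilation formula in (e).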

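286D Lemma Suppose that $f:\mathbb{R}\to[0,\infty]$ is a measurable function such that, for some constant $C\ge0$, $\int_Ef\le C\sqrt{\mu E}$ whenever $\mu E<\infty$. Then $f$ is finite almost everywhere and $\int_{-\infty}^\infty\frac{f(x)}{1+|x|}dx$ is finite.
proof For any $n\ge1$, set $E_n=\{x:|x|\le n,\ f(x)\ge n\}$; then
$$n\mu E_n\le\int_{E_n}f\le C\sqrt{\mu E_n},$$
so $\mu E_n\le\frac{C^2}{n^2}$ and
$$\{x:f(x)=\infty\}\subseteq\bigcap_{n\ge1}\bigcup_{m\ge n}E_m$$
has measure at most $\inf_{n\ge1}\sum_{m=n}^\infty\mu E_m=0$.
As for the integral, set $F(x)=\int_0^xf$ for $x\ge0$. Then, for any $a\ge0$,
$$\begin{aligned}
\int_0^a\frac{f(x)}{1+x}dx&=\frac{F(a)}{1+a}+\int_0^a\frac{F(x)}{(1+x)^2}dx\\
&\qquad\text{(225F)}\\
&\le C\Bigl(\frac{\sqrt a}{1+a}+\int_0^a\frac{\sqrt x}{(1+x)^2}dx\Bigr)\le C\Bigl(1+\int_0^\infty\frac{\sqrt x}{(1+x)^2}dx\Bigr),
\end{aligned}$$
so
$$\int_0^\infty\frac{f(x)}{1+x}dx\le C\Bigl(1+\int_0^\infty\frac{\sqrt x}{(1+x)^2}dx\Bigr)$$
is finite. Similarly, $\int_{-\infty}^0\frac{f(x)}{1-x}dx$ is finite, so we have the result.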

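286E The Lacey-Thiele construction (a) Let $\mathcal{I}$ be the family of all dyadic intervals of the form $[2^kn,2^k(n+1)[$ where $k$, $n\in\mathbb{Z}$. The essential geometric property of $\mathcal{I}$ is that if $I$, $J\in\mathcal{I}$ then either $I\subseteq J$ or $J\subseteq I$ or $I\cap J=\emptyset$. Let $Q$ be the set of all pairs $\sigma=(I_\sigma,J_\sigma)\in\mathcal{I}^2$ such that $\mu I_\sigma\cdot\mu J_\sigma=1$. For $\sigma\in Q$, let $k_\sigma\in\mathbb{Z}$ be such that $\mu I_\sigma=2^{k_\sigma}$ and $\mu J_\sigma=2^{-k_\sigma}$; let $x_\sigma$ be the midpoint of $I_\sigma$, $y_\sigma$ the midpoint of $J_\sigma$, $J_\sigma^l\in\mathcal{I}$ the left-hand half-interval of $J_\sigma$, $J_\sigma^r\in\mathcal{I}$ the right-hand half-interval of $J_\sigma$, and $y_\sigma^l$ the lower quartile of $J_\sigma$, that is, the midpoint of $J_\sigma^l$.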
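
The nesting property of dyadic intervals is easy to test exhaustively on a finite sample. In this sketch a dyadic interval is represented by its pair of endpoints, computed exactly with `Fraction`; the ranges of $k$ and $n$ are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product

def dyadic(k, n):
    """The half-open dyadic interval [2^k n, 2^k (n+1)[ as a pair of endpoints."""
    scale = Fraction(2) ** k
    return (scale * n, scale * (n + 1))

def relation(I, J):
    """Classify two half-open intervals as 'nested' or 'disjoint', else 'overlap'."""
    (a, b), (c, d) = I, J
    if (c <= a and b <= d) or (a <= c and d <= b):
        return "nested"
    if b <= c or d <= a:
        return "disjoint"
    return "overlap"

family = [dyadic(k, n) for k in range(-2, 3) for n in range(-8, 8)]
kinds = {relation(I, J) for I, J in product(family, repeat=2)}
print(kinds)
```

No pair in the sample should be classified as a proper overlap, which is the property used repeatedly in 286F-286K.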


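(b) There is a rapidly decreasing test function $\phi$ such that $\hat\phi$ is real-valued and $\chi[-\frac16,\frac16]\le\hat\phi\le\chi[-\frac15,\frac15]$. PP Look at parts (b)-(d) of the proof of 284G. The process there can be used to provide us with a smooth function $\psi_1$ which is zero outside the interval $[\frac16,\frac15]$ and strictly positive on $]\frac16,\frac15[$; multiplying by a suitable factor, we can arrange that $\int\psi_1=1$. So if we set $\psi_2(x)=1-\int_{-\infty}^x\psi_1$ for $x\in\mathbb{R}$, $\psi_2$ will be smooth, and $\chi\,]-\infty,\frac16]\le\psi_2\le\chi\,]-\infty,\frac15]$. Now set $\psi_0(x)=\psi_2(x)\psi_2(-x)$ for $x\in\mathbb{R}$, and $\phi=\check\psi_0$; $\hat\phi=\psi_0$ (284C) will have the required property. QQ
For $\sigma\in Q$, set $\phi_\sigma=2^{-k_\sigma/2}M_{y_\sigma^l}S_{-x_\sigma}D_{2^{-k_\sigma}}\phi$, so that
$$\phi_\sigma(x)=2^{-k_\sigma/2}e^{iy_\sigma^lx}\phi(2^{-k_\sigma}(x-x_\sigma)).$$
Observe that $\phi_\sigma$ is a rapidly decreasing test function. Now $\hat\phi_\sigma=2^{k_\sigma/2}S_{-y_\sigma^l}M_{-x_\sigma}D_{2^{k_\sigma}}\hat\phi$, that is,
$$\hat\phi_\sigma(y)=2^{k_\sigma/2}e^{-ix_\sigma(y-y_\sigma^l)}\hat\phi(2^{k_\sigma}(y-y_\sigma^l)),$$
which is zero unless $|y-y_\sigma^l|\le\frac15\cdot2^{-k_\sigma}$; since the length of $J_\sigma^l$ is $\frac12\cdot2^{-k_\sigma}$, this can be true only when $y\in J_\sigma^l$. We have the following simple facts.
(i) $\|\phi_\sigma\|_2=2^{-k_\sigma/2}2^{k_\sigma/2}\|\phi\|_2=\|\phi\|_2$ for every $\sigma\in Q$.
(ii) $\|\phi_\sigma\|_1=2^{-k_\sigma/2}2^{k_\sigma}\|\phi\|_1=2^{k_\sigma/2}\|\phi\|_1$ for every $\sigma\in Q$.
(iii) If $\sigma$, $\tau\in Q$ and $J_\sigma\ne J_\tau$ and $J_\sigma^r\cap J_\tau^r$ is non-empty, then $J_\sigma^l\cap J_\tau^l=\emptyset$ so
$$(\phi_\sigma|\phi_\tau)=(\hat\phi_\sigma|\hat\phi_\tau)=0,$$
by 284Ob. (For $f$, $g\in\mathcal{L}^2_{\mathbb{C}}$, I write $(f|g)$ for $\int f\times\bar g$.)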

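(c) Set $w(x)=\frac{1}{(1+|x|)^3}$ for $x\in\mathbb{R}$. Then there is a $C_1>0$ such that $|\phi(x)|\le C_1\min(w(3),w(x)^2)$ for every $x\in\mathbb{R}$ (because $\lim_{x\to\infty}x^6\phi(x)=\lim_{x\to-\infty}x^6\phi(x)=0$). For $\sigma\in Q$, set $w_\sigma=2^{-k_\sigma}S_{-x_\sigma}D_{2^{-k_\sigma}}w$, so that $w_\sigma(x)=2^{-k_\sigma}w(2^{-k_\sigma}(x-x_\sigma))$ for every $x$. Elementary calculations show that
(i) $w_\sigma$ depends only on $I_\sigma$;
(ii) $\int w_\sigma=\int w=1$ for every $\sigma$;
(iii)
$$|\phi_\sigma(x)|\le C_1\min(2^{k_\sigma/2}w_\sigma(x),2^{3k_\sigma/2}w_\sigma(x)^2)$$
for every $x$ and $\sigma$ (because $|\phi(x)|\le C_1w(x)^2\le C_1w(x)$ for every $x\in\mathbb{R}$).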

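286F Two partial orders (a) For $\sigma$, $\tau\in Q$ say that $\sigma\le\tau$ if $I_\sigma\subseteq I_\tau$ and $J_\tau\subseteq J_\sigma$. Then $\le$ is a partial order on $Q$. We have the following elementary facts.
(i) If $\sigma\le\tau$, then $k_\sigma\le k_\tau$.
(ii) If $\sigma$ and $\tau$ are incomparable (that is, $\sigma\not\le\tau$ and $\tau\not\le\sigma$), then $(I_\sigma\times J_\sigma)\cap(I_\tau\times J_\tau)$ is empty. PP We may suppose that $k_\sigma\le k_\tau$. If $J_\sigma\cap J_\tau\ne\emptyset$, then $J_\tau\subseteq J_\sigma$, because both are dyadic intervals, and $J_\tau$ is the shorter; but as $\sigma\not\le\tau$, this means that $I_\sigma\not\subseteq I_\tau$ and $I_\sigma\cap I_\tau=\emptyset$. QQ
(iii) If $\sigma$, $\sigma'$ are incomparable and both less than or equal to $\tau$, then $I_\sigma\cap I_{\sigma'}=\emptyset$, because $J_\tau\subseteq J_\sigma\cap J_{\sigma'}$.
(iv) If $\sigma\le\tau$ and $k_\sigma\le k\le k_\tau$, then there is a (unique) $\sigma'$ such that $\sigma\le\sigma'\le\tau$ and $k_{\sigma'}=k$. (The point is that there is a unique $I\in\mathcal{I}$ such that $I_\sigma\subseteq I\subseteq I_\tau$ and $\mu I=2^k$; and similarly there is just one candidate for $J_{\sigma'}$.)

(b) For $\sigma$, $\tau\in Q$ say that $\sigma\le^r\tau$ if $I_\sigma\subseteq I_\tau$ and $J_\tau^r\subseteq J_\sigma^r$ (that is, either $\sigma=\tau$ or $J_\tau\subseteq J_\sigma^r$), so that, in particular, $\sigma\le\tau$. Note that if $\sigma$, $\sigma'\le^r\tau$ and $k_\sigma\ne k_{\sigma'}$ then $J_\sigma^r\cap J_{\sigma'}^r\ne\emptyset$, so $(\phi_\sigma|\phi_{\sigma'})=0$ (286E(b-iii)).

(c) It will be convenient to have a shorthand for the following: if $P$, $R\subseteq Q$, say that $P\preccurlyeq R$ if for every $\sigma\in P$ there is a $\tau\in R$ such that $\sigma\le\tau$.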

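286G We shall need the results of some elementary calculations. The first three are nearly trivial.
Lemma (a) For any $m\in\mathbb{N}$, $\sum_{n=m}^\infty w(n+\frac12)\le\frac{1}{2(1+m)^2}$.
(b) Suppose that $\sigma\in Q$ and that $I$ is an interval not containing $x_\sigma$ in its interior. Then $\int_Iw_\sigma\ge w_\sigma(x)\mu I$, where $x$ is the midpoint of $I$.
(c) For any $x\in\mathbb{R}$, $\sum_{n=-\infty}^\infty w(x-n)\le2$.
(d) There is a constant $C_2\ge0$ such that $\int_{-\infty}^\infty w(x)w(\alpha x+\beta)dx\le C_2w(\beta)$ whenever $0\le\alpha\le1$ and $\beta\in\mathbb{R}$.
(e) There is a constant $C_3\ge0$ such that $|(\phi_\sigma|\phi_\tau)|\le2^{k_\sigma/2}2^{-k_\tau/2}C_3\int_{I_\tau}w_\sigma(x)dx$ whenever $\sigma$, $\tau\in Q$ and $k_\tau\le k_\sigma$.
(f) There is a constant $C_4\ge0$ such that whenever $\tau\in Q$ and $k\in\mathbb{Z}$, then
$$\sum_{\sigma\in Q,\sigma\le\tau,k_\sigma=k}\int_{\mathbb{R}\setminus I_\tau}w_\sigma\le C_4.$$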

proof (a) The point is just that $w$ is convex on $]-\infty,0]$ and $[0,\infty[$. So we can apply 233Ib with $f(x)=x$, or argue directly from the fact that $w(n+\frac12)\le\frac12(w(n+\frac12+x)+w(n+\frac12-x))$ for $|x|\le\frac12$, to see that $w(n+\frac12)\le\int_n^{n+1}w$ for every $n\ge0$. Accordingly
$$\sum_{n=m}^\infty w(n+\tfrac12)\le\int_m^\infty w=\frac{1}{2(1+m)^2}.$$

(b) Similarly, because $I$ lies all on the same side of $x_\sigma$, $w_\sigma$ is convex on $I$, so the same inequality yields $w_\sigma(x)\mu I\le\int_Iw_\sigma$.
(c) Let $m$ be such that $|x-m|\le\frac12$. Then, using the same inequalities as before to estimate $w(x-n)$ for $n\ne m$, we have
$$\sum_{n=-\infty}^\infty w(x-n)\le w(x-m)+\int_{-\infty}^{x-m-\frac12}w(t)dt+\int_{x-m+\frac12}^\infty w(t)dt\le1+\int_{-\infty}^\infty w(x)dx=2.$$

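(d)(i) The first step is to note that
$$\frac{w(\frac12(1+\beta))}{w(\beta)}=\frac{8(1+\beta)^3}{(3+\beta)^3}\le8$$
for every $\beta\ge0$. Now $\alpha w(\alpha+\alpha\beta)\le4w(\beta)$ whenever $\beta\ge0$ and $\alpha\ge\frac12$. PP For $t\ge\frac12$,
$$\frac{d}{dt}\,tw(t+t\beta)=\frac{1-2t(1+\beta)}{(1+t+t\beta)^4}\le0,$$
so
$$\alpha w(\alpha+\alpha\beta)\le\tfrac12w(\tfrac12+\tfrac12\beta)\le4w(\beta).$$
QQ Of course this means that
$$\frac1\alpha w\Bigl(\frac{1+\beta}{2\alpha}\Bigr)\le8w(\beta)$$
whenever $\beta\ge0$ and $0<\alpha\le1$.
(ii) Try $C_2=16$. If $0<\alpha\le1$ and $\beta\ge0$, set $\gamma=\frac{1+\beta}{2\alpha}$. Then, for any $x\ge-\gamma$,
$$1+\alpha x+\beta=(1+\beta)\Bigl(1+\frac{\alpha x}{1+\beta}\Bigr)\ge\frac12(1+\beta),$$
so $w(\alpha x+\beta)\le8w(\beta)$ and
$$\int_{-\gamma}^\infty w(x)w(\alpha x+\beta)dx\le8w(\beta)\int_{-\gamma}^\infty w\le8w(\beta).$$
On the other hand,
$$\int_{-\infty}^{-\gamma}w(x)w(\alpha x+\beta)dx\le w(\gamma)\int_{-\infty}^\infty w(\alpha x+\beta)dx=\frac1\alpha w\Bigl(\frac{1+\beta}{2\alpha}\Bigr)\int_{-\infty}^\infty w\le8w(\beta).$$
Putting these together, $\int_{-\infty}^\infty w(x)w(\alpha x+\beta)dx\le16w(\beta)$; and this is true whenever $0<\alpha\le1$ and $\beta\ge0$.
(iii) If $\alpha=0$, then
$$\int_{-\infty}^\infty w(x)w(\alpha x+\beta)dx=w(\beta)\int_{-\infty}^\infty w(x)dx=w(\beta)\le C_2w(\beta)$$
for any $\beta$. If $0<\alpha\le1$ and $\beta<0$, then
$$\begin{aligned}
\int_{-\infty}^\infty w(x)w(\alpha x+\beta)dx&=\int_{-\infty}^\infty w(x)w(-\alpha x-\beta)dx\\
&\qquad\text{(because }w\text{ is an even function)}\\
&=\int_{-\infty}^\infty w(x)w(\alpha x-\beta)dx\le C_2w(-\beta)\\
&\qquad\text{(by (ii) above)}\\
&=C_2w(\beta).
\end{aligned}$$
So we have the required inequality in all cases.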


R 1/2
(e) Set C3 = max(C12 C2 , kk22 / 1/2 w).
(i) It is worth disposing immediately of the case = . In this case,
|( | )| = k k22 = kk22 ,
while
Z Z x +2k 1 Z 1/2
k
w = 2 w(2k (x x ))dx = w(x)dx,
I x 2k 1 1/2
R
so certainly |( | )| C3 I
w .

R (ii) Now suppose that I 6= I . In this case, because k k , I must all lie on the same side of x ,
so w w (x )I , by (b).
I
We know from 286E(c-iii) that | (x)| 2k /2 C1 w (x) for every x. So

Z
k /2 k /2
|( | )| 2 2 C12 w (x)w (x)dx

Z
= 2k /2 2k /2 C12 w(2k (x x ))w(2k (x x ))dx

Z
= 2k /2 2k /2 C12 w(2k k x + 2k (x x ))w(x)dx

2k /2 2k /2 C12 C2 w(2k (x x ))
(by (d), since 2k k 1)
Z
2k /2 2k /2 C3 w (x ) 2k /2 2k /2 C3 w ,
I

as required.

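(f) Set $C_4=2\sum_{j=0}^\infty\int_{j+\frac12}^\infty w(x)dx$; this is finite because $\int_\alpha^\infty w(x)dx=\frac{1}{2(1+\alpha)^2}$ for every $\alpha\ge0$.
If $k>k_\tau$ then $k_\sigma\ne k$ for any $\sigma\le\tau$, so the result is trivial. If $k\le k_\tau$, then for each dyadic subinterval $I$ of $I_\tau$ of length $2^k$ there is exactly one $\sigma\le\tau$ such that $I_\sigma=I$. List these as $\sigma_0,\ldots$ in ascending order of the centres $x_{\sigma_j}$, so that if $I_\tau=[2^{k_\tau}m,2^{k_\tau}(m+1)[$ then $x_{\sigma_j}=2^{k_\tau}m+2^k(j+\frac12)$, for $j<2^{k_\tau-k}$. Now
$$\begin{aligned}
\sum_{j=0}^{2^{k_\tau-k}-1}\int_{-\infty}^{2^{k_\tau}m}w_{\sigma_j}(x)dx&=\sum_{j=0}^{2^{k_\tau-k}-1}\int_{-\infty}^{2^{k_\tau}m}2^{-k}w(2^{-k}(x-2^{k_\tau}m)-j-\tfrac12)dx\\
&=\sum_{j=0}^{2^{k_\tau-k}-1}\int_{-\infty}^0w(x-j-\tfrac12)dx\\
&\le\sum_{j=0}^\infty\int_{j+\frac12}^\infty w(x)dx=\frac12C_4.
\end{aligned}$$
Similarly (since $w$ is an even function, so the whole picture is symmetric about $x_\tau$)
$$\sum_{j=0}^{2^{k_\tau-k}-1}\int_{2^{k_\tau}(m+1)}^\infty w_{\sigma_j}(x)dx\le\frac12C_4,$$
and
$$\sum_{\sigma\le\tau,k_\sigma=k}\int_{\mathbb{R}\setminus I_\tau}w_\sigma\le C_4,$$
as required.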
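
Parts (a) and (c) of 286G are simple enough to check numerically. The sketch below uses finite partial sums (the truncation lengths are arbitrary), which are lower bounds for the infinite sums of non-negative terms; so it can only illustrate the inequalities, not establish them.

```python
def w(x):
    """The weight w(x) = 1 / (1 + |x|)^3 from 286Ec."""
    return 1.0 / (1.0 + abs(x)) ** 3

def tail_sum(m, terms=20_000):
    """Partial sum of w(n + 1/2) for n = m, ..., m + terms - 1."""
    return sum(w(n + 0.5) for n in range(m, m + terms))

def lattice_sum(x, radius=20_000):
    """Partial sum of w(x - n) over |n| <= radius."""
    return sum(w(x - n) for n in range(-radius, radius + 1))

for m in range(4):
    print(m, tail_sum(m), 1.0 / (2 * (1 + m) ** 2))   # (a): first <= second
for x in (0.0, 0.25, 0.5):
    print(x, lattice_sum(x))                          # (c): at most 2
```

In each printed row for (a) the partial sum should stay below the stated bound, and the lattice sums for (c) should stay below 2.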

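286H 'Mass' and 'energy' (Lacey & Thiele 00) If $P$ is a subset of $Q$, $E\subseteq\mathbb{R}$ is measurable, $h:\mathbb{R}\to\mathbb{R}$ is measurable, and $f\in\mathcal{L}^2_{\mathbb{C}}$, set
$$\mathrm{mass}_{Eh}(P)=\sup_{\sigma\in P,\tau\in Q,\sigma\le\tau}\int_{E\cap h^{-1}[J_\tau]}w_\tau\le\sup_{\tau\in Q}\int w_\tau=1,$$
$$\mathrm{energy}_f(P)=\sup_{\tau\in Q}2^{-k_\tau/2}\sqrt{\sum_{\sigma\in P,\sigma\le^r\tau}|(f|\phi_\sigma)|^2}.$$
If $P'\subseteq P$ then $\mathrm{mass}_{Eh}(P')\le\mathrm{mass}_{Eh}(P)$ and $\mathrm{energy}_f(P')\le\mathrm{energy}_f(P)$. Note that $\mathrm{energy}_f(\{\sigma\})=2^{-k_\sigma/2}|(f|\phi_\sigma)|$ for any $\sigma\in Q$, since if $\sigma\le^r\tau$ then $k_\sigma\le k_\tau$.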

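286I Lemma Set $C_5=2^{12}$. If $P\subseteq Q$ is finite, $E\subseteq\mathbb{R}$ is measurable, $h:\mathbb{R}\to\mathbb{R}$ is measurable, and $\gamma\ge\mathrm{mass}_{Eh}(P)$, then we can find sets $P_1\subseteq P$, $P_2\subseteq Q$ such that $\mathrm{mass}_{Eh}(P_1)\le\frac14\gamma$, $\gamma\sum_{\tau\in P_2}\mu I_\tau\le C_5\mu E$ and $P\setminus P_1\preccurlyeq P_2$ (in the notation of 286Fc).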
proof (a) Set P1 = { : P , massEh ({}) 41 }. Then massEh (P1 ) 14 .R If = 0 we can stop here,
as P1 = P . Otherwise, for each P \ P1 let 0 Q be such that 0 and Eh1 [J 0 ] w0 > 41 . Let P2

be the set of elements of { 0 : P \ P1 } which are maximal for ; then P \ P1 4 P2 .
(b) For k N set
(k)
Rk = { : P2 , 2k (E h1 [J ] I ) 22k9 },
(k) S
where I is the interval with the same centre as I and 2k times its length. Now P2 = kN Rk . P
P Take
(k)
P2 . If k N and x R \ I , then |x x | 2kk 1 , so
w (x) = 2k w(2k (x x )) 2k w(2k1 ) = 2k (1 + 2k1 )3 .
So
Z Z Z
X
1
< w = w + w
4 Eh1 [J ] Eh1 [J ]I Eh1 [J ]I
(k+1)
\I
(k)
k=0

X
2k (E h1 [J ] I ) + 2k (E h1 [J ] I(k+1) )(1 + 2k1 )3 .
k=0

It follows that either


1
2k (E h1 [J ] I )
8
and R0 , or there is some k N such that
(k+1)
2k (E h1 [J ] I )(1 + 2k1 )3 2k4
and
(k+1)
2k (E h1 [J ] I ) (1 + 2k1 )3 2k4 22k7 ,
so that Rk+1 . Q
Q
P
(c) For every k N, Rk I 211k E. P P If Rk = , this is trivial. Otherwise, enumerate Rk as
hj ijn in such a way that kj kl if j l n. Define q : {0, . . . , n} {0, . . . , n} inductively by the rule
(k) (k)
q(l) = min({l} {q(j) : j < l, (Iq(j) Jq(j) ) (Il Jl ) 6= })
for each l n. A simple induction shows that q(q(l)) = q(l) l for every l n. Note that, for l n,
(k) (k)
Iq(l) Il 6= , so that
(k) (k+2)
Il Il Iq(l) ,
(k) (k)
because Il Iq(l) . Moreover, if j < l n and q(j) = q(l), then both Jj and Jl meet Jq(j) , therefore
include it, and Jj Jl . But as j and l are distinct members of P2 , l 6 j and Ij Il must be empty.
Set M = {q(j) : j n}. We have
X X X
I = Ij
Rk mM jn
q(j)=m
X X
I(k+2)
m
= 2k+2 Im
mM mM
X
2k+2 292k (E h1 [Jm ] I(k)
m
)
mM

2k+2 292k E = 211k E


(k) (k)
because if l, m M and l < m then Il Jl and Im Jm are disjoint (since otherwise q(m) l and
(k) (k)
there can be no j such that q(j) = m), so that h1 [Jl ] Il and h1 [Jm ] Im are disjoint. Q
Q
(d) Accordingly
P P P
P2 I k=0 Rk I 212 E,
as required.

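286J Lemma If $P\subseteq Q$ is finite and $f\in\mathcal{L}^2_{\mathbb{C}}$, then
$$\sum_{\sigma,\tau\in P,J_\sigma=J_\tau}(f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f)\le C_3\sum_{\sigma\in P}|(f|\phi_\sigma)|^2\le C_3\|\sum_{\sigma\in P}(f|\phi_\sigma)\phi_\sigma\|_2\|f\|_2.$$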

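proof
$$\begin{aligned}
\sum_{\sigma,\tau\in P,J_\sigma=J_\tau}(f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f)&\le\sum_{\sigma,\tau\in P,J_\sigma=J_\tau}\frac12\bigl(|(f|\phi_\sigma)|^2+|(f|\phi_\tau)|^2\bigr)|(\phi_\sigma|\phi_\tau)|\\
&\qquad\text{(because }|\xi\zeta|\le\tfrac12(|\xi|^2+|\zeta|^2)\text{ for all complex numbers }\xi,\zeta\text{)}\\
&=\sum_{\sigma\in P}|(f|\phi_\sigma)|^2\sum_{\tau\in P,J_\tau=J_\sigma}|(\phi_\sigma|\phi_\tau)|\\
&\le\sum_{\sigma\in P}|(f|\phi_\sigma)|^2C_3\sum_{\tau\in P,J_\tau=J_\sigma}\int_{I_\tau}w_\sigma\\
&\qquad\text{(by 286Ge, since }k_\sigma=k_\tau\text{ if }J_\sigma=J_\tau\text{)}\\
&\le\sum_{\sigma\in P}|(f|\phi_\sigma)|^2C_3\int_{-\infty}^\infty w_\sigma\\
&\qquad\text{(because if }\tau,\tau'\text{ are distinct members of }P\text{ and }J_\tau=J_{\tau'}\text{, then }I_\tau\text{ and }I_{\tau'}\text{ are disjoint)}\\
&=C_3\sum_{\sigma\in P}|(f|\phi_\sigma)|^2=C_3\sum_{\sigma\in P}(f|\phi_\sigma)(\phi_\sigma|f)\\
&=C_3\int\Bigl(\sum_{\sigma\in P}(f|\phi_\sigma)\phi_\sigma\Bigr)\times\bar f\le C_3\|\sum_{\sigma\in P}(f|\phi_\sigma)\phi_\sigma\|_2\|f\|_2
\end{aligned}$$
by Cauchy's inequality (244Eb).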

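286K Lemma Set
$$C_6=4(C_3+4C_3\sqrt{2C_4}).$$
Let $P\subseteq Q$ be a finite set, $f\in\mathcal{L}^2_{\mathbb{C}}$ and $\|f\|_2=1$. Suppose that $\gamma\ge\mathrm{energy}_f(P)$. Then we can find finite sets $P_1\subseteq P$ and $P_2\subseteq Q$ such that $\mathrm{energy}_f(P_1)\le\frac12\gamma$, $\gamma^2\sum_{\tau\in P_2}\mu I_\tau\le C_6$, and $P\setminus P_1\preccurlyeq P_2$.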
proof (a) We may suppose that > 0 and that P 6= , since otherwise we can take P1 = P and P2 = .
(i) For Q, A Q set
P
T = { : P , r }, (A) = A |(f | )|2 .
There are only finitely many sets of the form T ; let R Q be a non-empty finite set such that whenever
Q and T is not empty, there is a 0 R such that T = T 0 and k 0 k ; this is possible because if
A P is not empty then k minA k whenever A = T .
(ii) Choose 0 , 1 , . . . , P00 , P10 , . . . inductively, as follows. P00 = P . Given that Pj0 P is not empty,
consider
1
Rj = { : R, 2k (Pj0 T ) 2 }.
4

If Rj = , stop the induction and set n = j, P2 = {l : l < j}, P1 = Pj0 . Otherwise, among the members of
0
Rj take one with y as far to the left as possible, and call it j ; set Pj+1 = Pj0 \ { : P , j }, and
continue. Note that as Rj+1 Rj for every j, yj+1 yj for every j.
The induction must stop at a finite stage because if it does not stop with n = j then (Pj0 Tj ) > 0, so
Pj0 Tj is not empty and Pj+1
0
Pj0 \ Tj is a proper subset of Pj0 , while P00 = P is finite. Since Rn = ,
p
energy(P1 ) = energy(Pn0 ) = sup 2k /2 (Pn0 T )
f f Q
p 1
= max 2k /2 (Pn0 T ) .
R 2

We also have P \ P1 4 {j : j < n}.


S
(iii) Set Pj00 = Pj0 Tj Pj0 \ Pj+1
0
for j < n, so that hPj00 ij<n is disjoint, and P 0 = j<n Pj00 P .
Then if P 0 , j < n and Jj Jl , I Ij = . P P?? Otherwise, take l < n such that Pl00 . Then
0
Jj J , so kj k and I must be included in Ij ; thus j and / Pj+1 . On the other hand,
yj Jj Jl , yl Jrl Jr ,
/ Pl0 , while we chose l such that Pl00 . X
so yj < yl and j < l. But this means that XQQ

It follows that if , P 0 are distinct and Jl Jl is not empty, then I I = . P P If J = J this


is true just because 6= . Otherwise, since J and J intersect, one is included in the other; suppose that
J J . Since J meets Jl , J Jl . Now let j < n be such that Pj00 ; then j , so Jj J Jl ,
and I I Ij I = by the last remark. Q Q

(b) Now let us estimate

X X X
2 Ij = 2kj 2 4 (Pj00 )
j<n j<n j<n
X X X
=4 |(f | )|2 = 4 |(f | )|2 = 4
j<n Pj00 P 0

say.
P
Because kf k2 = 1, we have k P 0 (f | ) k2 (see 286J). So

X X
2 k (f | ) k22 = (f | )( | )( |f )
P 0 , P 0
X X
= (f | )( | )( |f ) + 2 (f | )( | )( |f )
, P 0 ,J =J , P 0 ,J Jl

because ( | ) = 0 unless Jl Jl 6= , as noted in 286Eb. Take these two terms separately. For the first,
we have
P

, P 0 ,J =J (f | )( | )( |f ) C3

by 286J. For the second term, we have

X X X
(f | )( | )( |f ) |(f | )| |( | )( |f )|
, P 0 ,J Jl P 0 P 0 ,J Jl
sX sX X 2
|(f | )|2 |( | )( |f )|
P 0 P 0 P 0 ,J Jl
sX

= Hj ,
j<n

where for j < n I set

X X 2
Hj = |( | )( |f )| .
Pj00 P 0 ,J Jl

Now we can estimate Hj by observing that, for any P 0 ,


|( |f )| = 2k /2 energyf ({ }) 2k /2 ,

while if , P 0 and Jl J then


R
|( | )| 2k /2 2k /2 C3 I
w

by 286Ge. We also need to know that if Pj00 and , 0 are distinct elements of P 0 such that J Jl Jl 0 ,
then I , I 0 and Ij are all disjoint, by (a-iii) above, because Jj J . So we have

X X Z
2
Hj 2k /2 2k /2 2k /2 C3 w
Pj00 P 0 ,J Jl I

X X Z
2
= C32 2 2k w
Pj00 I
P 0 ,J Jl
X Z
C32 2 2k ( w )2
Pj00 R\Ij


X X Z Z
C32 2 2k w w
k=kj Pj00 ,k =k R\Ij


X
C32 2 2k C4
k=kj

(by 286Gf, since j for every Pj00 )


= C32 2 2kj +1 C4 .

Accordingly
P P
j<n Hj 2C32 2 C4 j<n 2kj 2C32 C4 4.
Putting these together,
X X
2 (f | )( | )( |f ) + 2 (f | )( | )( |f )
, P 0 ,J =J , P 0 ,J Jl
sX
p
C3 + 2 Hj C3 + 4C3 2C4
j<n
p 1
= C3 + 4C3 2C4 = C6 .
4

But this means that


P
2 j<n Ij 4 C6 ,
and P2 = {j : j < n} has the property required.

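286L Lemma Set

$$C_7 = C_1\Bigl(6 + \frac{28}{w(3/2)} + \frac{4\sqrt{14C_3}}{w(3/2)}\Bigr).$$

Suppose that $P$ is a finite subset of $Q$ with an upper bound $\tau$ in $Q$. Suppose that $E \subseteq \mathbb{R}$ is measurable, $h : \mathbb{R} \to \mathbb{R}$ is measurable and $f \in \mathcal{L}^2_{\mathbb{C}}$. Then

$$\sum_{\sigma\in P} \Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]} \phi_\sigma\Bigr| \le 2^{k_\tau} C_7 \operatorname{energy}_f(P)\operatorname{mass}_{Eh}(P).$$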

proof Set = energyf (P ), 0 = massEh (P ). If P = the result is trivial, so suppose that P 6= .



(a)(i) For a dyadic interval J = 2k n, 2k (n + 1) set J = 2k (n 1), 2k (n + 2) , the half-open interval
with the same centre as J and three times its length. Let J be the family of those J I such that I 6 J
for any P such that I J. Because P is finite, all sufficiently small intervals
S S belong to J , and
J = R; let K be the set of maximal members of J , so that K is disjoint. Then K = R. P P The point
is that P 6= ; fix P for the moment. If J J , consider for each n N the interval J(n) I including
J with length 2n J. Then there is some n N such that J(n) I and I (J(n) ) , so that J(k) /J
(k) (k)
S
for
S anyS k n, and there must be some k < n such that J K. Thus J J K; as J is arbitrary,
K = J = R. Q Q

(ii) For K K, let lK Z be such that K = 2lK . If lK k , so that K I , then K must lie
within the interval I with centre x and length 7I , since otherwise we should have I K = , where K
is the dyadic interval of length 2K including K, and K would belong to J . But this means that
P k
KK,lK k K 7I = 7 2 ,
because K is disjoint.
(iii) For any l < k , there are just three members K of K such that lK = l. P P If J I and J > I ,
then either I J or I J = , and J J iff I J is empty. This means
that if K I and K = 2l ,
K K iff I K is empty and I K . So if I 2 n, 2 (n + 1) and K = 2l m, 2l (m + 1) , we
l l

shall have K K iff


either $m = n - 2$ or $m = n + 2$ or $m = n - 3$ is even or $m = n + 3$ is odd;
which for any given n happens for just three values of m. Q
Q
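The count in (a-iii) can be checked mechanically. The Python sketch below is an illustration only, not part of the proof. It assumes my reading of the criterion, in units of $2^l$: the reference interval lies in the dyadic cell $[n, n+1[$, $K = [m, m+1[$, and $K$ qualifies iff the cell misses the triple of $K$ but lies inside the triple of the dyadic parent of $K$. The function names are hypothetical.

```python
def rule_offsets(n, window=12):
    """Indices m allowed by the parity rule quoted in (a-iii)."""
    return [m for m in range(n - window, n + window + 1)
            if m == n - 2 or m == n + 2
            or (m == n - 3 and m % 2 == 0)
            or (m == n + 3 and m % 2 == 1)]


def geometric_offsets(n, window=12):
    """Indices m found from the assumed geometric criterion (units of 2^l).

    Assumption: [n, n+1) is the dyadic cell containing the reference interval,
    K = [m, m+1), its triple is [m-1, m+2), and the dyadic parent of K is
    [p, p+2) with p = m - (m % 2), whose triple is [p-2, p+4).
    """
    found = []
    for m in range(n - window, n + window + 1):
        misses_triple = (n + 1 <= m - 1) or (n >= m + 2)
        p = m - (m % 2)  # left endpoint of the dyadic parent of K
        inside_parent_triple = (p - 2 <= n) and (n + 1 <= p + 4)
        if misses_triple and inside_parent_triple:
            found.append(m)
    return found


if __name__ == "__main__":
    for n in range(-3, 4):
        print(n, rule_offsets(n), geometric_offsets(n))
```

Under that assumption the two functions agree for every $n$ tried, and each returns exactly three values of $m$.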
(b) For P , let be a complex number of modulus 1 such that
R R
(f | ) Eh1 [J r ] = |(f | ) Eh1 [J r ] |.

Set W = P K. For (, K) W , set


R
K = (f | ) Eh1 [Jr ]K
.
The aim of the proof is to estimate
P R P
(f | ) = (,K)W K .
P Eh1 [Jr ]

It will be helpful to note straight away that


P P R
(,K)W |K | P |(f | )| | |

is finite.
Set
W0 = {(, K) : P , K K, k lK k },

W1 = {(, K) : P , K K, lK < k },

W2 = {(, K) : P , K K, k < lK , 6r },

W3 = {(, K) : P , K K, k < lK , r }.
Because k k for every P , W = W0 W1 W2 W3 . I will give estimates for
P
j = (,K)Wj K
for each j; the three components in the expression for C7 given above are bounds for |0 | + |1 |, |2 | and
|3 | respectively.
(c)(i) Whenever K K and P ,
|(f | )| 2k /2 ,
by 286H, and

Z Z
| | 23k /2 C1 w2
Eh1 [Jr ]K Eh1 [Jr ]K
(286E(c-iii))
Z
3k /2
2 C1 w sup w (x)
Eh1 [J ] xK
3k /2 0
2 C1 sup w (x)
xK
k /2
=2 C1 w(2k (x , K)),
0

where I write (x , K) for inf xK |x x |. So, for fixed K K and k lK ,


X X
|K | 2k C1 0 w(2k (x , K))
P,k =k P,k =k
X
2k C1 0 2 w(n + 12 )
n=2klK

because the x , for P and k = k, are all distinct and all a distance at least 2lK from K (because
I 6 K ); so there are at most two with (x , K) = 2k (n + 21 ) for each n 2klK . So we have
P k
P,k =k |K | 2 C1 0 (1 + 2klK )2 2k2 C1 0
by 286Ga. And this is true whenever K K and k lK .
(ii) Now
X X X
|0 | |K | = |K |
(,K)W0 KK P
lK k k lK
X
X X
= |K |
KK k=lK P
lK k k =k
X X X
k2
2 C1 0 = C1 0 2lK 1
KK k=lK KK,lK k
lK k
1 X
= C1 0 K 4 2k C1 0
2
KK,lK k

by the formula in (a-ii). This deals with 0 .


(d) Next consider W1 . We have

X X
X X
|1 | |K | = |K |
(,K)W1 KK k=k P
lK <k k =k
X
X
C1 0 2k (1 + 2klK )2
KK k=k
lK <k
(by (c-i) above )
kX
1 X
X
C1 0 (1 + 2k l )2 2k
l= KK k=k
lK =l
kX
1 X
= 2k +1 C1 0 (1 + 2k l )2
l= KK
lK =l
kX
1

= 6 2k C1 0 (1 + 2k l )2
l=
(by (a-iii))

X
= 6 2k C1 0 (1 + 2l )2 2 2k C1 0
l=1

because

$$\sum_{l=1}^{\infty} (1+2^l)^{-2} \le \sum_{l=1}^{\infty} 4^{-l} = \frac13.$$

This deals with 1 .
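The elementary series bound used at the end of (d) is easy to confirm in exact arithmetic. The following Python sketch is only a sanity check, with hypothetical function names. It compares partial sums of $\sum_{l\ge1}(1+2^l)^{-2}$ with those of the geometric series $\sum_{l\ge1}4^{-l}$, whose limit is $\frac13$, and confirms that six times the first sum stays below 2, which is the constant that appears in the estimate.

```python
from fractions import Fraction


def partial_sums(terms):
    """Return exact partial sums of sum (1+2^l)^-2 and sum 4^-l for l = 1..terms."""
    lhs = sum(Fraction(1, (1 + 2 ** l) ** 2) for l in range(1, terms + 1))
    rhs = sum(Fraction(1, 4 ** l) for l in range(1, terms + 1))
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = partial_sums(40)
    print(float(lhs), float(rhs), 1 / 3)
```

Each term satisfies $(1+2^l)^{-2} < 4^{-l}$, so the comparison holds termwise as well as for the sums.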


S
(e) For K K, set GK = K E P,k <lK h1 [J ]. Then GK 2 0 K/w( 32 ). P P If lK k , then
GK = , so we may suppose that lK > k . Let K I be the dyadic interval containing K and with twice
the length. Then K / J , so there is a P such that K I and I K, so that k lK 1 k .
Let Q be such that and k = lK 1 (286F(a-iv)). Then I meets K , so K is either equal
to I or adjacent to it, and |x x | 32 2k for every x K, therefore for every x K. Accordingly
w (x) 2k w( 32 ) = w( 23 )/2K
R
for every x K. On the other hand, because P and , Eh1 [J ] w 0 . So
(E h1 [J ] K) 2 0 K/w( 32 ).
Now suppose that 0 P and k0 < lK . Then J0 is the dyadic interval of length 2k0 including J .
But J is the dyadic interval of length 2k including J , so includes J0 , and h1 [J0 ] h1 [J ]. As 0 is
arbitrary, GK E h1 [J ] K and GK 2 0 K/w( 23 ), as claimed. Q Q

(f )(i) For x R, set


P
v2 (x) = (,K)W2 (f | ) (x)(E h1 [Jr ] K)(x).
Then, for any x R, either v2 (x) = 0 or there is a k k such that
P
v2 (x) = P,k =k (f | ) (x).
P If v2 (x) 6= 0, then we have a pair (, L) W2 such that x E h1 [Jr ] L. Now suppose that
P
(, K) W2 and either K 6= L or k 6= k . If K 6= L then of course x / L, because K is a disjoint family. If
k 6= k , then examine J and J . These are dyadic intervals of different lengths, and both include J . On
the other hand, neither of the right-hand halves Jr , Jr includes Jr , because 6r and 6r . So either
J Jr = (if k < k ) or J Jr = (if k > k ); in either case, Jr Jr is empty, and x / h1 [Jr ].
On the other hand, of course, if P and k = k , then k < lL and J = J does not include Jr , so
r r

that (, L) W2 and x E h1 [Jr ] L.


Thus
P P
v2 (x) = | (,K)W2 ,xh1 [J r ]K (f | ) (x)| = | P,k =k (f | ) (x)|,

and we can set k = k . Q


Q

(ii) It follows that v2 (x) 2C1 for every x R. P


P If v2 (x) = 0 this is trivial. Otherwise, take k from
(i). Then

X X
v2 (x) |(f | ) (x)| 2k/2 2k/2 C1 w (x)
P,k =k P,k =k

(by 286H and 286E(c-iii))


X
X
k
= C1 w(2 (x x )) C1 w(2k x n 21 )
P,k =k n=

(because the x , for P and k = k, are all distinct and of the form 2k (n + 12 ))
2C1

by 286Gc. Q
Q

(iii) Note also that, if v2 (x) > 0, there is a pair (, K) W2 such that x h1 [J ] K, so that
k k < lK and x GK . But now we have

X Z

|2 | = (f | ) (E h1 [Jr ] K)
(,K)W2
Z X Z X
v2 = v2 4C1 0 K/w( 23 )
KK,lK >k GK KK,lK >k

(putting the estimates in (e) and (ii) just above together)


28 2k C1 0 /w( 32 )

by (a-ii). This deals with 2 .


P
(g) Set P 0 = { : P , r } and g = P 0 (f | ) . Then
kgk22 2k C3 2 .
P If , 0 P 0 and k 6= k0 , then ( |0 ) = 0 (286Fb). While if k = k0 , then J = J0 , because and
P
0 have a common upper bound . So

X
kgk22 = (f | )( |0 )(0 |f )0
, 0 P 0
X X
(f | )( |0 )(0 |f ) C3 |(f | )|2
, 0 P 0 ,J =J0 P 0

(by 286J)
C3 2k 2

by the definition of energy. Q


Q
(h) For m N, set
P
gm = P 0 ,k m (f | ) .

Then whenever x, x0 R and |x x0 | 2m , |gm (x)| 12 C1 g (x0 ), where


1 Rb
g (x0 ) = supax0 b,a<b a
|g|
ba

P (i) Since k k for every P 0 , we may take it that m k . Let J be the dyadic interval
as in 286A. P

of length 2m including J , and y its midpoint. Set = Sy D2m /3 , that is, (y) = ( 13 2m (y y)) for
y R.
(ii) If P 0 and k m and (y) 6= 0, then y J . But J J J is not empty, so J J,

|y y| 12 2m , | 13 2m (y y)| 61 and (y) = 1.

(iii) If P 0 and k > m and (y) 6= 0, then Jr J Jr is non-empty, so J Jr and y y y;


now
1 1 3 1 1
y y = (y y ) + (y y) 2m + 2k 2m , | 2m (y y)|
2 20 5 3 5

and (y) = 0.
(iv) What this means is that if P 0 then

= if k m,
= 0 if k > m,

so that g m = g.

1
(v) By 283M, gm = g , where g is the convolution of g and the inverse Fourier transform
2
1
of . (Strictly speaking, 283M, with the help of 284C, tells us that gm and g have the same Fourier
2
transforms. By 283G, they are equal almost everywhere; by 255K, the convolution is defined everywhere
and is continuous; so in fact they are the same function.) Now

= 3 2m My D32m = 3 2m My D32m ,
that is,

(x) = 3 2m eixy (3 2m x)
for y R.

(vi) Set (x) = min(w(3), w(x)) for x R, so that is non-decreasing on ], 3], non-increasing on
[3, [, and constant on [3, 3], and |(x)| C1 (x) for every x, by the choice of C1 (286Ec). Take any x,
x0 R such that |x x0 | 2m . Then

Z Z
m
1 32
|gm (x)| |g(x t)||(t)|dt = |g(x t)||(3 2m t)|dt
2
2
Z Z
m
32 32m
C1 |g(x t)|(3 2m t)dt = C1 |g(x + t)|(3 2m t)dt
2
2
(because is an even function)
Z Z b
m
32 1
C1 (3 2m t)dt sup |g(x + t)|dt
2 a2m ,b2m ba a

(by 286B, because t 7 (3 2m t) is non-decreasing on ], 2m ], non-increasing on [2m , [ and constant


on [2m , 2m ])
Z Z b
1 1
= C1 (t)dt sup |g(t)|dt
2 ax2 m ,bx+2 m ba a
Z
1
C1 w(t)dt g (x0 )
2

(because if a x 2m and b x + 2m then a x0 b)


1
= C1 g (x0 )
2

as required. Q
Q

(i) For x R, set


P
v3 (x) = (,K)W3 (f | ) (x)(E h1 [Jr ] K)(x).
Then whenever L K and x, x0 L, |v3 (x)| C1 g (x0 ). P P We may suppose that v3 (x) 6= 0, so that, in
particular, x E. The only pairs (, K) contributing to the sum forming v3 (x) are those in which x K,
so that K = L, and h(x) Jr . Moreover, since we are looking only at such that r , so that Jr Jr ,
Jr will always be the dyadic interval of length 2k 1 including Jr . So these intervals are nested, and there
will be some m such that (for r ) h(x) Jr iff k m. Accordingly
P
v3 (x) = 0 (f | ) (x) = |glL 1 (x) gm1 (x)|
P ,mk <lL

(we must have m < lL because v3 (x) 6= 0). Now |x x0 | 2lL +1 2m+1 , so (h) tells us that both
|glL 1 (x)| and |gm1 (x)| are at most 21 C1 g (x0 ), and v3 (x) C1 g (x0 ), as claimed. Q
Q
C1 R
It follows that v3 (x) L
g for every x L.
L

(j) Now we are in a position to estimate

X Z X Z
|3 | = | K | v3 v3
(,K)W3 KK,lK >k GK

(because if v3 (x) 6= 0 there are (, K) W3 such that x K, and in this case x GK and lK > k k )
X Z
C1
GK g
K K
KK,lK >k
(by (i) above, because GK K)
X Z
2 0
C1 g
KK,lK >k
w( 23 ) K

(by (e))
Z
2C1 0
g
w( 23 ) I
(because if lK as noted in (a-ii))
> k then K I,
q
2C1 0
I kg k2
w( 23 )
(by Cauchy's inequality)
2C1 0
3 7 2k /2 8kgk2
w( 2 )
(by the Maximal Theorem, 286A)

4C1 0 14 k /2 k /2 p
2 2 C3
w( 32 )
(by (g))

4C1 14C3 0 k
= 2 .
w( 32 )

(k) Assembling these,

X Z X 3
X X 3
X

(f | ) = K = K |j |
P Eh1 [Jr ] P,KK j=0 (,K)Wj j=0

4 2k C1 0 + 2 2k C1 0 + 28 2k C1 0 /w( 23 )
p
+ 4 14C3 2k C1 0 /w( 23 )
= 2k C7 0 ,
as claimed.

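286M The Lacey-Thiele lemma Set $C_8 = 3C_7(C_5 + C_6)$. Then

$$\sum_{\sigma\in Q} \Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]} \phi_\sigma\Bigr| \le C_8$$

whenever $f \in \mathcal{L}^2_{\mathbb{C}}$, $\|f\|_2 = 1$, $\mu E \le 1$ and $h : \mathbb{R} \to \mathbb{R}$ is measurable.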


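proof (a) The first step is to combine 286I and 286K, as follows: if $P \subseteq Q$ is finite and $\max(\sqrt{\operatorname{mass}_{Eh}(P)}, \operatorname{energy}_f(P)) \le \gamma$, there are $P' \subseteq P$ and $R \subseteq Q$ such that $\max(\sqrt{\operatorname{mass}_{Eh}(P')}, \operatorname{energy}_f(P')) \le \frac12\gamma$, $\gamma^2 \sum_{\tau\in R} 2^{k_\tau} \le C_5 + C_6$, and $P \setminus P' \preccurlyeq R$. $\mathbf{P}$ Since $\operatorname{mass}_{Eh}(P) \le \gamma^2$, 286I tells us that there are $P_0 \subseteq P$, $R_0 \subseteq Q$ such that $\operatorname{mass}_{Eh}(P_0) \le \frac14\gamma^2$, $\gamma^2 \sum_{\tau\in R_0} 2^{k_\tau} \le C_5$, and $P \setminus P_0 \preccurlyeq R_0$. Now turn to 286K: since $\operatorname{energy}_f(P_0) \le \operatorname{energy}_f(P) \le \gamma$, we can find $P' \subseteq P_0$ and $R_1 \subseteq Q$ such that $\operatorname{energy}_f(P') \le \frac12\gamma$, $\gamma^2 \sum_{\tau\in R_1} 2^{k_\tau} \le C_6$, and $P_0 \setminus P' \preccurlyeq R_1$. Now $\operatorname{mass}_{Eh}(P') \le \operatorname{mass}_{Eh}(P_0) \le \frac14\gamma^2$, so $\max(\sqrt{\operatorname{mass}_{Eh}(P')}, \operatorname{energy}_f(P')) \le \frac12\gamma$; and setting $R = R_0 \cup R_1$, $\gamma^2 \sum_{\tau\in R} 2^{k_\tau} \le C_5 + C_6$, while $P \setminus P' \preccurlyeq R$. $\mathbf{Q}$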
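(b) Now take any finite $P \subseteq Q$. Let $k \in \mathbb{N}$ be such that $\max(\sqrt{\operatorname{mass}_{Eh}(P)}, \operatorname{energy}_f(P)) \le 2^k$. By (a), we can choose $\langle P_n\rangle_{n\in\mathbb{N}}$, $\langle R_n\rangle_{n\in\mathbb{N}}$ inductively such that $P_0 = P$ and, for each $n \in \mathbb{N}$,

$$P_{n+1} \subseteq P_n, \quad P_n \setminus P_{n+1} \preccurlyeq R_n,$$

$$\max\bigl(\sqrt{\operatorname{mass}_{Eh}(P_n)}, \operatorname{energy}_f(P_n)\bigr) \le 2^{k-n}, \quad 2^{2k-2n} \sum_{\tau\in R_n} 2^{k_\tau} \le C_5 + C_6.$$

Since $\operatorname{energy}_f(\{\sigma\}) = 2^{-k_\sigma/2}|(f|\phi_\sigma)| > 0$ whenever $(f|\phi_\sigma) \ne 0$ (286H), $(f|\phi_\sigma) = 0$ whenever $\sigma \in \bigcap_{n\in\mathbb{N}} P_n$, and

$$\begin{aligned}
\sum_{\sigma\in P} \Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]} \phi_\sigma\Bigr|
&= \sum_{\sigma\in\bigcup_{n\in\mathbb{N}} P_n\setminus P_{n+1}} \Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]} \phi_\sigma\Bigr| \\
&= \sum_{n=0}^{\infty} \sum_{\sigma\in P_n\setminus P_{n+1}} \Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]} \phi_\sigma\Bigr| \\
&\le \sum_{n=0}^{\infty} \sum_{\tau\in R_n} \sum_{\sigma\in P_n,\,\sigma\le\tau} \Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]} \phi_\sigma\Bigr| \\
&\le \sum_{n=0}^{\infty} \sum_{\tau\in R_n} 2^{k_\tau} C_7 \operatorname{energy}_f(P_n) \operatorname{mass}_{Eh}(P_n)
\end{aligned}$$

(by 286L)

$$\le \sum_{n=0}^{\infty} C_7(C_5 + C_6)\, 2^{2n-2k}\, 2^{k-n} \min(1, 2^{2k-2n})$$

(because $\operatorname{mass}_{Eh}(P_n) \le 1$ for every $n$, as noted in 286H)

$$= C_7(C_5 + C_6) \sum_{n=0}^{\infty} \min(2^{n-k}, 2^{k-n}) \le C_7(C_5 + C_6) \sum_{n=-\infty}^{\infty} \min(2^n, 2^{-n}) = 3C_7(C_5 + C_6).$$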

(c) Since this is true for every finite $P \subseteq Q$,

$$\sum_{\sigma\in Q} \Bigl|(f|\phi_\sigma)\int_{E\cap h^{-1}[J^r_\sigma]} \phi_\sigma\Bigr| \le C_8,$$

as claimed.
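The numerical constant 3 in $C_8 = 3C_7(C_5 + C_6)$ comes from the two-sided series $\sum_{n\in\mathbb{Z}} \min(2^n, 2^{-n}) = 1 + 1 + 1$. The Python sketch below (illustrative only, with hypothetical function names) evaluates truncations of that series exactly, and checks that the one-sided shifted sums $\sum_{n\ge0}\min(2^{n-k}, 2^{k-n})$ arising in part (b) never exceed it.

```python
from fractions import Fraction


def two_sided_sum(N):
    """Exact value of sum_{n=-N}^{N} min(2^n, 2^-n)."""
    two = Fraction(2)
    return sum(min(two ** n, two ** (-n)) for n in range(-N, N + 1))


def shifted_one_sided_sum(k, N):
    """Exact value of sum_{n=0}^{N} min(2^(n-k), 2^(k-n)) for an integer k >= 0."""
    two = Fraction(2)
    return sum(min(two ** (n - k), two ** (k - n)) for n in range(0, N + 1))


if __name__ == "__main__":
    print(float(two_sided_sum(30)))
    print([float(shifted_one_sided_sum(k, 60)) for k in range(5)])
```

The truncated two-sided sum equals $3 - 2^{1-N}$, so it increases to 3 as $N \to \infty$.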

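286N Lemma Set $C_9 = C_8\sqrt2$. Suppose that $f \in \mathcal{L}^2_{\mathbb{C}}$, $h : \mathbb{R} \to \mathbb{R}$ is measurable and $\mu F < \infty$. Then

$$\sum_{\sigma\in Q} \Bigl|(f|\phi_\sigma)\int_{F\cap h^{-1}[J^r_\sigma]} \phi_\sigma\Bigr| \le C_9 \|f\|_2 \sqrt{\mu F}.$$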

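proof This is trivial if $\|f\|_2 = 0$, that is, $f =_{\text{a.e.}} 0$. So we may take it that $\|f\|_2 > 0$. Dividing both sides by $\|f\|_2$, we may suppose that $\|f\|_2 = 1$.

Let $k \in \mathbb{Z}$ be such that $2^{k-1} < \mu F \le 2^k$. We have a bijection $\sigma \mapsto \tilde\sigma : Q \to Q$ defined by saying that $\tilde\sigma = (2^{-k} I_\sigma, 2^k J_\sigma)$; so that $k_\sigma = k_{\tilde\sigma} + k$, $x_{\tilde\sigma} = 2^{-k} x_\sigma$, $y^l_{\tilde\sigma} = 2^k y^l_\sigma$, $J^r_{\tilde\sigma} = 2^k J^r_\sigma$, and for every $x \in \mathbb{R}$

$$\begin{aligned}
\phi_\sigma(2^k x) &= 2^{-k_\sigma/2} e^{2^k i y^l_\sigma x} \phi\bigl(2^{-k_\sigma}(2^k x - x_\sigma)\bigr) \\
&= 2^{-k_\sigma/2} e^{i y^l_{\tilde\sigma} x} \phi\bigl(2^{-k_\sigma+k}(x - 2^{-k} x_\sigma)\bigr) \\
&= 2^{-k_\sigma/2} e^{i y^l_{\tilde\sigma} x} \phi\bigl(2^{-k_{\tilde\sigma}}(x - x_{\tilde\sigma})\bigr) = 2^{-k/2} \phi_{\tilde\sigma}(x).
\end{aligned}$$

Write $\tilde F = 2^{-k} F$, so that $\mu\tilde F \le 1$, and $\tilde h(x) = 2^k h(2^k x)$ for every $x$. Then, for $\sigma \in Q$,

$$\begin{aligned}
F \cap h^{-1}[J^r_\sigma] &= \{x : x \in F,\ h(x) \in J^r_\sigma\} = \{x : 2^{-k}x \in \tilde F,\ 2^k h(x) \in J^r_{\tilde\sigma}\} \\
&= \{x : 2^{-k}x \in \tilde F,\ \tilde h(2^{-k}x) \in J^r_{\tilde\sigma}\} = 2^k \{x : x \in \tilde F,\ \tilde h(x) \in J^r_{\tilde\sigma}\}.
\end{aligned}$$

Write $\tilde f(x) = 2^{k/2} f(2^k x)$, so that

$$\|\tilde f\|_2 = 2^{k/2} \|D_{2^k} f\|_2 = \|f\|_2 = 1,$$

while

$$(f|\phi_\sigma) = \int f(x)\overline{\phi_\sigma(x)}\,dx = 2^k \int f(2^k x)\overline{\phi_\sigma(2^k x)}\,dx = (\tilde f|\phi_{\tilde\sigma})$$

for every $\sigma \in Q$. Putting all these together,

$$\begin{aligned}
\sum_{\sigma\in Q} \Bigl|(f|\phi_\sigma)\int_{F\cap h^{-1}[J^r_\sigma]} \phi_\sigma\Bigr|
&= \sum_{\sigma\in Q} 2^k \Bigl|(f|\phi_\sigma)\int_{2^{-k}(F\cap h^{-1}[J^r_\sigma])} \phi_\sigma(2^k x)\,dx\Bigr| \\
&= 2^{k/2} \sum_{\sigma\in Q} \Bigl|(\tilde f|\phi_{\tilde\sigma})\int_{\tilde F\cap \tilde h^{-1}[J^r_{\tilde\sigma}]} \phi_{\tilde\sigma}\Bigr| \\
&= 2^{k/2} \sum_{\sigma\in Q} \Bigl|(\tilde f|\phi_\sigma)\int_{\tilde F\cap \tilde h^{-1}[J^r_\sigma]} \phi_\sigma\Bigr| \\
&\le 2^{k/2} C_8
\end{aligned}$$

(by the Lacey-Thiele lemma, applied to $\tilde h$, $\tilde F$ and $\tilde f$)

$$\le C_9\sqrt{\mu F} = C_9 \|f\|_2 \sqrt{\mu F}.$$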
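The factor $\sqrt2$ in $C_9 = C_8\sqrt2$ is the price of rounding $\mu F$ up to a power of 2: if $2^{k-1} < \mu F \le 2^k$ then $2^{k/2} < \sqrt{2\mu F}$. The Python sketch below illustrates this choice of $k$ in floating point. The function names are hypothetical, and the two `while` loops are only a guard against rounding in `math.log2`.

```python
import math


def dyadic_exponent(mu_F):
    """Return the integer k with 2^(k-1) < mu_F <= 2^k, for 0 < mu_F < infinity."""
    if not (0 < mu_F < math.inf):
        raise ValueError("mu_F must be strictly positive and finite")
    k = math.ceil(math.log2(mu_F))
    # Guard against floating-point rounding in log2.
    while 2.0 ** k < mu_F:
        k += 1
    while 2.0 ** (k - 1) >= mu_F:
        k -= 1
    return k


def scaling_factor_bound(mu_F):
    """Return (2^(k/2), sqrt(2 * mu_F)); the first should not exceed the second."""
    k = dyadic_exponent(mu_F)
    return 2.0 ** (k / 2), math.sqrt(2.0 * mu_F)


if __name__ == "__main__":
    for v in (1e-6, 0.3, 1.0, 5.0, 8.0, 1000.0):
        print(v, dyadic_exponent(v), scaling_factor_bound(v))
```

The bound is nearly attained when $\mu F$ is just above a power of 2, so the constant $\sqrt2$ cannot be improved by this rescaling argument alone.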

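286O Lemma Suppose that $f \in \mathcal{L}^2_{\mathbb{C}}$. For $x \in \mathbb{R}$, set

$$(Af)(x) = \sup_{z\in\mathbb{R}} \sum_{\sigma\in Q,\, z\in J^r_\sigma} |(f|\phi_\sigma)\phi_\sigma(x)|.$$

Then $Af : \mathbb{R} \to [0,\infty]$ is Borel measurable, and $\int_F Af \le C_9 \|f\|_2 \sqrt{\mu F}$ whenever $\mu F < \infty$.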
P
proof (a) For z R and finite P Q, set gP z = P,zJr |(f | ) |, so that
Af (x) = sup{gP z (x) : P Q is finite, z R}.
Because every gP z is continuous, Af is Borel measurable and
Z Z
Af (x)dx = sup{ max(gP0 z0 (x), . . . , gPn zn (x))dx :
F F
P0 , . . . , Pn Q are finite, z0 , . . . , zn R}
for every measurable set F (256M).
(b) Given finite sets P0 , . . . , Pn Q and z0 , . . . , zn , x R, set
g(x) = maxjn gPj zj (x), l(x) = min{j : j n, g(x) = gPj zj (x)}, h(x) = zl(x) ;
because every gPj zj is continuous, l : R {0, . . . , n} and h : R R are measurable. Now
Z Z Z X
g(x)dx = gPl(x) ,h(x) (x)dx = |(f | ) (x)|dx
F F F P r
l(x) ,h(x)J
Z X
|(f | ) (x)|dx
F Q,h(x)J r

XZ p
= |(f | ) | C9 kf k2 F
Q F h1 [Jr ]
R
by 286N. Since P0 , . . . , Pn and z0 , . . . , zn are arbitrary, F
Af C9 kf k2 F .

286P Lemma For any z R, define z : R [0, 1] by setting



z (y) = (2k (y y))2

whenever there is a dyadic interval J I of length 2k such that z belongs to the right-hand half of J and
y belongs to the left-hand half of J and y is the lower quartile of J, and zero if there is no such J. Then

0 z (y) 1 for every y R, z (y) = 0 if y z, and 2|(f z ) | Af for any rapidly decreasing test
function f .

proof (a) I had better start by explaining why the recipe above defines a function z . Let M be the set of
those k Z such that z belongs to the right-hand half of the dyadic interval Jk of length 2k containing z.

For k M , let yk be the midpoint of the left-hand half Jkl of Jk , and set k (y) = (2k (y yk ))2 for y R;
then k is smooth and zero outside Jkl . But now observe that if k, k 0 are distinct members of M , then Jkl
P
and Jkl 0 are disjoint, as observed in 286E(b-iii). So z is just the sum kM k . Because takes values in
/ Jl for any k M , so z (y) = 0.
[0, 1], so does z . If y z, then of course y k

(b) Fix a rapidly decreasing


S test function
P Pf . For k M , set Rk = {(I, Jk ) : I I, I = 2k }, so that
r
{ : Q, z J } = kM Rk , and kM Rk |(f | ) (x)| (Af )(x).
P
Now, for k M , 2(f k ) = Rk (f | ) . P P If Rk , yl = yk and x is of the form 2k (n + 21 )
for some n Z. So

Z
(f | ) = f (t) (t)dt

(284O)
Z k
i(n+ 12 )(tyk )
= f (t) 2k/2 e2 (2k (t yk ))dt


(by the formula in 286Eb, because is real-valued)
Z
1
= 2k/2 f (2k t + yk )ei(n+ 2 )t (t)dt

Z
1
k/2
=2 f (2k t + yk )ei(n+ 2 )t (t)dt


(because (t) = 0 if |t| 15 )
Z
= 2k/2 g(t)eint dt,

1 R
where g(t) = f (2k t + yk )eit/2 (t) for < t . So if we set cn = g(t)eint dt, as in 282A, we have
2

(f | ) = 2k/2 2cn
P
when Rk and x = 2k (n + 21 ). Note that as g is smooth and zero outside [ 15 , 51 ], n= |cn | <
(282R).
Now, for any y Jkl ,

X
X
k
i(n+ 12 )(yyk )
(f | ) (y) = 2k/2 2cn 2k/2 e2 (2k (y yk ))
Rk n=

X
k1 k
= 2 (2k (y yk ))e2 i(yyk )
cn e2 in(yyk )

n=
X
k1 k
= 2 (2k (y yk ))e2 i(yyk )
cn ein2 (yyk )

n=
k1
= 2 (2k (y yk ))e2 i(yyk )
g(2k (y yk ))
1
(by 282L, because 2k |y yk | 4 < )
k1 k1
= 2e2 i(yyk )
(2k (y yk ))f (y)e2 i(yyk )
(2k (y yk ))

= 2 f (y)k (y).

P
On the other hand, if y R \ Jkl , k (y) = (y) = 0 for every Rk , so again Rk (f | ) (y) =

2 f (y)k (y).
Next,
P P
Rk |(f | )| = 2 2k/2 n= |cn |
and
R R
supRk
| (y)|dy = 2k/2
|(y)|dy
P R
are finite. So Rk |(f | ) | is finite. Accordingly, for any x R,

X
2(f k ) (x) = (f | ) (x)
Rk
Z X
1
= (f | ) (y)eixy dy
2 R
k

1 X Z
= (f | ) (y)eixy dy
2
Rk
(226E)
1 X 1 X
= (f | )( ) (x) = (f | ) (x)
2 2
Rk Rk

by 284C. Q
Q
P
(c) Because every k is non-negative, z = kM k is bounded above by 1, and f is integrable,

Z
1
2(f z ) (x) = 2 eixy f (y)z (y)dy
2
Z X ixy
= 2 e f (y)k (y)dy
kM
XZ
= 2 eixy f (y)k (y)dy
kM

(by 226E again)



X X X
= 2 (f k ) (x) = (f | ) (x),
kM kM Rk

and
P P
2|(f z ) (x)| kM Rk |(f | ) (x)| (Af )(x).

0
286Q Lemma For > 0 and y, z, R, set z (y) = z+ (y + ). Then
0 3
(a) the function (, , y, z) 7 z (y) : ]0, [ R [0, 1] is Borel measurable;
(b) for any rapidly decreasing test function f ,

0
2|(f z ) | D1/ A(M D f )
(in the notation of 286C) at every point.
proof (a) We need only observe that (y, z) 7 z (y) : R 2 R is Borel measurable, and that (, , y, z) 7
0
z (y) is built up from this, + and .
0
(b) Set v = z + , so that z = D S v . Then

0
f z = f D S v = D S (S D1/ f v )
= D S (S (D f ) v ) = D S ((M D f ) v ),
so

0

(f z ) = D S ((M D f ) v )

= D1/ S ((M D f ) v )

= D1/ M (M D f ) v
and

0
2|(f z ) | = 2D1/ (M D f ) v D1/ A(M D f )
by 286P.

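286R Lemma For any $y$, $z \in \mathbb{R}$,

$$\tilde\theta_z(y) = \int_1^2 \frac{1}{\alpha} \lim_{n\to\infty} \frac{1}{n} \int_0^n \theta'_{z\alpha\beta}(y)\,d\beta\,d\alpha$$

is defined, and

$$\tilde\theta_z(y) = \tilde\theta_1(0) > 0 \text{ if } y < z, \qquad \tilde\theta_z(y) = 0 \text{ if } y \ge z.$$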

proof (a) The case y z is trivial; because if y z then y + z + for all > 0 and R, so that
0
z (y) = 0 for every > 0, R and z (y) = 0. For the rest of the proof, therefore, I look at the case
y < z.
0 0
(b)(i) Given y < z R and > 0, set l = blog2 (20(z y))c. Then z,,+2 l (y) = z (y) for every
0
R. P
P If z (y) = z+ (y + ) is non-zero, there must be k, m Z such that
1
2k (m + ) z + < 2k (m + 1)
2
and
1
(2k (y + ) (m + ))2 = z
0
(y) 6= 0,
4
so

9
2k m y + 2k (m + )
20
1
because is zero outside [ 51 , 51 ]. In this case, 2k < (z y), so that k l. We therefore have
20
1
2k (m + 2lk + ) z + + 2l < 2k (m + 2lk + 1),
2

1
2k (m + 2lk ) y + + 2l < 2k (m + 2lk + ),
2
so
0

k 1
z,,+2 l (y) = (2 (y + + 2l ) (m + 2lk + ))2 = z
0
(y).
4

Similarly,
1
2k (m 2lk + ) z + 2l < 2k (m 2lk + 1),
2

1
2k (m 2lk ) y + 2l < 2k (m 2lk + ),
2
so
0

k 1
z,,2 l (y) = (2 (y + 2l ) (m 2lk + ))2 = z
0
(y).
4
0 0
What this shows is that z,,+2l (y) = z (y) if either is non-zero, so we have the equality in any case. Q
Q

1 Rb 0
(ii) It follows that g(, y, z) = limb (y)d is defined. P
P Set
b 0 z
R 2l
= 2l 0
0
z (y)d.
From (i) we see that
R 2l (m+1)
= 2l 2l m
0
z (y)d
for every m Z, and therefore that
1 R 2l m 0
= 0
z (y)d
2m l

for every m 1. Now z (y) is always greater than or equal to 0, so if 2l m b 2l (m + 1) then

Z 2l m Z b
m 1 0 1 0
= z (y)dy z (y)d
m+1 2l (m+1) 0
b 0
Z 2l (m+1)
1 0 m+1
z (y)dy = ,
2l m 0
m

which approach as b . Q
Q
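The squeeze used in (b-ii) is a general fact about a non-negative function of $\beta$ with a period $p$: if $pm \le b \le p(m+1)$ with $m \ge 1$, the running average $\frac1b\int_0^b$ lies between $\frac{m}{m+1}$ and $\frac{m+1}{m}$ times the mean over one period. The Python sketch below illustrates this with the stand-in function $\sin^2(\pi\beta/p)$, which is my choice of example and not the function of the lemma; its integral is available in closed form and its mean over a period is $\frac12$. The function names are hypothetical.

```python
import math


def running_average(b, period):
    """(1/b) * integral_0^b sin^2(pi*beta/period) d(beta), via the closed form."""
    integral = b / 2 - (period / (4 * math.pi)) * math.sin(2 * math.pi * b / period)
    return integral / b


def squeeze_bounds(b, period, mean=0.5):
    """Bounds m/(m+1)*mean and (m+1)/m*mean, where period*m <= b <= period*(m+1)."""
    m = math.floor(b / period)
    if m < 1:
        raise ValueError("need b >= period so that m >= 1")
    return m / (m + 1) * mean, (m + 1) / m * mean


if __name__ == "__main__":
    for b in (0.3, 1.7, 50.01, 1e4):
        print(b, squeeze_bounds(b, 0.25), running_average(b, 0.25))
```

Both bounds tend to the period mean as $b \to \infty$, which is why the limit defining $g(\alpha, y, z)$ exists.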

0 1 Rn 0
(c) Because (, ) 7 z (y) is Borel measurable, each of the functions 7 (y)dy, for n 1,
n 0 z
is Borel measurable (putting 251L and 252P together), and 7 g(, y, z) : ]0, [ R is Borel measurable;
0
at the same time, since 0 z (y) 1 for all and , 0 g(, y, z) 1 for every , and z (y) =
R2 1
1
g(, y, z)d is defined in [0, 1].

(d) For any y < z, R and > 0, g(, y + , z + ) = g(, y, z). P


P It is enough to consider the case
0. In this case

Z b
1 0
g(, y + , z + ) = lim z+,, (y + )d
b b 0
Z b
1
= lim z++ (y + + )d
b b 0
Z b+
1
= lim z+ (y + )d
b b
Z b+
1 0
= lim z (y)d,
b b
so
Z b+ Z
1
|g(, y + , z + ) g(, y, z)| = lim 0
z (y)d 0
z (y)d
b b b 0
2
lim = 0. Q
Q
b b

It follows that whenever y < z and R,


R1 1 R1 1
z+ (y + ) = 0
g(, y + , z + )d = 0
g(, y, z)d = z (y).

(e) The next essential fact to note is that 2z (2y) is always equal to z (y). P
P If z (y) 6= 0, then (as in
(b) above) there are k, m Z such that
1 1 1
2k (m + ) z < 2k (m + 1), 2k m y 2k (m + ), z (y) = (2k y (m + ))2 .
2 2 4
In this case,
1 1
2k+1 (m + ) 2z < 2k+1 (m + 1), 2k+1 m 2y 2k+1 (m + ),
2 2
so
1
2z (2y) = (2k1 2y (m + ))2 = z (y).
4
Similarly,
1 1 1 1
2k1 (m + ) z < 2k1 (m + 1), 2k1 m y 2k1 (m + ),
2 2 2 2
so
1 1 1
21 z ( y) = (2k+1 y (m + ))2 = z (y).
2 2 4
This shows that 2z (2y) = z (y) if either is non-zero, and therefore in all cases. Q
Q
Accordingly
0 0
z,2,2 (y) = 2z+2 (2y + 2) = z+ (y + ) = z (y)
for all y, z, R and all > 0.
(f ) Consequently
Z b Z b/2
1 0 2 0
g(2, y, z) = lim z,2, (y)d = lim z,2,2 (y)d
b b 0 b b 0
Z b/2 Z b
2 0 1 0
= lim z (y)d = lim z (y)d = g(, y, z)
b b 0 b b 0

whenever > 0 and y, z R. It follows that


R 1 R 1 R 2 1

g(, y, z)d =
g(2, y, z)d = 2
g(, y, z)d

whenever 0 < , and therefore that


R 2 1 R2 1

g(, y, z)d = 1
g(, y, z)d

P Take k Z such that 2k < 2k+1 . Then


for every > 0. P
Z 2 Z 2k+1 Z Z 2
1 1 1 1
g(, y, z)d = g(, y, z)d g(, y, z)d + g(, y, z)d

2k
2 k 2 k+1
Z 2k+1 Z 2
1 1
= g(, y, z)d = g(, y, z)d. Q
Q
2k
1

(g) Now if , > 0 and y < z,


1 Rb
g(, y, z) = limb z+ (y + )d = g(, y, z).
b 0
So if > 0 and y < z,
Z 2 Z 2
1 1
z (y) = g(, y, z)d = g(, y, z)d
1
1

Z 2 Z 2
1 1
= g(, y, z)d = g(, y, z)d = z (y).

1

Putting this together with (d), we see that if y < z then


z (y) = zy (0) = 1 (0).

(h) I have still to check that 1 (0) is not zero. But suppose that 1 < 67 and that there is some m Z
1 5
such that 2(m + 12 ) 2(m + 12 ). Then 2(m + 12 ) + < 2(m + 1), while | 12 (m + 14 )| 61 , so
1 1
+ () = ( (m + ))2 = 1.
2 4

What this means is that, for 1 < 76 ,


Z 2m
1
g(, 1, 0) = lim + ()d
m 2m 0
m1
X
1 1 5 1
lim [2(j + ), 2(j + )] = .
m 2m 12 12 3
j=0

So
R2 1 1 R 7/6 1
1 (0) = 1
g(, 1, 0)d 1
d > 0.
3
This completes the proof.

286S Lemma Suppose that f L2C .


(a) For every x R,
1 R2 1 Rn
(Af )(x) = lim inf n 1 0
(D1/ AM D f )(x)dd
n

is defined
R in [0, ], andAf : R [0, ] is Borel measurable.
(b) F Af C9 kf k2 F whenever F < .

(c) If f is a rapidly decreasing test function and z R, 2|(f z ) | Af at every point.
proof (a) The point here is that the function
(, , x) 7 (D1/ AM D f )(x) : ]0, [ R 2 [0, ]

is Borel measurable. P
P

x
(D1/ AM D f )(x) = (AM D f )( )

X x
= sup |(M D f | ) ( )|.
zR,P Q is finite P,zJ r

Look at the central term in this formula. For any Q, we have


Z
(M D f | ) = eit f (t) (t)dt

Z
1
= eit/ f (t) (t/)dt.

2
Now is a rapidly decreasing test function, so there is some 0 such
that | (t)| /(1 + t ) for every
1
t R. This means that if > 0 and hn inN is a sequence in 2 , and we set g(t) = supnN | (t/n )|
for t R, then g(t) 4/(4 + t2 ) for every t and g is integrable. So Lebesgue's Dominated Convergence
Theorem tells us that if hn inN and hn inN ,
1 R 1 R

ein t/n f (t) (t/n )dt
eit/ f (t) (t/)dt.
n

Thus
(, ) 7 (M D f | ) : ]0, [ R R
is continuous; and this is true for every Q.
Accordingly
P x
(, , x) 7 P,zJr |(M D f | ) ( )|

is continuous for every z R and every finite P Q, and (, , x) 7 (D1/ AM D f )(x) is Borel measurable
by 256Ma. Q Q
It follows that the repeated integrals
R2 1 Rn
1 0
(D1/ AM D f )(x)dd

are defined in [0, ] and are Borel measurable functions of x (252P again), so that Af is Borel measurable.

(b) For any n N,

Z Z 2 Z n
1 1
(D1/ AM D f )(x)dddx
F
n 1
0
Z 2 Z nZ
1 1
= (D1/ AM D f )(x)dxdd
n 1
0 F
(by Fubini's theorem, 252H)
Z 2Z nZ
1 1 x
= (AM D f )( )dxdd
n 1 0 F

Z 2Z n Z
1
= (AM D f )(x)dxdd
n 1 0 1 F
Z 2Z n p
1
C9 kM D f k2 (1 F )dd
n 1 0
(286O)

Z 2Z
1 p
n
1 1
= C9 kf k2 F dd
n 1 0

p Z 2 Z n
1 1
= C9 kf k2 F dd
n 1
0
p p
= C9 kf k2 F ln 2 C9 kf k2 F .

So

Z Z Z 2 Z n
1 1
Af = lim inf (D1/ AM D f )(x)dddx
F F n n 1
0
Z Z 2 Z n
1 1
lim inf (D1/ AM D f )(x)dddx
n F
n 1
0
(by Fatou's lemma)
p
C9 kf k2 F .

(c) For any x R,


R R2 1 1 Rn 0
R


|f (y)| 1
supnN 0
z (y)d ddy ln 2
|f (y)|dy
n
is finite. So

Z
1
(f z ) (x) = eixy f (y)z (y)dy
2
Z Z 2 Z n
1 1 1
= eixy f (y) lim 0
z (y)dddy
2 1
n n 0
Z Z 2 Z n
1 1
= lim eixy f (y) 0
z (y)dddy
2 n 1
n 0
(by Lebesgue's Dominated Convergence Theorem)
Z 2 Z n Z
1 1
= lim 0
eixy f (y)z (y)dydd
2 n 1 n 0
(by Fubini's theorem)
Z 2 Z n
1
0
= lim (f z ) (x)dd,
n 1 n 0

and

Z 2 Z n
1
2|(f z ) (x)| = 2 lim 0
(f z ) (x)dd
n 1 n 0
Z 2 Z n
1
0
2 lim inf |(f z ) (x)|dd
n 1
n 0
Z 2 Z n
1
lim inf (D1/ AM D f )(x)dd
n 1
n 0
(286Qb)
= (Af )(x).

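286T Lemma Set $C_{10} = C_9/(\pi\tilde\theta_1(0))$. For $f \in \mathcal{L}^2_{\mathbb{C}}$, define $\hat A f : \mathbb{R} \to [0,\infty]$ by setting

$$(\hat A f)(y) = \sup_{a\le b} \frac{1}{\sqrt{2\pi}} \Bigl|\int_a^b e^{-ixy} f(x)\,dx\Bigr|$$

for each $y \in \mathbb{R}$. Then $\int_F \hat A f \le C_{10} \|f\|_2 \sqrt{\mu F}$ whenever $\mu F < \infty$.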

1 R b ixy
proof (a) As usual, the first step is to confirm that Af is measurable. P
P For a b, y 7 | e f (x)dx|
2 a
is continuous (by 283Cf, since f [a, b] is integrable), so 256M gives the result. Q
Q

(b) Suppose that f is a rapidly decreasing test function. Then


1
(Af )(y) (Af )(y)
1 (0)

for every y R. P
P If a R then

Z a Z
1 1
| eixy f (x)dx| = | eixy a (x)f (x)dx|
2 1 (0) 2

(286R)
1 1
= |(f a ) (y)| = |(f a ) (y)|
1 (0) 1 (0)
(284C)
1
(Af )(y)
2 1 (0)

(286Sc). So if a b in R,
1 R b ixy 1
| e f (x)dx| (Af )(y);
2 a 1 (0)

taking the supremum over a and b, we have the result. Q


Q
It follows that

Z Z p
1 1
Af Af C9 kf k2 (F )
F
1 (0) F
1 (0)

(286Sb, 284Oa)
p
= C10 kf k2 F .

(c) For general square-integrable f , take any > 0 and any n N. Set
1 Rb
(An f )(y) = supnabn | a eixy f (x)dx|
2

for each y R. Let g be a rapidly decreasing test function such that kf gk2 (284N). Then

2n
Ag An g An f
2

(using Cauchy's inequality), so


r r
R R n n
F
An f F
Ag + F C10 (kf k2 + ) F + F .

R R
As is arbitrary, F
An f C10 kf k2 F ; letting n , we get F Af C10 kf k2 F .

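286U Theorem If $f \in \mathcal{L}^2_{\mathbb{C}}$ then

$$g(y) = \lim_{a\to-\infty,\, b\to\infty} \frac{1}{\sqrt{2\pi}} \int_a^b e^{-ixy} f(x)\,dx$$

is defined in $\mathbb{C}$ for almost every $y \in \mathbb{R}$, and $g$ represents the Fourier transform of $f$.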
proof (a) For n N, y R set
1 R b ixy Rn
n (y) = supan,bn a
e f (x)dx n eixy f (x)dx.
2
Then g(y) is defined whenever inf nN n (y) = 0. P P If inf nN n (y) = 0 and > 0, take m N such that
1 1 R b ixy R n ixy
m (y) 2 ; then | e f (x)dx n e f (x)dx| whenever n m and a n, b n. But
2 a
R n ixy
this means, first, that h n e f (x)dxinN is a Cauchy sequence, so has a limit say, and, second, that
R b ixy
= lima,b a e f (x)dx, so that g(y) = is defined. Q Q
2
Also each n is a measurable function (cf. part (a) of the proof of 286T).
(b) ?? Suppose, if possible, that {y : inf nN n (y) > 0} is not negligible. Then
1
limm {y : |y| m, inf nN n (y) } > 0,
m
so there is an > 0 such that
1
F = {y : |y| , inf nN n (y) }

has measure greater than . Let n N be such that
2
R Rn
4C10 ( |f (x)|2 dx n |f (x)|2 dx) < 3 ,
and set f1 = f f [n, n]; then 2C10 kf1 k2 3/2 .
We have
Z b Z n
1
n (y) = sup eixy f1 (x)dx eixy f1 (x)dx
an,bn 2 a n
Z b
1
2 sup | eixy f1 (x)dx| 2(Af1 )(y),
ab 2 a

so that
Z Z p
F n 2 Af1 2C10 kf1 k2 F
F F
(286T)
p
3/2 F
and F ; but we chose so that F would be greater than . X
X
(c) Thus g(y) is defined for almost every y R. Now g represents the Fourier transform of f . P P Let h be
a rapidly decreasing test
R function. Then the restriction of Af to the set on which it is finite is a tempered
function, by 286D, so (Af )(y)|h(y)|dy is finite, by 284F. Now
Z Z Z n
1
gh= lim eixy f (x)dx h(y)dy

2 n n
Z Z n
1
= lim eixy f (x)h(y)dxdy
2 n n
(because $\bigl|\frac{1}{\sqrt{2\pi}} \int_{-n}^{n} e^{-ixy} f(x)\,dx\bigr| \le \hat A f(y)$ for every $n$ and $y$, so we can use Lebesgue's Dominated Convergence Theorem)
Z n Z
1
= lim eixy f (x)h(y)dydx
2 n n
R Rn
(because n
|f (x)h(y)|dxdy is finite for each n)
Z n Z

= lim f (x)h(x)dx = f (x)h(x)dx
n n


because f h is certainly integrable. As h is arbitrary, g represents the Fourier transform of f . Q
Q

286V Theorem If $f \in \mathcal{L}^2_{\mathbb{C}}(\mu_{]-\pi,\pi]})$ then its sequence of Fourier sums converges to it almost everywhere.

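proof Set $f_1(x) = f(x)$ for $x \in \operatorname{dom} f$, $0$ for $x \in \mathbb{R} \setminus {]-\pi,\pi]}$; then $f_1 \in \mathcal{L}^2_{\mathbb{C}}(\mu)$. Let $g \in \mathcal{L}^2_{\mathbb{C}}(\mu)$ represent the inverse Fourier transform of $f_1$ (284O). Then 286U tells us that $f_2(x) = \lim_{a\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-ixy} g(y)\,dy$ is defined for almost every $x$, and that $f_2$ represents the Fourier transform of $g$, so is equal almost everywhere to $f_1$ (284Ib).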
Now, for any a 0, x R,

Z a
eixy g(y)dy = (g|hax )
a
ixy
(where hax (y) = e if |y| a, 0 otherwise)

= (f2 |hax )
(284Ob)
Z Z
1
= f2 (t) eity hax (y)dy dt
2
Z Z
2 sin(xt)a 2 sin(xt)a
= f2 (t)dt = f (t)dt.
2
xt 2
xt

So
1 R sin(xt)a
f (x) = f2 (x) = lima
f (t)dt
xt

for almost every x ], ].


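On the other hand, writing $\langle s_n\rangle_{n\in\mathbb{N}}$ for the sequence of Fourier sums of $f$, we have, for any $x \in {]-\pi,\pi[}$,

$$s_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, \frac{\sin (n+\frac12)(x-t)}{\sin \frac12(x-t)}\,dt$$

for each $n$, by 282Da. Now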


Z Z
1 sin(n+ 21 )(xt) 1 sin(n+ 12 )(xt)
f (t) dt f (t) dt
2
sin 12 (xt)
xt
Z
1 sin(n+ 12 )(xt) sin(n+ 12 )(xt)
= f (t) 1 dt dt

2 sin 2 (xt) xt
Z x+
1 1 1
= f (x t) sin(n + 12 )t dt.
x
2 sin 12 t t

But if we look at the function

1 1
px (t) = f (x t) if x < t < x + and t 6= 0,
2 sin 12 t t
= 0 otherwise,

1 1 1
px is integrable, because f is integrable over ], ] and limt0 = 0, so supt6=0,xtx+ |
2 sin 12 t t 2 sin 12 t
1
| is finite. (This is where we need to know that |x| < .) So
t
Z Z
1 sin(n+ 12 )(xt)
lim sn (x) f (t) dt = lim px (t) sin(n + 12 )t dt = 0
n
xt n

by the Riemann-Lebesgue lemma (282Fb). But this means that $\lim_{n\to\infty} s_n(x) = f(x)$ for any $x \in {]-\pi,\pi[}$ such that $f(x) = \lim_{a\to\infty} \frac{1}{\pi} \int_{-\pi}^{\pi} \frac{\sin (x-t)a}{x-t}\, f(t)\,dt$, which is almost every $x \in {]-\pi,\pi]}$.
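Two elementary facts carry the comparison between Fourier sums and the truncated Fourier integral: the closed form $\sum_{k=-n}^{n} e^{ikt} = \sin((n+\frac12)t)/\sin(\frac12 t)$ behind the formula for $s_n$, and the boundedness of $\frac{1}{2\sin\frac12 t} - \frac1t$ near $t = 0$, where it behaves like $t/24$. The Python sketch below checks both numerically. It is a sanity check with hypothetical function names, not a substitute for 282Da or the Riemann-Lebesgue lemma.

```python
import cmath
import math


def dirichlet_sum(n, t):
    """Real part of sum_{k=-n}^{n} exp(i*k*t); the imaginary parts cancel."""
    return sum(cmath.exp(1j * k * t) for k in range(-n, n + 1)).real


def dirichlet_closed(n, t):
    """Closed form sin((n + 1/2) t) / sin(t / 2), valid when sin(t / 2) != 0."""
    return math.sin((n + 0.5) * t) / math.sin(0.5 * t)


def kernel_gap(t):
    """1 / (2 sin(t/2)) - 1/t, the factor appearing in the function p_x."""
    return 1.0 / (2.0 * math.sin(0.5 * t)) - 1.0 / t


if __name__ == "__main__":
    for n in (0, 1, 5, 20):
        print(n, dirichlet_sum(n, 1.0), dirichlet_closed(n, 1.0))
    for t in (1.0, 0.1, 0.001):
        print(t, kernel_gap(t), t / 24)
```

The second check is why $p_x$ is integrable: the difference of the two kernels has no singularity at $t = 0$.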

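286W Glossary The following special notations are used in more than one paragraph of this section:

$\mu$ for Lebesgue measure on $\mathbb{R}$.
286A: $f^*$.
286C: $S_\alpha f$, $M_\alpha f$, $D_\alpha f$.
286Ea: $\mathcal{I}$, $Q$, $I_\sigma$, $J_\sigma$, $k_\sigma$, $x_\sigma$, $y_\sigma$, $J^l_\sigma$, $J^r_\sigma$, $y^l_\sigma$.
286Eb: $\phi$, $\phi_\sigma$, $(f|g)$.
286Ec: $w$, $w_\sigma$, $C_1$.
286F: $\le$, $\le^r$, $\preccurlyeq$.
286G: $C_2$, $C_3$, $C_4$.
286H: mass, energy.
286I: $C_5$.
286K: $C_6$.
286L: $C_7$.
286M: $C_8$.
286N: $C_9$.
286O: $Af$.
286P: $\theta_z(y)$.
286Q: $\theta'_{z\alpha\beta}(y)$.
286R: $\tilde\theta_z(y)$.
286S: $\tilde A f$.
286T: $C_{10}$, $\hat A f$.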

286X Basic exercises (a) Use 284Oa and 284Xf to shorten part (c) of the proof of 286U.
(b) Show that if $\langle c_k\rangle_{k\in\mathbb{N}}$ is a sequence of complex numbers such that $\sum_{k=0}^{\infty} |c_k|^2$ is finite, then $\sum_{k=0}^{\infty} c_k e^{ikx}$ is defined in $\mathbb{C}$ for almost all $x \in \mathbb{R}$.

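286Y Further exercises (a) Show that if $f$ is a square-integrable function on $\mathbb{R}^r$, where $r \ge 2$, then
$$g(y) = \lim_{\alpha_1,\ldots,\alpha_r\to-\infty,\ \beta_1,\ldots,\beta_r\to\infty} \frac{1}{(\sqrt{2\pi})^r} \int_a^b e^{-iy\cdot x} f(x)\,dx$$
is defined in $\mathbb{C}$ for almost every $y \in \mathbb{R}^r$, and that $g$ represents the Fourier transform of $f$.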

286 Notes and comments This is not quite the longest single section in this treatise as a whole, but it
is by a substantial margin the longest in the present volume, and thirty pages of sub-superscripts must tax
the endurance of the most enthusiastic. You will easily understand why Carleson's theorem is not usually
presented at this level. But I am trying in this book to present complete proofs of the principal theorems,
there is no natural place for Carleson's theorem in later volumes as at present conceived, and it is (just)
accessible at this point; so I take the space to do it here.
The proof here divides naturally into two halves: the combinatorial part in 286E-286M, up to the
Lacey-Thiele lemma, followed by the analytic part in 286N-286V, in which the averaging process
R2 1 1 Rb
1
limb 0
. . . dd
b
is used to transform the geometrically coherent, but analytically irregular, functions z into the characteristic
1
functions z . From the standpoint of ordinary Fourier analysis, this second part is essentially routine;
1 (0)
there are many paths we could follow, and we have only to take the ordinary precautions against illegitimate
operations.
Carleson (Carleson 66) stated his theorem in the Fourier-series form of 286V; but it had long been understood that this was equiveridical with the Fourier-transform version in 286U. There are of course many ways of extending the theorem. In particular, there are corresponding results for functions in $\mathcal{L}^p$ for any $p > 1$, and even for functions $f$ such that $f \times \ln(1+|f|) \times \ln\ln(2+|f|)$ is integrable (Sjölin 71). The methods here do not seem to reach so far. I ought also to remark that if we define $Af$ as in 286T, then there is for every $p > 1$ a constant $C$ such that $\|Af\|_p \le C\|f\|_p$ for every $f \in \mathcal{L}^p_{\mathbb{C}}$ (Hunt 67, Mozzochi 71, Jørsboe & Mejlbro 82).
Note that the point of Carleson's theorem, in either form, is that we take special limits. In the formulae
$$\hat f(y) = \lim_{a\to-\infty,\ b\to\infty} \frac{1}{\sqrt{2\pi}} \int_a^b e^{-ixy} f(x)\,dx,$$
$$f(x) = \lim_{n\to\infty} \sum_{k=-n}^{n} c_k e^{ikx},$$
valid almost everywhere for square-integrable functions $f$, we are not taking the ordinary integral $\int_{-\infty}^{\infty} e^{-ixy} f(x)\,dx$ or the unconditional sum $\sum_{k\in\mathbb{Z}} c_k e^{ikx}$. If $f$ is not integrable, or $\sum_{k=-\infty}^{\infty} |c_k|$ is infinite, these will not be defined at even one point. Carleson's theorem makes sense only because we have a natural preference for particular kinds of improper integral and conditional sum. So when we return, in Chapter 44 of Volume 4, to Fourier analysis on general topological groups, there will simply be no language in which to express the theorem, and while versions have been proved for other groups (e.g., Schipp 78), they necessarily depend on some structure beyond the simple notion of 'locally compact Hausdorff abelian topological group'. Even in $\mathbb{R}^2$, I understand that it is still unknown whether
$$\lim_{a\to\infty} \frac{1}{2\pi} \int_{B(\mathbf{0},a)} e^{-iy\cdot x} f(x)\,dx$$
will be defined a.e. for any square-integrable function $f$, if we use ordinary Euclidean balls $B(\mathbf{0},a)$ in place of the rectangles in 286Ya.

Appendix to Volume 2
Useful Facts
In the course of writing this volume, I have found that a considerable number of concepts and facts from
various branches of mathematics are necessary to us. Nearly all of them are embedded in important and
well-established theories for which many excellent textbooks are available and which I very much hope that
you will one day study in depth. Nevertheless, I am reluctant to send you off immediately to courses in
general topology, functional analysis and set theory, as if these were essential prerequisites for our work
here, along with real analysis and basic linear algebra. For this reason I have written this Appendix, setting
out those results which we actually need at some point in this volume. The great majority of them really
are elementary; indeed, some are so elementary that they are not always spelt out in detail in orthodox
treatments of their subjects.
While I do not put this book forward as the proper place to learn any of these topics, I have tried to set
them out in a way that you will find easy to integrate into regular approaches. I do not expect anybody
to read systematically through this work, and I hope that the references given in the main chapters of this
volume will be adequate to guide you to the particular items you need.

2A1 Set theory


Especially for the examples in Chapter 21, we need some non-trivial set theory, which is best approached
through the standard theory of cardinals and ordinals; and elsewhere in this volume I make use of Zorn's
Lemma. Here I give a very brief outline of the results involved, largely omitting proofs. Most of this material
should be in any sound introduction to set theory. The references I give are to books which happen to have
come my way and which I can recommend as reasonably suitable for beginners.
I do not discuss axiom systems or logical foundations. The set theory I employ is 'naive' in the sense
that I rely on my understanding of the collective experience of the last ninety years, rather than on any
attempt at formal description, to distinguish legitimate from unsafe arguments. There are, however, points
in Volume 5 at which such a relaxed philosophy becomes inappropriate, and I therefore use arguments which
can, I believe, be translated into standard Zermelo-Fraenkel set theory without new ideas being invoked.
Although in this volume I use the axiom of choice without scruple whenever appropriate, I will divide this
section into two parts, starting with ideas and results not dependent on the axiom of choice (2A1A-2A1I)
and continuing with the remainder (2A1J-2A1P). I believe that even at this level it helps us to understand
the nature of the arguments better if we maintain a degree of separation.

2A1A Ordered sets (a) Recall that a partially ordered set is a set $P$ together with a relation $\le$ on $P$ such that
if $p \le q$ and $q \le r$ then $p \le r$,
$p \le p$ for every $p \in P$,
if $p \le q$ and $q \le p$ then $p = q$.
In this context, I will write $p \ge q$ to mean $q \le p$, and $p < q$ or $q > p$ to mean '$p \le q$ and $p \ne q$'. $\le$ is a partial order on $P$.

(b) Let $(P,\le)$ be a partially ordered set, and $A \subseteq P$. A maximal element of $A$ is a $p \in A$ such that $p \not< a$ for any $a \in A$. Note that $A$ may have more than one maximal element. An upper bound for $A$ is a $p \in P$ such that $a \le p$ for every $a \in A$; a supremum or least upper bound is an upper bound $p$ such that $p \le q$ for every upper bound $q$ of $A$. There can be at most one such, because if $p$, $p'$ are both least upper bounds then $p \le p'$ and $p' \le p$. Accordingly we may safely write $p = \sup A$ if $p$ is the least upper bound of $A$.
Similarly, a minimal element of $A$ is a $p \in A$ such that $p \not> a$ for every $a \in A$; a lower bound of $A$ is a $p \in P$ such that $p \le a$ for every $a \in A$; and $\inf A = a$ means that
$\forall\, q \in P$, $a \ge q \iff p \ge q$ for every $p \in A$.
A subset $A$ of $P$ is order-bounded if it has both an upper bound and a lower bound.
A subset $A$ of $P$ is upwards-directed if for any $p$, $p' \in A$ there is a $q \in A$ such that $p \le q$ and $p' \le q$; that is, if any non-empty finite subset of $A$ has an upper bound in $A$. Similarly, $A \subseteq P$ is downwards-directed if for any $p$, $p' \in A$ there is a $q \in A$ such that $q \le p$ and $q \le p'$; that is, if any non-empty finite subset of $A$ has a lower bound in $A$.
It is sometimes convenient to adapt the notation for closed intervals to arbitrary partially ordered sets: $[p,q]$ will be $\{r : p \le r \le q\}$.
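
To make the definitions in (b) concrete, here is a small sketch in Python on a finite partially ordered set. The choice of example (the integers 1 to 12 ordered by divisibility) and all function names are illustrative assumptions, not taken from the text.

```python
from typing import Callable, List, Optional

Leq = Callable[[int, int], bool]

def divides(p: int, q: int) -> bool:
    """The order relation used in this example: p <= q iff p divides q."""
    return q % p == 0

def maximal_elements(A: List[int], leq: Leq) -> List[int]:
    """Elements p of A with no a in A strictly above p."""
    return [p for p in A if not any(leq(p, a) and p != a for a in A)]

def upper_bounds(A: List[int], P: List[int], leq: Leq) -> List[int]:
    """Elements p of P with a <= p for every a in A."""
    return [p for p in P if all(leq(a, p) for a in A)]

def supremum(A: List[int], P: List[int], leq: Leq) -> Optional[int]:
    """The least upper bound of A in P, or None if there is none."""
    ubs = upper_bounds(A, P, leq)
    least = [p for p in ubs if all(leq(p, q) for q in ubs)]
    return least[0] if least else None

P = list(range(1, 13))
A = [2, 3, 4]
```

With these choices `maximal_elements(A, divides)` returns two elements, 3 and 4, illustrating that a maximal element need not be unique, while `supremum(A, P, divides)` is 12. The set `[5, 7]` has no upper bound in `P`, so its supremum is undefined there.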

(c) A totally ordered set is a partially ordered set $(P,\le)$ such that
for any $p$, $q \in P$, either $p \le q$ or $q \le p$.
$\le$ is then a total or linear order on $P$.

(d) A lattice is a partially ordered set $(P,\le)$ such that
for any $p$, $q \in P$, $p \vee q = \sup\{p,q\}$ and $p \wedge q = \inf\{p,q\}$ are defined in $P$.
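
As an illustration of (d), the divisors of 60 under divisibility form a lattice in which $p \vee q$ is the least common multiple and $p \wedge q$ the greatest common divisor. The sketch below checks this by brute force; the example and the names `join`, `meet` are assumptions made for illustration.

```python
import math
from itertools import product

divisors = [d for d in range(1, 61) if 60 % d == 0]

def join(p: int, q: int) -> int:
    """Candidate for sup{p, q} under divisibility: the least common multiple."""
    return p * q // math.gcd(p, q)

def meet(p: int, q: int) -> int:
    """Candidate for inf{p, q} under divisibility: the greatest common divisor."""
    return math.gcd(p, q)

def is_least_upper_bound(s: int, p: int, q: int) -> bool:
    """Check directly that s is an upper bound of {p, q} lying below every other upper bound."""
    ubs = [u for u in divisors if u % p == 0 and u % q == 0]
    return s in ubs and all(u % s == 0 for u in ubs)

closed = all(join(p, q) in divisors and meet(p, q) in divisors
             for p, q in product(divisors, repeat=2))
sups_ok = all(is_least_upper_bound(join(p, q), p, q)
              for p, q in product(divisors, repeat=2))
```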

(e) A well-ordered set is a totally ordered set $(P,\le)$ such that $\inf A$ exists and belongs to $A$ for every non-empty set $A \subseteq P$; that is, every non-empty subset of $P$ has a least element. In this case $\le$ is a well-ordering of $P$.

2A1B Transfinite Recursion: Theorem Let $(P,\le)$ be a well-ordered set and $X$ any class. For $p \in P$ write $L_p$ for the set $\{q : q \in P,\ q < p\}$ and $X^{L_p}$ for the class of all functions from $L_p$ to $X$. Let $F : \bigcup_{p\in P} X^{L_p} \to X$ be any function. Then there is a unique function $f : P \to X$ such that $f(p) = F(f{\restriction}L_p)$ for every $p \in P$.
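proof There are versions of this result in Enderton 77 (p. 175) and Halmos 60 (§18). Nevertheless I write out a proof, since it seems to me that most elementary books on set theory do not give it its proper place at the very beginning of the theory of well-ordered sets.
(a) Let $\Phi$ be the class of all functions $\phi$ such that
($\alpha$) $\operatorname{dom}\phi$ is a subset of $P$, and $L_p \subseteq \operatorname{dom}\phi$ for every $p \in \operatorname{dom}\phi$;
($\beta$) $\phi(p) \in X$ for every $p \in \operatorname{dom}\phi$, and $\phi(p) = F(\phi{\restriction}L_p)$ for every $p \in \operatorname{dom}\phi$.
(b) If $\phi$, $\psi \in \Phi$ then $\phi$ and $\psi$ agree on $\operatorname{dom}\phi \cap \operatorname{dom}\psi$. PP ?? If not, then $A = \{q : q \in \operatorname{dom}\phi \cap \operatorname{dom}\psi,\ \phi(q) \ne \psi(q)\}$ is non-empty. Because $P$ is well-ordered, $A$ has a least element $p$ say. Now $L_p \subseteq \operatorname{dom}\phi \cap \operatorname{dom}\psi$ and $L_p \cap A = \emptyset$, so
$\phi(p) = F(\phi{\restriction}L_p) = F(\psi{\restriction}L_p) = \psi(p)$,
which is impossible. XX QQ
(c) It follows that $\Phi$ is a set, since the function $\phi \mapsto \operatorname{dom}\phi$ is an injective function from $\Phi$ to $\mathcal{P}P$, and its inverse is a surjection from a subset of $\mathcal{P}P$ onto $\Phi$. We can therefore, without inhibitions, define a function $f$ by writing
$\operatorname{dom} f = \bigcup_{\phi\in\Phi} \operatorname{dom}\phi$, $f(p) = \phi(p)$ whenever $\phi \in \Phi$, $p \in \operatorname{dom}\phi$.
(If you think that a function $\phi$ is just the set of ordered pairs $\{(p,\phi(p)) : p \in \operatorname{dom}\phi\}$, then $f$ becomes $\bigcup\Phi$.) Then $f \in \Phi$. PP Of course $f$ is a function from a subset of $P$ to $X$. If $p \in \operatorname{dom} f$, then there is a $\phi \in \Phi$ such that $p \in \operatorname{dom}\phi$, in which case
$L_p \subseteq \operatorname{dom}\phi \subseteq \operatorname{dom} f$, $f(p) = \phi(p) = F(\phi{\restriction}L_p) = F(f{\restriction}L_p)$. QQ
(d) $f$ is defined everywhere in $P$. PP ?? Otherwise, $P \setminus \operatorname{dom} f$ is non-empty and has a least element $r$ say. Now $L_r \subseteq \operatorname{dom} f$. Define a function $\psi$ by saying that $\operatorname{dom}\psi = \{r\} \cup \operatorname{dom} f$, $\psi(p) = f(p)$ for $p \in \operatorname{dom} f$ and $\psi(r) = F(f{\restriction}L_r)$. Then $\psi \in \Phi$, because if $p \in \operatorname{dom}\psi$
either $p \in \operatorname{dom} f$ so $L_p \subseteq \operatorname{dom} f \subseteq \operatorname{dom}\psi$ and
$\psi(p) = f(p) = F(f{\restriction}L_p) = F(\psi{\restriction}L_p)$
or $p = r$ so $L_p = L_r \subseteq \operatorname{dom} f \subseteq \operatorname{dom}\psi$ and
$\psi(p) = F(f{\restriction}L_r) = F(\psi{\restriction}L_r)$.
Accordingly $\psi \in \Phi$ and $r \in \operatorname{dom}\psi \subseteq \operatorname{dom} f$. XX QQ
(e) Thus $f : P \to X$ is a function such that $f(p) = F(f{\restriction}L_p)$ for every $p$. To see that $f$ is unique, observe that any function of this type must belong to $\Phi$, so must agree with $f$ on their common domain, which is the whole of $P$.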
Remark If you have been taught to distinguish between the words 'set' and 'class', you will observe that
my naive set theory is a relatively tolerant one in that it is willing to allow class variables in its theorems.
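
For a finite well-ordered set the recursion of 2A1B can be carried out literally. The sketch below is only an illustration under that finiteness assumption; the names `recurse` and `F`, and the particular rule chosen for `F`, are hypothetical.

```python
from typing import Callable, Dict, List

def recurse(P: List[int], F: Callable[[Dict[int, int]], int]) -> Dict[int, int]:
    """P lists a finite well-ordered set in increasing order.

    At each step the dictionary f has domain L_p = {q : q < p}, so the
    call F(dict(f)) is F applied to the restriction of f to L_p.
    """
    f: Dict[int, int] = {}
    for p in P:
        f[p] = F(dict(f))
    return f

def F(restriction: Dict[int, int]) -> int:
    """Hypothetical rule: 1 on the empty function, otherwise the sum of all earlier values."""
    return 1 if not restriction else sum(restriction.values())

f = recurse(list(range(6)), F)
```

Here `f` takes the values 1, 1, 2, 4, 8, 16 at 0, ..., 5. Uniqueness is visible in the code: the value at `p` is forced once the values on `L_p` are known.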

2A1C Ordinals An ordinal (sometimes called a 'von Neumann ordinal') is a set $\xi$ such that
if $\eta \in \xi$ then $\eta$ is a set and $\eta \notin \eta$,
if $\eta \in \zeta \in \xi$ then $\eta \in \xi$,
writing '$\eta \le \zeta$' to mean '$\eta \in \zeta$ or $\eta = \zeta$', $(\xi,\le)$ is well-ordered
(Enderton 77, p. 191; Halmos 60, §19; Henle 86, p. 27; Krivine 71, p. 24; Roitman 90, 3.2.8. Of course many set theories do not allow sets to belong to themselves, and/or take it for granted that every object of discussion is a set, but I prefer not to take a view on such points in general.)

2A1D Basic facts about ordinals (a) If $\xi$ is an ordinal, then every member of $\xi$ is an ordinal. (Enderton 77, p. 192; Henle 86, 6.4; Krivine 71, p. 14; Roitman 90, 3.2.10.)

(b) If $\xi$, $\eta$ are ordinals then either $\xi \in \eta$ or $\xi = \eta$ or $\eta \in \xi$ (and no two of these can occur together). (Enderton 77, p. 192; Henle 86, 6.4; Krivine 71, p. 14; Lipschutz 64, 11.12; Roitman 90, 3.2.13.) It is customary, in this case, to write $\eta < \xi$ if $\eta \in \xi$ and $\eta \le \xi$ if either $\eta \in \xi$ or $\eta = \xi$. Note that $\eta \le \xi$ iff $\eta \subseteq \xi$.

(c) If $A$ is any non-empty class of ordinals, then there is an $\alpha \in A$ such that $\alpha \le \xi$ for every $\xi \in A$. (Henle 86, 6.7; Krivine 71, p. 15.)

(d) If $\xi$ is an ordinal, so is $\xi \cup \{\xi\}$; call it '$\xi + 1$'. If $\xi < \eta$ then $\xi + 1 \le \eta$; $\xi + 1$ is the least ordinal greater than $\xi$. (Enderton 77, p. 193; Henle 86, 6.3; Krivine 71, p. 15.) For any ordinal $\xi$, either there is a greatest ordinal $\eta < \xi$, in which case $\xi = \eta + 1$ and we call $\xi$ a successor ordinal, or $\xi = \bigcup\xi$, in which case we call $\xi$ a limit ordinal.

(e) The first few ordinals are $0 = \emptyset$, $1 = 0 + 1 = \{0\} = \{\emptyset\}$, $2 = 1 + 1 = \{0,1\} = \{\emptyset,\{\emptyset\}\}$, $3 = 2 + 1 = \{0,1,2\}$, $\ldots$. The first infinite ordinal is $\omega = \{0,1,2,\ldots\}$, which may be identified with $\mathbb{N}$.
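
The finite ordinals of (e) can be built directly as nested sets. The following sketch uses Python's `frozenset` for hereditarily finite sets; the helper name `successor` is an illustrative assumption.

```python
def successor(xi: frozenset) -> frozenset:
    """Return xi + 1 = xi ∪ {xi}, as in 2A1D(d)."""
    return xi | frozenset([xi])

zero = frozenset()        # 0 = ∅
one = successor(zero)     # 1 = {0}
two = successor(one)      # 2 = {0, 1}
three = successor(two)    # 3 = {0, 1, 2}

# For ordinals, membership and proper inclusion coincide (2A1D(b)).
membership_matches_inclusion = (two in three) and (two < three)

# The union of a set of ordinals is an ordinal; here it is the largest of them.
union_of_some = frozenset().union(one, three)
```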

(f ) The union of any set of ordinals is an ordinal. (Enderton 77, p. 193; Henle 86, 6.8; Krivine 71,
p. 15; Roitman 90, 3.2.19.)

(g) If $(P,\le)$ is any well-ordered set, there is a unique ordinal $\xi$ such that $P$ is order-isomorphic to $\xi$, and the order-isomorphism is unique. (Enderton 77, pp. 187-189; Henle 86, 6.13; Halmos 60, §20.)

2A1E Initial ordinals An initial ordinal is an ordinal $\kappa$ such that there is no bijection between $\kappa$ and any member of $\kappa$. (Enderton 77, p. 197; Halmos 60, §25; Henle 86, p. 34; Krivine 71, p. 24; Roitman 90, 5.1.10, p. 79).

2A1F Basic facts about initial ordinals (a) All finite ordinals, and the first infinite ordinal $\omega$, are initial ordinals.

(b) For every well-ordered set $P$ there is a unique initial ordinal $\kappa$ such that there is a bijection between $P$ and $\kappa$.

(c) For every ordinal $\xi$ there is a least initial ordinal greater than $\xi$. (Enderton 77, p. 195; Henle 86, 7.2.1.) If $\kappa$ is an initial ordinal, write $\kappa^+$ for the least initial ordinal greater than $\kappa$. We write $\omega_1$ for $\omega^+$, $\omega_2$ for $\omega_1^+$, and so on.

(d) For any initial ordinal $\kappa \ge \omega$ there is a bijection between $\kappa \times \kappa$ and $\kappa$; consequently there are bijections between $\kappa$ and $\kappa^r$ for every $r \ge 1$.

2A1G Schröder-Bernstein theorem I remind you of the following fundamental result: if $X$ and $Y$ are sets and there are injections $f : X \to Y$, $g : Y \to X$ then there is a bijection $h : X \to Y$. (Enderton 77, p. 147; Halmos 60, §22; Henle 86, 7.4; Lipschutz 64, p. 145; Roitman 90, 5.1.2. It is also a special case of 344D in Volume 3.)

2A1H Countable subsets of $\mathcal{P}\mathbb{N}$ The following results will be needed below.

(a) There is a bijection between $\mathcal{P}\mathbb{N}$ and $\mathbb{R}$. (Enderton 77, p. 149; Lipschutz 64, p. 146.)

(b) Suppose that $X$ is any set such that there is an injection from $X$ into $\mathcal{P}\mathbb{N}$. Let $\mathcal{C}$ be the set of countable subsets of $X$. Then there is a surjection from $\mathcal{P}\mathbb{N}$ onto $\mathcal{C}$. PP Let $f : X \to \mathcal{P}\mathbb{N}$ be an injection. Set $f_1(x) = \{0\} \cup \{i+1 : i \in f(x)\}$; then $f_1 : X \to \mathcal{P}\mathbb{N}$ is injective and $f_1(x) \ne \emptyset$ for every $x \in X$. Define $g : \mathcal{P}\mathbb{N} \to \mathcal{P}X$ by setting
$g(A) = \{x : \exists\, n \in \mathbb{N},\ f_1(x) = \{i : 2^n(2i+1) \in A\}\}$
for each $A \subseteq \mathbb{N}$. Then $g(A)$ is countable, since we have an injection
$x \mapsto \min\{n : f_1(x) = \{i : 2^n(2i+1) \in A\}\}$
from $g(A)$ to $\mathbb{N}$. Thus $g$ is a function from $\mathcal{P}\mathbb{N}$ to $\mathcal{C}$. To see that $g$ is surjective, observe that $\emptyset = g(\emptyset)$, while if $C \subseteq X$ is countable and not empty there is a surjection $h : \mathbb{N} \to C$; now set
$A = \{2^n(2i+1) : n \in \mathbb{N},\ i \in f_1(h(n))\}$,
and see that $g(A) = C$. QQ
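
The coding device in (b) rests on the fact that every integer $a \ge 1$ is uniquely of the form $2^n(2i+1)$. The sketch below packs a finite list of finite non-empty sets of natural numbers into a single set $A$ and recovers each of them; the restriction to finite sets and the names `encode`, `decode` are assumptions made so that the example can run.

```python
from typing import List, Set

def encode(family: List[Set[int]]) -> Set[int]:
    """Return A = {2^n (2i+1) : i in family[n]}, as in the proof of 2A1H(b)."""
    return {2 ** n * (2 * i + 1) for n, s in enumerate(family) for i in s}

def decode(A: Set[int], n: int) -> Set[int]:
    """Return {i : 2^n (2i+1) in A}."""
    out = set()
    for a in A:
        m = (a & -a).bit_length() - 1   # exponent of the largest power of 2 dividing a
        if m == n:
            out.add((a >> m) // 2)      # the odd part is 2i+1, so i = odd // 2
    return out

family = [{0, 3}, {1}, {0, 2, 5}]
A = encode(family)
```

Each `decode(A, n)` returns `family[n]`, which is the step 'see that $g(A) = C$' in miniature.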

(c) Again suppose that $X$ is a set such that there is an injection from $X$ to $\mathcal{P}\mathbb{N}$, and write $H$ for the set of functions $h$ such that $\operatorname{dom} h$ is a countable subset of $X$ and $h$ takes values in $\{0,1\}$. Then there is a surjection from $\mathcal{P}\mathbb{N}$ onto $H$. PP Let $\mathcal{C}$ be the set of countable subsets of $X$ and let $g : \mathcal{P}\mathbb{N} \to \mathcal{C}$ be a surjection, as in (b). For $A \subseteq \mathbb{N}$ set
$g_0(A) = g(\{i : 2i \in A\})$, $g_1(A) = g(\{i : 2i+1 \in A\})$,
so that $g_0(A)$, $g_1(A)$ are countable subsets of $X$, and $A \mapsto (g_0(A), g_1(A))$ is a surjection from $\mathcal{P}\mathbb{N}$ onto $\mathcal{C} \times \mathcal{C}$. Let $h_A$ be the function with domain $g_0(A) \cup g_1(A)$ such that $h_A(x) = 1$ if $x \in g_1(A)$, $0$ if $x \in g_0(A) \setminus g_1(A)$. Then $A \mapsto h_A$ is a surjection from $\mathcal{P}\mathbb{N}$ onto $H$. QQ

2A1I Filters I pause for a moment to discuss a construction which is of great value in investigating
topological spaces, but has other uses, and in its nature belongs to elementary set theory (much more
elementary, indeed, than the work above).

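(a) Let $X$ be a non-empty set. A filter on $X$ is a family $\mathcal{F}$ of subsets of $X$ such that
$X \in \mathcal{F}$, $\emptyset \notin \mathcal{F}$,
$E \cap F \in \mathcal{F}$ whenever $E$, $F \in \mathcal{F}$,
$E \in \mathcal{F}$ whenever $X \supseteq E \supseteq F \in \mathcal{F}$.
The second condition implies (inducing on $n$) that $F_0 \cap \ldots \cap F_n \in \mathcal{F}$ whenever $F_0, \ldots, F_n \in \mathcal{F}$.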
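
On a finite set the three conditions can be tested exhaustively. The following sketch does so; the helper names `powerset` and `is_filter` and the sample families are illustrative assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

def powerset(X: Iterable[int]) -> List[FrozenSet[int]]:
    """All subsets of X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for k in range(len(xs) + 1) for c in combinations(xs, k)]

def is_filter(F: Set[FrozenSet[int]], X: Iterable[int]) -> bool:
    """Check the conditions of 2A1I(a) for a family F of subsets of a finite set X."""
    X = frozenset(X)
    if X not in F or frozenset() in F:
        return False                      # need X in F and the empty set not in F
    if any((E & G) not in F for E in F for G in F):
        return False                      # closed under pairwise intersection
    return all(E in F for G in F for E in powerset(X) if G <= E)  # closed under supersets

X = {0, 1, 2}
good = {frozenset({0, 1}), frozenset({0, 1, 2})}               # all supersets of {0, 1}
bad = {frozenset({0}), frozenset({1}), frozenset({0, 1, 2})}   # {0} ∩ {1} = ∅ is missing
```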

(b) Let $X$, $Y$ be non-empty sets, $\mathcal{F}$ a filter on $X$ and $f : D \to Y$ a function, where $D \in \mathcal{F}$. Then
$\{E : E \subseteq Y,\ f^{-1}[E] \in \mathcal{F}\}$
is a filter on $Y$ (because $f^{-1}[Y] = D$, $f^{-1}[\emptyset] = \emptyset$, $f^{-1}[E \cap F] = f^{-1}[E] \cap f^{-1}[F]$, $X \supseteq f^{-1}[E] \supseteq f^{-1}[F]$ whenever $Y \supseteq E \supseteq F$); I will call it $f[[\mathcal{F}]]$, the image filter of $\mathcal{F}$ under $f$.
Remark Of course there is a hidden variable in this notation. Ordinarily in this book I regard a function $f$ as being defined by its domain $\operatorname{dom} f$ and its values on its domain; that is, it is determined by its graph $\{(x,f(x)) : x \in \operatorname{dom} f\}$, and indeed I normally do not distinguish between a function and its graph. This means that when I write '$f : D \to Y$ is a function' then the class $D = \operatorname{dom} f$ can be recovered from the function, but the class $Y$ cannot; all I promise is that $Y$ includes the class $f[D]$ of values of $f$. Now in the notation $f[[\mathcal{F}]]$ above we do actually need to know which set $Y$ it is to be a filter on, even though this cannot be discovered from knowledge of $f$ and $\mathcal{F}$. So you will always have to infer it from the context.

2A1J The Axiom of Choice I come now to the second half of this section, in which I discuss concepts
and theorems dependent on the Axiom of Choice. Let me remind you of the statement of this axiom:
(AC) 'whenever $I$ is a set and $\langle X_i\rangle_{i\in I}$ is a family of non-empty sets indexed by $I$, there is a function $f$, with domain $I$, such that $f(i) \in X_i$ for every $i \in I$'.
The function $f$ is a choice function; it picks out one member of each of the given family of non-empty sets $X_i$.
I believe that one's attitude to this principle is a matter for individual choice. It is an indispensable
foundation for very large parts of twentieth-century pure mathematics, including a substantial fraction of
the present volume; but there are also significant areas in which principles actually contradictory to it can
be employed to striking effect, leading in my view to equally valid mathematics. At present it is the case
that more current mathematical activity, by volume, depends on asserting the axiom of choice than on all its
rivals put together; but it is a matter of judgement and taste where the most important, or exciting, ideas
are to be found. For the present volume I follow standard practice in twentieth-century abstract analysis,
using the axiom of choice whenever necessary; but in Volume 5 I hope to look at alternatives.

2A1K Zermelo's Well-Ordering Theorem (a) The Axiom of Choice is equiveridical with each of the statements
'for every set $X$ there is a well-ordering of $X$',
'for every set $X$ there is a bijection between $X$ and some ordinal',
'for every set $X$ there is a unique initial ordinal $\kappa$ such that there is a bijection between $X$ and $\kappa$'.
(Enderton 77, p. 196 et seq.; Halmos 60, §17; Henle 86, 9.1-9.3; Krivine 71, p. 20; Lipschutz 64, 12.1; Roitman 90, 3.6.38.)

(b) When assuming the axiom of choice, as I do nearly everywhere in this treatise, I write $\#(X)$ for that initial ordinal $\kappa$ such that there is a bijection between $\kappa$ and $X$; I call this the cardinal of $X$.

2A1L Fundamental consequences of the Axiom of Choice (a) For any two sets $X$ and $Y$, there is a bijection between $X$ and $Y$ iff $\#(X) = \#(Y)$. More generally, there is an injection from $X$ to $Y$ iff $\#(X) \le \#(Y)$, and a surjection from $X$ onto $Y$ iff either $\#(X) \ge \#(Y) > 0$ or $\#(X) = \#(Y) = 0$.

(b) In particular, $\#(\mathcal{P}\mathbb{N}) = \#(\mathbb{R})$; write $\mathfrak{c}$ for this common value, the cardinal of the continuum. Cantor's theorem that $\mathcal{P}\mathbb{N}$ and $\mathbb{R}$ are uncountable becomes the result $\omega < \mathfrak{c}$, that is, $\omega_1 \le \mathfrak{c}$.

(c) If $X$ is any infinite set, and $r \ge 1$, then there is a bijection between $X^r$ and $X$. (Enderton 77, p. 162; Halmos 60, §24.) (I note that we need some form of the axiom of choice to prove the result in this generality. But of course for most of the infinite sets arising naturally in mathematics, sets like $\mathbb{N}$ and $\mathcal{P}\mathbb{R}$, it is easy to prove the result without appeal to the axiom of choice.)

(d) Suppose that $\kappa$ is an infinite cardinal. If $I$ is a set of cardinal at most $\kappa$ and $\langle A_i\rangle_{i\in I}$ is a family of sets with $\#(A_i) \le \kappa$ for every $i \in I$, then $\#(\bigcup_{i\in I} A_i) \le \kappa$. Consequently $\#(\bigcup\mathcal{A}) \le \kappa$ whenever $\mathcal{A}$ is a family of sets such that $\#(\mathcal{A}) \le \kappa$ and $\#(A) \le \kappa$ for every $A \in \mathcal{A}$. In particular, $\omega_1$ cannot be expressed as a countable union of countable sets, and $\omega_2$ cannot be expressed as a countable union of sets of cardinal at most $\omega_1$.

(e) Now we can rephrase 2A1Hc as: if $\#(X) \le \mathfrak{c}$, then $\#(H) \le \mathfrak{c}$, where $H$ is the set of functions from a countable subset of $X$ to $\{0,1\}$. PP For we have an injection from $X$ into $\mathcal{P}\mathbb{N}$, and therefore a surjection from $\mathcal{P}\mathbb{N}$ onto $H$. QQ

(f) Any non-empty class of cardinals has a least member (by 2A1Dc).

2A1M Zorn's Lemma In 2A1K I described the well-ordering principle. I come now to another proposition which is equiveridical with the axiom of choice:
'Let $(P,\le)$ be a non-empty partially ordered set such that every non-empty totally ordered subset of $P$ has an upper bound in $P$. Then $P$ has a maximal element.'
This is Zorn's Lemma. For the proof that the axiom of choice implies, and is implied by, Zorn's Lemma, see Enderton 77, p. 151; Halmos 60, §16; Henle 86, 9.1-9.3; Roitman 90, 3.6.38.

2A1N Ultrafilters A filter $\mathcal{F}$ on a set $X$ is an ultrafilter if for every $A \subseteq X$ either $A \in \mathcal{F}$ or $X \setminus A \in \mathcal{F}$.
If $\mathcal{F}$ is an ultrafilter on $X$ and $f : D \to Y$ is a function, where $D \in \mathcal{F}$, then $f[[\mathcal{F}]]$ is an ultrafilter on $Y$ (because $f^{-1}[Y \setminus A] = D \setminus f^{-1}[A]$ for every $A \subseteq Y$).
One type of ultrafilter can be described easily: if $x$ is any point of a set $X$, then $\mathcal{F} = \{F : x \in F \subseteq X\}$ is an ultrafilter on $X$. (You need only read the definitions. Ultrafilters of this type are called principal ultrafilters.) But it is not obvious that there are any further ultrafilters, and indeed it is not possible to prove that there are any, without using a strong form of the axiom of choice, as follows.
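
The remark that principal ultrafilters 'need only the definitions' can be confirmed mechanically on a small set. The sketch below is illustrative only; the names `powerset`, `principal_ultrafilter` and `has_dichotomy` are assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

def powerset(X: Iterable[int]) -> List[FrozenSet[int]]:
    """All subsets of X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for k in range(len(xs) + 1) for c in combinations(xs, k)]

def principal_ultrafilter(x: int, X: Iterable[int]) -> Set[FrozenSet[int]]:
    """The family {F : x in F, F a subset of X}."""
    return {F for F in powerset(X) if x in F}

def has_dichotomy(F: Set[FrozenSet[int]], X: Iterable[int]) -> bool:
    """For every A, exactly one of A and X \\ A belongs to F."""
    X = frozenset(X)
    return all((A in F) != ((X - A) in F) for A in powerset(X))

X = {0, 1, 2, 3}
U = principal_ultrafilter(2, X)
supersets_of_01 = {F for F in powerset(X) if {0, 1} <= F}   # a filter, but not an ultrafilter
```

The family `U` passes the test, while the filter of supersets of $\{0,1\}$ fails it, since it contains neither $\{0\}$ nor its complement.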

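2A1O The Ultrafilter Theorem As an example of the use of Zorn's lemma which will be of great value in studying compact topological spaces (2A3N et seq., and §247), I give the following result.
Theorem Let $X$ be any non-empty set, and $\mathcal{F}$ a filter on $X$. Then there is an ultrafilter $\mathcal{H}$ on $X$ such that $\mathcal{F} \subseteq \mathcal{H}$.
proof (Cf. Henle 86, 9.4; Roitman 90, 3.6.37.) Let $\mathfrak{P}$ be the set of all filters on $X$ including $\mathcal{F}$, and order $\mathfrak{P}$ by inclusion, so that, for $\mathcal{G}_1$, $\mathcal{G}_2 \in \mathfrak{P}$, $\mathcal{G}_1 \le \mathcal{G}_2$ in $\mathfrak{P}$ iff $\mathcal{G}_1 \subseteq \mathcal{G}_2$. It is easy to see that $\mathfrak{P}$ is a partially ordered set, and it is non-empty because $\mathcal{F} \in \mathfrak{P}$. If $\mathfrak{Q}$ is any non-empty totally ordered subset of $\mathfrak{P}$, then $\mathcal{H}_{\mathfrak{Q}} = \bigcup\mathfrak{Q} \in \mathfrak{P}$. PP Of course $\mathcal{H}_{\mathfrak{Q}}$ is a family of subsets of $X$. (i) Take any $\mathcal{G}_0 \in \mathfrak{Q}$; then $X \in \mathcal{G}_0 \subseteq \mathcal{H}_{\mathfrak{Q}}$. If $\mathcal{G} \in \mathfrak{Q}$, then $\mathcal{G}$ is a filter, so $\emptyset \notin \mathcal{G}$; accordingly $\emptyset \notin \mathcal{H}_{\mathfrak{Q}}$. (ii) If $E$, $F \in \mathcal{H}_{\mathfrak{Q}}$, then there are $\mathcal{G}_1$, $\mathcal{G}_2 \in \mathfrak{Q}$ such that $E \in \mathcal{G}_1$ and $F \in \mathcal{G}_2$. Because $\mathfrak{Q}$ is totally ordered, either $\mathcal{G}_1 \subseteq \mathcal{G}_2$ or $\mathcal{G}_2 \subseteq \mathcal{G}_1$. In either case, $\mathcal{G} = \mathcal{G}_1 \cup \mathcal{G}_2 \in \mathfrak{Q}$. Now $\mathcal{G}$ is a filter containing both $E$ and $F$, so it contains $E \cap F$, and $E \cap F \in \mathcal{H}_{\mathfrak{Q}}$. (iii) If $X \supseteq E \supseteq F \in \mathcal{H}_{\mathfrak{Q}}$, there is a $\mathcal{G} \in \mathfrak{Q}$ such that $F \in \mathcal{G}$; and $E \in \mathcal{G} \subseteq \mathcal{H}_{\mathfrak{Q}}$. This shows that $\mathcal{H}_{\mathfrak{Q}}$ is a filter on $X$. (iv) Finally, $\mathcal{H}_{\mathfrak{Q}} \supseteq \mathcal{G}_0 \supseteq \mathcal{F}$, so $\mathcal{H}_{\mathfrak{Q}} \in \mathfrak{P}$. QQ Now $\mathcal{H}_{\mathfrak{Q}}$ is evidently an upper bound for $\mathfrak{Q}$ in $\mathfrak{P}$.
We may therefore apply Zorn's Lemma to find a maximal element $\mathcal{H}$ of $\mathfrak{P}$. This $\mathcal{H}$ is surely a filter on $X$ including $\mathcal{F}$.
Now let $A \subseteq X$ be such that $A \notin \mathcal{H}$. Consider
$\mathcal{H}_1 = \{E : E \subseteq X,\ E \cup A \in \mathcal{H}\}$.
This is a filter on $X$. PP Of course it is a family of subsets of $X$. (i) $X \cup A = X \in \mathcal{H}$, so $X \in \mathcal{H}_1$. $\emptyset \cup A = A \notin \mathcal{H}$ so $\emptyset \notin \mathcal{H}_1$. (ii) If $E$, $F \in \mathcal{H}_1$ then
$(E \cap F) \cup A = (E \cup A) \cap (F \cup A) \in \mathcal{H}$,
so $E \cap F \in \mathcal{H}_1$. (iii) If $X \supseteq E \supseteq F \in \mathcal{H}_1$ then $E \cup A \supseteq F \cup A \in \mathcal{H}$, so $E \cup A \in \mathcal{H}$ and $E \in \mathcal{H}_1$. QQ Also $\mathcal{H}_1 \supseteq \mathcal{H}$, so $\mathcal{H}_1 \in \mathfrak{P}$. But $\mathcal{H}$ is a maximal element of $\mathfrak{P}$, so $\mathcal{H}_1 = \mathcal{H}$. Since $(X \setminus A) \cup A = X \in \mathcal{H}$, $X \setminus A \in \mathcal{H}_1$ and $X \setminus A \in \mathcal{H}$.
As $A$ is arbitrary, $\mathcal{H}$ is an ultrafilter, as required.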
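
On a finite set no appeal to Zorn's Lemma is needed, and the construction $\mathcal{H}_1 = \{E : E \cup A \in \mathcal{H}\}$ from the proof can simply be iterated over all subsets $A$. The sketch below does this; it replaces the maximality argument by a finite loop, which is an assumption appropriate only to the finite case, and the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

def powerset(X: Iterable[int]) -> List[FrozenSet[int]]:
    """All subsets of X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for k in range(len(xs) + 1) for c in combinations(xs, k)]

def extend_to_ultrafilter(F: Set[FrozenSet[int]], X: Iterable[int]) -> Set[FrozenSet[int]]:
    """Enlarge a filter F on a finite set X until every A or its complement belongs to it."""
    X = frozenset(X)
    subsets = powerset(X)
    H = set(F)
    for A in subsets:
        if A not in H and (X - A) not in H:
            # H_1 = {E : E ∪ A in H} is a filter including H and containing X \ A.
            H = {E for E in subsets if (E | A) in H}
    return H

X = {0, 1, 2}
trivial_filter = {frozenset(X)}
H = extend_to_ultrafilter(trivial_filter, X)
```

Starting from the filter $\{X\}$ on $X = \{0,1,2\}$, this particular enumeration of subsets ends with the principal ultrafilter at 2; a different enumeration could end at a different point.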

2A1P I come now to a result from infinitary combinatorics for which I give a detailed proof, not because
it cannot be found in many textbooks, but because it is usually given in enormously greater generality, to
the point indeed that it may be harder to understand why the stated theorem covers the present result than
to prove the latter from first principles.
Theorem (a) Let $\langle K_\alpha\rangle_{\alpha\in A}$ be a family of countable sets, with $\#(A)$ strictly greater than $\mathfrak{c}$, the cardinal of the continuum. Then there are a set $M$, of cardinal at most $\mathfrak{c}$, and a set $B \subseteq A$, of cardinal strictly greater than $\mathfrak{c}$, such that $K_\alpha \cap K_\beta \subseteq M$ whenever $\alpha$, $\beta$ are distinct members of $B$.
(b) Let $I$ be a set, and $\langle f_\alpha\rangle_{\alpha\in A}$ a family in $\{0,1\}^I$, the set of functions from $I$ to $\{0,1\}$, with $\#(A) > \mathfrak{c}$. If $\langle K_\alpha\rangle_{\alpha\in A}$ is any family of countable subsets of $I$, then there is a set $B \subseteq A$, of cardinal greater than $\mathfrak{c}$, such that $f_\alpha$ and $f_\beta$ agree on $K_\alpha \cap K_\beta$ for all $\alpha$, $\beta \in B$.
(c) In particular, under the conditions of (b), there are distinct $\alpha$, $\beta \in A$ such that $f_\alpha$ and $f_\beta$ agree on $K_\alpha \cap K_\beta$.
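proof (a) Choose inductively a family $\langle M_\xi\rangle_{\xi<\omega_1}$ of sets by the rule
if there is any set $N$ such that
($*$) $N$ is disjoint from $\bigcup_{\eta<\xi} M_\eta$, $\#(N) \le \mathfrak{c}$ and $\#(\{\alpha : \alpha \in A,\ K_\alpha \cap N = \emptyset\}) \le \mathfrak{c}$,
choose such a set for $M_\xi$;
otherwise set $M_\xi = \emptyset$.
When $M_\xi$ has been chosen for every $\xi < \omega_1$, set $M = \bigcup_{\xi<\omega_1} M_\xi$. The rule ensures that $\langle M_\xi\rangle_{\xi<\omega_1}$ is disjoint and that $\#(M_\xi) \le \mathfrak{c}$ for every $\xi < \omega_1$, while $\omega_1 \le \mathfrak{c}$, so $\#(M) \le \mathfrak{c}$.
Let $\mathfrak{P}$ be the family of sets $P \subseteq A$ such that $K_\alpha \cap K_\beta \subseteq M$ for all distinct $\alpha$, $\beta \in P$. Order $\mathfrak{P}$ by inclusion, so that it is a partially ordered set. If $\mathfrak{Q} \subseteq \mathfrak{P}$ is totally ordered, then $\bigcup\mathfrak{Q} \in \mathfrak{P}$. PP If $\alpha$, $\beta$ are distinct members of $\bigcup\mathfrak{Q}$, there are $Q_1$, $Q_2 \in \mathfrak{Q}$ such that $\alpha \in Q_1$, $\beta \in Q_2$; now $P = Q_1 \cup Q_2$ is equal to one of $Q_1$, $Q_2$, and in either case belongs to $\mathfrak{P}$ and contains both $\alpha$ and $\beta$, so $K_\alpha \cap K_\beta \subseteq M$. QQ By Zorn's Lemma, $\mathfrak{P}$ has a maximal element $B$, and we surely have $K_\alpha \cap K_\beta \subseteq M$ for all distinct $\alpha$, $\beta \in B$.
?? Suppose, if possible, that $\#(B) \le \mathfrak{c}$. Set $N = \bigcup_{\alpha\in B} K_\alpha \setminus M$. Then $N$ has cardinal at most $\mathfrak{c}$, being included in a union of at most $\mathfrak{c}$ countable sets. For every $\beta \in A \setminus B$, $B \cup \{\beta\} \notin \mathfrak{P}$, so there must be some $\alpha \in B$ such that $K_\alpha \cap K_\beta \not\subseteq M$; that is, $K_\beta \cap N \ne \emptyset$. Thus $\{\beta : K_\beta \cap N = \emptyset\} \subseteq B$ has cardinal at most $\mathfrak{c}$. But this means that in the rule for choosing $M_\xi$, there was always an $N$ satisfying the condition ($*$), and therefore $M_\xi$ also did. Thus $C_\xi = \{\alpha : K_\alpha \cap M_\xi = \emptyset\}$ has cardinal at most $\mathfrak{c}$ for every $\xi < \omega_1$. So $C = \bigcup_{\xi<\omega_1} C_\xi$ also has. But the original hypothesis was that $\#(A) > \mathfrak{c}$, so there is an $\alpha \in A \setminus C$. In this case, $K_\alpha \cap M_\xi \ne \emptyset$ for every $\xi < \omega_1$. But this means that we have a surjection $\phi : K_\alpha \cap M \to \omega_1$ given by setting
$\phi(i) = \xi$ if $i \in K_\alpha \cap M_\xi$.
Since $\#(K_\alpha) \le \omega < \omega_1$, this is impossible. XX
Accordingly $\#(B) > \mathfrak{c}$ and we have found a suitable pair $M$, $B$.
(b) By (a), we can find a set $M$, of cardinal at most $\mathfrak{c}$, and a set $B_0 \subseteq A$, of cardinal greater than $\mathfrak{c}$, such that $K_\alpha \cap K_\beta \subseteq M$ for all distinct $\alpha$, $\beta \in B_0$. Let $H$ be the set of functions from countable subsets of $M$ to $\{0,1\}$; then $f'_\alpha = f_\alpha{\restriction}(K_\alpha \cap M) \in H$ for each $\alpha \in B_0$. Now $B_0 = \bigcup_{h\in H} \{\alpha : \alpha \in B_0,\ f'_\alpha = h\}$ has cardinal greater than $\mathfrak{c}$, while $\#(H) \le \mathfrak{c}$ (2A1Le), so there must be some $h \in H$ such that $B = \{\alpha : \alpha \in B_0,\ f'_\alpha = h\}$ has cardinal greater than $\mathfrak{c}$.
If $\alpha$, $\beta$ are distinct members of $B$, then $K_\alpha \cap K_\beta \subseteq M$, because $\alpha$, $\beta \in B_0$; but this means that
$f_\alpha{\restriction}K_\alpha \cap K_\beta = h{\restriction}K_\alpha \cap K_\beta = f_\beta{\restriction}K_\alpha \cap K_\beta$.
Thus $B$ has the required property. (Of course $f_\alpha$ and $f_\beta$ agree on $K_\alpha \cap K_\beta$ if $\alpha = \beta$.)
(c) follows at once.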
Remark The result we need in this volume (in 216E) is part (c) above. There are other proofs of this,
perhaps a little simpler; but the stronger result in part (b) will be useful in Volume 3.

2A2 The topology of Euclidean space


In the appendix to Volume 1 (§1A2) I discussed open and closed sets in $\mathbb{R}^r$; the chief aim there was to support the idea of 'Borel set', which is vital in the theory of Lebesgue measure, but of course they are also fundamental to the study of continuous functions, and indeed to all aspects of real analysis. I give here a very brief introduction to the further elementary facts about closed and compact sets and continuous functions which we need for this volume. Much of this material can be derived from the generalizations in §2A3, but nevertheless I sketch the proofs, since for the greater part of the volume (most of the exceptions are in Chapter 24) Euclidean space is sufficient for our needs.

2A2A Closures: Definition For any $r \ge 1$, any $A \subseteq \mathbb{R}^r$, the closure of $A$, $\overline{A}$, is the intersection of all the closed subsets of $\mathbb{R}^r$ including $A$. This is itself closed (being the intersection of a non-empty family of closed sets, see 1A2Fd), so is the smallest closed set including $A$. In particular, $A$ is closed iff $\overline{A} = A$.

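2A2B Lemma Let $A \subseteq \mathbb{R}^r$ be any set. Then for $x \in \mathbb{R}^r$, the following are equiveridical:
(i) $x \in \overline{A}$, the closure of $A$;
(ii) for every $\delta > 0$, $B(x,\delta) \cap A \ne \emptyset$, where $B(x,\delta) = \{y : \|y - x\| \le \delta\}$;
(iii) there is a sequence $\langle x_n\rangle_{n\in\mathbb{N}}$ in $A$ such that $\lim_{n\to\infty} \|x_n - x\| = 0$.
proof (a)(i)$\Rightarrow$(ii) Suppose that $x \in \overline{A}$ and $\delta > 0$. Then $U(x,\delta) = \{y : \|y - x\| < \delta\}$ is an open set (1A2D), so $F = \mathbb{R}^r \setminus U(x,\delta)$ is closed, while $x \notin F$. Now
$x \in \overline{A} \setminus F \implies \overline{A} \not\subseteq F \implies A \not\subseteq F \implies A \cap U(x,\delta) \ne \emptyset \implies A \cap B(x,\delta) \ne \emptyset$.
As $\delta$ is arbitrary, (ii) is true.
(b)(ii)$\Rightarrow$(iii) If (ii) is true, then for each $n \in \mathbb{N}$ we can find an $x_n \in A$ such that $\|x_n - x\| \le 2^{-n}$, and now $\lim_{n\to\infty} \|x_n - x\| = 0$.
(c)(iii)$\Rightarrow$(i) Assume (iii). ?? Suppose, if possible, that $x \notin \overline{A}$. Then $x$ belongs to the open set $\mathbb{R}^r \setminus \overline{A}$ and there is a $\delta > 0$ such that $U(x,\delta) \subseteq \mathbb{R}^r \setminus \overline{A}$. But now there is an $n$ such that $\|x_n - x\| < \delta$, in which case $x_n \in U(x,\delta) \cap A \subseteq U(x,\delta) \cap \overline{A}$. XX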

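2A2C Continuous functions (a) I begin with a characterization of continuous functions in terms of open sets. If $r$, $s \ge 1$, $D \subseteq \mathbb{R}^r$ and $\phi : D \to \mathbb{R}^s$ is a function, we say that $\phi$ is continuous if for every $x \in D$, $\epsilon > 0$ there is a $\delta > 0$ such that $\|\phi(y) - \phi(x)\| \le \epsilon$ whenever $y \in D$ and $\|y - x\| \le \delta$. Now $\phi$ is continuous iff for every open set $G \subseteq \mathbb{R}^s$ there is an open set $H \subseteq \mathbb{R}^r$ such that $\phi^{-1}[G] = D \cap H$.
PP (i) Suppose that $\phi$ is continuous and that $G \subseteq \mathbb{R}^s$ is open. Set
$H = \bigcup\{U : U \subseteq \mathbb{R}^r \text{ is open},\ \phi[U \cap D] \subseteq G\}$.
Then $H$ is a union of open sets, therefore open (1A2Bd), and $H \cap D \subseteq \phi^{-1}[G]$. If $x \in \phi^{-1}[G]$, then $\phi(x) \in G$, so there is an $\epsilon > 0$ such that $U(\phi(x),\epsilon) \subseteq G$; now there is a $\delta > 0$ such that $\|\phi(y) - \phi(x)\| \le \frac12\epsilon$ whenever $y \in D$, $\|y - x\| \le \delta$, so that
$\phi[U(x,\delta) \cap D] \subseteq U(\phi(x),\epsilon) \subseteq G$
and
$x \in U(x,\delta) \subseteq H$.
As $x$ is arbitrary, $\phi^{-1}[G] = H \cap D$. As $G$ is arbitrary, $\phi$ satisfies the condition.
(ii) Now suppose that $\phi$ satisfies the condition. Take $x \in D$, $\epsilon > 0$. Then $U(\phi(x),\epsilon)$ is open, so there is an open $H \subseteq \mathbb{R}^r$ such that $H \cap D = \phi^{-1}[U(\phi(x),\epsilon)]$; we see that $x \in H$, so there is a $\delta > 0$ such that $U(x,\delta) \subseteq H$; now if $y \in D$ and $\|y - x\| \le \frac12\delta$ then $y \in D \cap H$, $\phi(y) \in U(\phi(x),\epsilon)$ and $\|\phi(y) - \phi(x)\| \le \epsilon$. As $x$ and $\epsilon$ are arbitrary, $\phi$ is continuous. QQ

(b) Using the $\epsilon$-$\delta$ definition of continuity, it is easy to see that a function $\phi$ from a subset $D$ of $\mathbb{R}^r$ to $\mathbb{R}^s$ is continuous iff all its components $\phi_i$ are continuous, writing $\phi(x) = (\phi_1(x),\ldots,\phi_s(x))$ for $x \in D$. PP (i) If $\phi$ is continuous, $i \le s$, $x \in D$ and $\epsilon > 0$, then there is a $\delta > 0$ such that
$|\phi_i(y) - \phi_i(x)| \le \|\phi(y) - \phi(x)\| \le \epsilon$
whenever $y \in D$ and $\|y - x\| \le \delta$. (ii) If every $\phi_i$ is continuous, $x \in D$ and $\epsilon > 0$, then there are $\delta_i > 0$ such that $|\phi_i(y) - \phi_i(x)| \le \epsilon/\sqrt{s}$ whenever $y \in D$ and $\|y - x\| \le \delta_i$; setting $\delta = \min_{i\le s} \delta_i > 0$, we have $\|\phi(y) - \phi(x)\| \le \epsilon$ whenever $y \in D$ and $\|y - x\| \le \delta$. QQ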
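
The two inequalities used in (b) are elementary facts about the Euclidean norm, and a quick numerical sketch may help fix the role of the factor $1/\sqrt{s}$. The values of `eps` and `s` and the sample vector below are arbitrary illustrative choices.

```python
import math
from typing import List

def norm(v: List[float]) -> float:
    """Euclidean norm on R^s."""
    return math.sqrt(sum(c * c for c in v))

eps, s = 0.5, 4

# Worst case for (ii): every component difference is as large as eps / sqrt(s) allows.
worst = [eps / math.sqrt(s)] * s
worst_norm = norm(worst)          # equals eps up to rounding

# Inequality used in (i): each |v_i| is at most the norm of v.
v = [0.3, -1.2, 0.0, 2.5]
components_dominated = all(abs(c) <= norm(v) for c in v)
```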

2A2D Compactness in $\mathbb{R}^r$: Definition A subset $F$ of $\mathbb{R}^r$ is called compact if whenever $\mathcal{G}$ is a family of open sets covering $F$ then there is a finite subset $\mathcal{G}_0$ of $\mathcal{G}$ still covering $F$.

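2A2E Elementary properties of compact sets Take any $r \ge 1$.

(a) If $F \subseteq \mathbb{R}^r$ is compact and $E \subseteq \mathbb{R}^r$ is closed, then $F \cap E$ is compact. PP Let $\mathcal{G}$ be an open cover of $E \cap F$. Then $\mathcal{G} \cup \{\mathbb{R}^r \setminus E\}$ is an open cover of $F$, so has a finite subcover $\mathcal{G}_0$ say. Now $\mathcal{G}_0 \setminus \{\mathbb{R}^r \setminus E\}$ is a finite subset of $\mathcal{G}$ covering $F \cap E$. As $\mathcal{G}$ is arbitrary, $F \cap E$ is compact. QQ

(b) If $F$ is compact and $\phi : D \to \mathbb{R}^s$ is a continuous function with $F \subseteq D$, then $\phi[F]$ is compact. PP Let $\mathcal{G}$ be an open cover of $\phi[F]$. Let $\mathcal{H}$ be
$\{H : H \subseteq \mathbb{R}^r \text{ is open},\ \exists\, G \in \mathcal{G},\ \phi^{-1}[G] = D \cap H\}$.
If $x \in F$, then $\phi(x) \in \phi[F]$ so there is a $G \in \mathcal{G}$ such that $\phi(x) \in G$; now there is an $H \in \mathcal{H}$ such that $x \in \phi^{-1}[G] = D \cap H$ (2A2Ca); as $x$ is arbitrary, $F \subseteq \bigcup\mathcal{H}$. Let $\mathcal{H}_0$ be a finite subset of $\mathcal{H}$ covering $F$. For each $H \in \mathcal{H}_0$, let $G_H \in \mathcal{G}$ be such that $\phi^{-1}[G_H] = D \cap H$; then $\{G_H : H \in \mathcal{H}_0\}$ is a finite subset of $\mathcal{G}$ covering $\phi[F]$. As $\mathcal{G}$ is arbitrary, $\phi[F]$ is compact. QQ

(c) Any compact subset of $\mathbb{R}^r$ is closed. PP Let $F \subseteq \mathbb{R}^r$ be compact, and write $G = \mathbb{R}^r \setminus F$. Take any $x \in G$. Then $G_n = \mathbb{R}^r \setminus B(x, 2^{-n})$ is open for every $n \in \mathbb{N}$ (1A2G). Also
$\bigcup_{n\in\mathbb{N}} G_n = \{y : y \in \mathbb{R}^r,\ \|y - x\| > 0\} = \mathbb{R}^r \setminus \{x\} \supseteq F$.
So there is some finite set $\mathcal{G}_0 \subseteq \{G_n : n \in \mathbb{N}\}$ which covers $F$. There must be an $n$ such that $\mathcal{G}_0 \subseteq \{G_i : i \le n\}$, so that
$F \subseteq \bigcup\mathcal{G}_0 \subseteq \bigcup_{i\le n} G_i = G_n$,
and $B(x, 2^{-n}) \subseteq G$. As $x$ is arbitrary, $G$ is open and $F$ is closed. QQ

(d) If $F$ is compact and $G$ is open and $F \subseteq G$, then there is a $\delta > 0$ such that $F + B(\mathbf{0},\delta) \subseteq G$. PP If $F = \emptyset$, this is trivial, as then
$F + B(\mathbf{0},1) = \{x + y : x \in F,\ y \in B(\mathbf{0},1)\} = \emptyset$.
Otherwise, set
$\mathcal{G} = \{U(x,\delta) : x \in \mathbb{R}^r,\ \delta > 0,\ U(x,2\delta) \subseteq G\}$.
Then $\mathcal{G}$ is a family of open sets and $\bigcup\mathcal{G} = G$ (because $G$ is open), so $\mathcal{G}$ is an open cover of $F$ and has a finite subcover $\mathcal{G}_0$. Express $\mathcal{G}_0$ as $\{U(x_0,\delta_0),\ldots,U(x_n,\delta_n)\}$ where $U(x_i,2\delta_i) \subseteq G$ for each $i$. Set $\delta = \min_{i\le n} \delta_i > 0$. If $x \in F$ and $y \in B(\mathbf{0},\delta)$, then there is an $i \le n$ such that $x \in U(x_i,\delta_i)$; now
$\|(x + y) - x_i\| \le \|x - x_i\| + \|y\| < \delta_i + \delta \le 2\delta_i$,
so $x + y \in U(x_i,2\delta_i) \subseteq G$. As $x$ and $y$ are arbitrary, $F + B(\mathbf{0},\delta) \subseteq G$. QQ
Remark This result is a simple form of the Lebesgue covering lemma.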

2A2F The value of the concept of compactness is greatly increased by the fact that there is an effective characterization of the compact subsets of $\mathbb{R}^r$.
Theorem For any $r \ge 1$, a subset $F$ of $\mathbb{R}^r$ is compact iff it is closed and bounded.
proof (a) Suppose that $F$ is compact. By 2A2Ec, it is closed. To see that it is bounded, consider $\mathcal{G} = \{U(\mathbf{0}, n) : n \in \mathbb{N}\}$. $\mathcal{G}$ consists entirely of open sets, and $\bigcup \mathcal{G} = \mathbb{R}^r \supseteq F$, so there is a finite $\mathcal{G}_0 \subseteq \mathcal{G}$ covering $F$. There must be an $n$ such that $\mathcal{G}_0 \subseteq \{U(\mathbf{0}, i) : i \le n\}$, so that
$$F \subseteq \bigcup \mathcal{G}_0 \subseteq \bigcup_{i \le n} U(\mathbf{0}, i) = U(\mathbf{0}, n),$$
and $F$ is bounded.
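(b) Thus we are left with the converse; I have to show that a closed bounded set is compact. The main part of the argument is a proof by induction on $r$ that the closed interval $[-\mathbf{n}, \mathbf{n}]$ is compact for all $n \in \mathbb{N}$, writing $\mathbf{n} = (n, \ldots, n) \in \mathbb{R}^r$.
(i) If $r = 1$ and $n \in \mathbb{N}$ and $\mathcal{G}$ is a family of open sets in $\mathbb{R}$ covering $[-n, n]$, set
$$A = \{x : x \in [-n, n], \text{ there is a finite } \mathcal{G}_0 \subseteq \mathcal{G} \text{ such that } [-n, x] \subseteq \bigcup \mathcal{G}_0\}.$$
Then $-n \in A$, because if $-n \in G \in \mathcal{G}$ then $[-n, -n] \subseteq \bigcup \{G\}$, and $A$ is bounded above by $n$, so $c = \sup A$ exists and belongs to $[-n, n]$.
Next, $c \in [-n, n] \subseteq \bigcup \mathcal{G}$, so there is a $G \in \mathcal{G}$ containing $c$. Let $\delta > 0$ be such that $U(c, \delta) \subseteq G$. There is an $x \in A$ such that $x \ge c - \delta$. Let $\mathcal{G}_0$ be a finite subset of $\mathcal{G}$ covering $[-n, x]$. Then $\mathcal{G}_1 = \mathcal{G}_0 \cup \{G\}$ is a finite subset of $\mathcal{G}$ covering $[-n, c + \frac{1}{2}\delta]$. But $c + \frac{1}{2}\delta \notin A$ so $c + \frac{1}{2}\delta > n$ and $\mathcal{G}_1$ is a finite subset of $\mathcal{G}$ covering $[-n, n]$. As $\mathcal{G}$ is arbitrary, $[-n, n]$ is compact and the induction starts.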
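(ii) For the inductive step to $r + 1$, regard the closed interval $F = [-\mathbf{n}, \mathbf{n}]$, taken in $\mathbb{R}^{r+1}$, as the product of the closed interval $E = [-\mathbf{n}, \mathbf{n}]$, taken in $\mathbb{R}^r$, with the closed interval $[-n, n] \subseteq \mathbb{R}$; by the inductive hypothesis, both $E$ and $[-n, n]$ are compact. Let $\mathcal{G}$ be a family of open subsets of $\mathbb{R}^{r+1}$ covering $F$. Write $\mathcal{H}$ for the family of open subsets $H$ of $\mathbb{R}^r$ such that $H \times [-n, n]$ is covered by a finite subfamily of $\mathcal{G}$. Then $E \subseteq \bigcup \mathcal{H}$. PP Take $x \in E$. Set
$$\mathcal{U}_x = \{U : U \subseteq \mathbb{R} \text{ is open}, \ \exists\, G \in \mathcal{G}, \ \exists \text{ open } H \subseteq \mathbb{R}^r, \ x \in H \text{ and } H \times U \subseteq G\}.$$
Then $\mathcal{U}_x$ is a family of open subsets of $\mathbb{R}$. If $\xi \in [-n, n]$, there is a $G \in \mathcal{G}$ containing $(x, \xi)$; there is a $\delta > 0$ such that $U((x, \xi), \delta) \subseteq G$; now $U(x, \frac{1}{2}\delta)$ and $U(\xi, \frac{1}{2}\delta)$ are open sets in $\mathbb{R}^r$, $\mathbb{R}$ respectively and
$$U(x, \tfrac{1}{2}\delta) \times U(\xi, \tfrac{1}{2}\delta) \subseteq U((x, \xi), \delta) \subseteq G,$$
so $U(\xi, \frac{1}{2}\delta) \in \mathcal{U}_x$. As $\xi$ is arbitrary, $\mathcal{U}_x$ is an open cover of $[-n, n]$ in $\mathbb{R}$. By (i), it has a finite subcover $U_0, \ldots, U_k$ say. For each $j \le k$ we can find $H_j$, $G_j$ such that $H_j$ is an open subset of $\mathbb{R}^r$ containing $x$ and $H_j \times U_j \subseteq G_j \in \mathcal{G}$. Now set $H = \bigcap_{j \le k} H_j$. This is an open subset of $\mathbb{R}^r$ containing $x$, and $H \times [-n, n] \subseteq \bigcup_{j \le k} G_j$ is covered by a finite subfamily of $\mathcal{G}$. So $x \in H \in \mathcal{H}$. As $x$ is arbitrary, $\mathcal{H}$ covers $E$. QQ
(iii) Now the inductive hypothesis tells us that $E$ is compact, so there is a finite subfamily $\mathcal{H}_0$ of $\mathcal{H}$ covering $E$. For each $H \in \mathcal{H}_0$ let $\mathcal{G}_H$ be a finite subfamily of $\mathcal{G}$ covering $H \times [-n, n]$. Then $\bigcup_{H \in \mathcal{H}_0} \mathcal{G}_H$ is a finite subfamily of $\mathcal{G}$ covering $E \times [-n, n] = F$. As $\mathcal{G}$ is arbitrary, $F$ is compact and the induction proceeds.
(iv) Thus the interval $[-\mathbf{n}, \mathbf{n}]$ is compact in $\mathbb{R}^r$ for every $r$, $n$. Now suppose that $F$ is a closed bounded set in $\mathbb{R}^r$. Then there is an $n \in \mathbb{N}$ such that $F \subseteq [-\mathbf{n}, \mathbf{n}]$, that is, $F = F \cap [-\mathbf{n}, \mathbf{n}]$. As $F$ is closed and $[-\mathbf{n}, \mathbf{n}]$ is compact, $F$ is compact, by 2A2Ea.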
This completes the proof.
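As an illustration only, the one-dimensional step (b)(i) can be imitated by a greedy procedure: keep the point c up to which the interval is already covered, and push it to the right with an open interval containing c. The sketch below assumes the cover is handed over as a finite Python list of pairs (a, b) standing for the open intervals ]a, b[; the function name finite_subcover is hypothetical, and a program can of course only thin out a cover that is already finite, so this shows the mechanism of the argument rather than the theorem.

```python
def finite_subcover(intervals, lo, hi):
    """Choose members of `intervals` (open intervals (a, b)) covering [lo, hi].

    Returns the chosen list, or None if the family does not cover [lo, hi].
    """
    chosen = []
    c = lo  # leftmost point of [lo, hi] not yet known to be covered
    while True:
        # Open intervals containing c (strict inequalities, as the sets are open).
        candidates = [(a, b) for (a, b) in intervals if a < c < b]
        if not candidates:
            return None  # c is not covered at all
        best = max(candidates, key=lambda ab: ab[1])
        chosen.append(best)
        if best[1] > hi:
            return chosen  # everything from c up to hi lies in `best`
        # The right end-point of an open interval is not covered by it,
        # so it becomes the next point to cover.
        c = best[1]


cover = [(-1.5, -0.5), (-0.7, 0.2), (-0.6, 0.1), (0.0, 0.9), (0.8, 1.5), (0.3, 0.4)]
print(finite_subcover(cover, -1.0, 1.0))
```

Each pass strictly increases c, so the loop stops after at most as many passes as there are intervals in the list.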

2A2G Corollary If $\phi : D \to \mathbb{R}$ is continuous, where $D \subseteq \mathbb{R}^r$, and $F \subseteq D$ is a non-empty compact set, then $\phi$ is bounded and attains its bounds on $F$.
proof By 2A2Eb, $\phi[F]$ is compact; by 2A2F it is closed and bounded. To say that $\phi[F]$ is bounded is just to say that $\phi$ is bounded on $F$. Because $\phi[F]$ is a non-empty bounded set, it has an infimum $a$ and a supremum $b$; now both belong to the closure $\overline{\phi[F]}$ (by the criterion 2A2B(ii), or otherwise); because $\phi[F]$ is closed, both belong to $\phi[F]$, that is, $\phi$ attains its bounds.

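2A2H Lim sup and lim inf revisited In §1A3 I briefly discussed $\limsup_{n \to \infty} a_n$, $\liminf_{n \to \infty} a_n$ for real sequences $\langle a_n \rangle_{n \in \mathbb{N}}$. In this volume we need the notion of $\limsup_{\delta \downarrow 0} f(\delta)$, $\liminf_{\delta \downarrow 0} f(\delta)$ for real functions $f$. I say that $\limsup_{\delta \downarrow 0} f(\delta) = u \in [-\infty, \infty]$ if (i) for every $v > u$ there is an $\eta > 0$ such that $f(\delta)$ is defined and less than or equal to $v$ for every $\delta \in {]0, \eta]}$ (ii) for every $v < u$, $\eta > 0$ there is a $\delta \in {]0, \eta]}$ such that $f(\delta)$ is defined and greater than or equal to $v$. Similarly, $\liminf_{\delta \downarrow 0} f(\delta) = u \in [-\infty, \infty]$ if (i) for every $v < u$ there is an $\eta > 0$ such that $f(\delta)$ is defined and greater than or equal to $v$ for every $\delta \in {]0, \eta]}$ (ii) for every $v > u$, $\eta > 0$ there is a $\delta \in {]0, \eta]}$ such that $f(\delta)$ is defined and less than or equal to $v$.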

2A2I In the one-dimensional case, we have a particularly simple description of the open sets.
Proposition If $G \subseteq \mathbb{R}$ is any open set, it is expressible as the union of a countable disjoint family of open intervals.
proof For $x$, $y \in G$ write $x \sim y$ if either $x \le y$ and $[x, y] \subseteq G$ or $y \le x$ and $[y, x] \subseteq G$. It is easy to check that $\sim$ is an equivalence relation on $G$. Let $\mathcal{C}$ be the set of equivalence classes under $\sim$. Then $\mathcal{C}$ is a partition of $G$. Now every $C \in \mathcal{C}$ is an open interval. PP Set $a = \inf C$, $b = \sup C$ (allowing $a = -\infty$ and/or $b = \infty$ if $C$ is unbounded). If $a < x < b$, there are $y$, $z \in C$ such that $y \le x \le z$, so that $[y, x] \subseteq [y, z] \subseteq G$ and $y \sim x$ and $x \in C$; thus $]a, b[ \subseteq C$. If $x \in C$, there is an open interval $I$ containing $x$ and included in $G$; since $x \sim y$ for every $y \in I$, $I \subseteq C$; so
$$a \le \inf I < x < \sup I \le b$$
and $x \in {]a, b[}$. Thus $C = {]a, b[}$ is an open interval. QQ
To see that $\mathcal{C}$ is countable, observe that every member of $\mathcal{C}$ contains a member of $\mathbb{Q}$, so that we have a surjective function from a subset of $\mathbb{Q}$ onto $\mathcal{C}$, and $\mathcal{C}$ is countable (1A1E).
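For a concrete feel for the proposition, here is a small sketch under a strong extra assumption: the open set is given as a finite union of bounded open intervals, each encoded as a pair (a, b). The function name open_components is hypothetical. The equivalence classes of the proof are then found by sorting and merging; note that ]0, 1[ and ]1, 2[ are not merged, because the point 1 is missing from their union.

```python
def open_components(intervals):
    """Disjoint open intervals with the same union as the given open intervals."""
    components = []
    # Ignore empty intervals (a >= b), then sweep from left to right.
    for a, b in sorted(iv for iv in intervals if iv[0] < iv[1]):
        if components and a < components[-1][1]:
            # (a, b) overlaps the current component, so they lie in one class.
            last_a, last_b = components[-1]
            components[-1] = (last_a, max(last_b, b))
        else:
            # A gap (possibly a single missing point) starts a new component.
            components.append((a, b))
    return components


print(open_components([(0, 1), (1, 2), (0.5, 1.5), (3, 4)]))
```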

2A3 General topology


At various points (principally §§245-247, but also for certain ideas in Chapter 27) we need to know something about non-metrizable topologies. I must say that you should probably take the time to look at some book on elementary functional analysis which has the phrases 'weak compactness' or 'weakly compact' in the index. But I can list here the concepts actually used in this volume, in a good deal less space than any orthodox, complete treatment would employ.

2A3A Topologies First we need to know what a 'topology' is. If $X$ is any set, a topology on $X$ is a family $\mathfrak{T}$ of subsets of $X$ such that (i) $\emptyset$, $X \in \mathfrak{T}$ (ii) if $G$, $H \in \mathfrak{T}$ then $G \cap H \in \mathfrak{T}$ (iii) if $\mathcal{G} \subseteq \mathfrak{T}$ then $\bigcup \mathcal{G} \in \mathfrak{T}$ (cf. 1A2B). The pair $(X, \mathfrak{T})$ is now a topological space. In this context, members of $\mathfrak{T}$ are called open and their complements (in $X$) are called closed (cf. 1A2E-1A2F).
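On a finite set the three conditions can be checked mechanically. The following sketch is an illustration only; the function name is_topology is hypothetical, and it relies on the observation that for a finite family, closure under pairwise unions (together with the presence of the empty set) already gives closure under arbitrary unions.

```python
from itertools import combinations


def is_topology(X, T):
    """Check conditions (i)-(iii) for a family T of subsets of a finite set X."""
    X = frozenset(X)
    T = {frozenset(G) for G in T}
    # (i) the empty set and X belong to T, and every member is a subset of X
    if frozenset() not in T or X not in T:
        return False
    if any(not G <= X for G in T):
        return False
    # (ii) pairwise intersections; (iii) pairwise unions, enough for finite T
    for G, H in combinations(T, 2):
        if G & H not in T or G | H not in T:
            return False
    return True


print(is_topology({1, 2, 3}, [set(), {1}, {1, 2}, {1, 2, 3}]))  # a chain of sets
print(is_topology({1, 2, 3}, [set(), {1}, {2}, {1, 2, 3}]))     # {1} | {2} is missing
```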

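2A3B Continuous functions (a) If $(X, \mathfrak{T})$ and $(Y, \mathfrak{S})$ are topological spaces, a function $\phi : X \to Y$ is continuous if $\phi^{-1}[G] \in \mathfrak{T}$ for every $G \in \mathfrak{S}$. (By 2A2Ca above, this is consistent with the $\epsilon$-$\delta$ definition of continuity for functions from one Euclidean space to another.)
(b) If $(X, \mathfrak{T})$, $(Y, \mathfrak{S})$ and $(Z, \mathfrak{U})$ are topological spaces and $\phi : X \to Y$ and $\psi : Y \to Z$ are continuous, then $\psi\phi : X \to Z$ is continuous. PP If $G \in \mathfrak{U}$ then $\psi^{-1}[G] \in \mathfrak{S}$ so $(\psi\phi)^{-1}[G] = \phi^{-1}[\psi^{-1}[G]] \in \mathfrak{T}$. QQ
(c) If $(X, \mathfrak{T})$ is a topological space, a function $f : X \to \mathbb{R}$ is continuous iff $\{x : a < f(x) < b\}$ is open whenever $a < b$ in $\mathbb{R}$. PP (i) Every interval $]a, b[$ is open in $\mathbb{R}$, so if $f$ is continuous its inverse image $\{x : a < f(x) < b\}$ must be open. (ii) Suppose that $f^{-1}[\,]a, b[\,]$ is open whenever $a < b$, and let $H \subseteq \mathbb{R}$ be any open set. By the definition of 'open' set in $\mathbb{R}$ (1A2A),
$$H = \bigcup \{\,]y - \delta, y + \delta[ \ : y \in \mathbb{R}, \ \delta > 0, \ ]y - \delta, y + \delta[ \ \subseteq H\},$$
so
$$f^{-1}[H] = \bigcup \{f^{-1}[\,]y - \delta, y + \delta[\,] : y \in \mathbb{R}, \ \delta > 0, \ ]y - \delta, y + \delta[ \ \subseteq H\}$$
is a union of open sets in $X$, therefore open. QQ
(d) If $r \ge 1$, $(X, \mathfrak{T})$ is a topological space, and $\phi : X \to \mathbb{R}^r$ is a function, then $\phi$ is continuous iff $\phi_i : X \to \mathbb{R}$ is continuous for each $i \le r$, where $\phi(x) = (\phi_1(x), \ldots, \phi_r(x))$ for each $x \in X$. PP (i) Suppose that $\phi$ is continuous. For $i \le r$, $y = (\eta_1, \ldots, \eta_r) \in \mathbb{R}^r$, set $\pi_i(y) = \eta_i$. Then $|\pi_i(y) - \pi_i(z)| \le \|y - z\|$ for all $y$, $z \in \mathbb{R}^r$ so $\pi_i : \mathbb{R}^r \to \mathbb{R}$ is continuous. Consequently $\phi_i = \pi_i\phi$ is continuous, by 2A3Bb. (ii) Suppose that every $\phi_i$ is continuous, and that $H \subseteq \mathbb{R}^r$ is open. Set
$$\mathcal{G} = \{G : G \subseteq X \text{ is open}, \ G \subseteq \phi^{-1}[H]\}.$$
Then $G_0 = \bigcup \mathcal{G}$ is open, and $G_0 \subseteq \phi^{-1}[H]$. But suppose that $x_0$ is any point of $\phi^{-1}[H]$. Then there is a $\delta > 0$ such that $U(\phi(x_0), \delta) \subseteq H$, because $H$ is open and contains $\phi(x_0)$. For $1 \le i \le r$ set $V_i = \{x : \phi_i(x_0) - \delta/\sqrt{r} < \phi_i(x) < \phi_i(x_0) + \delta/\sqrt{r}\}$; then $V_i$ is the inverse image of an open set under the continuous map $\phi_i$, so is open. Set $G = \bigcap_{i \le r} V_i$. Then $G$ is open (using (ii) of the definition 2A3A), $x_0 \in G$, and $\|\phi(x) - \phi(x_0)\| < \delta$ for every $x \in G$, so $G \subseteq \phi^{-1}[H]$, $G \in \mathcal{G}$ and $x_0 \in G_0$. This shows that $\phi^{-1}[H] = G_0$ is open. As $H$ is arbitrary, $\phi$ is continuous. QQ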

(e) If $(X, \mathfrak{T})$ is a topological space, $f_1, \ldots, f_r$ are continuous functions from $X$ to $\mathbb{R}$, and $h : \mathbb{R}^r \to \mathbb{R}$ is continuous, then $h(f_1, \ldots, f_r) : X \to \mathbb{R}$ is continuous. PP Set $\phi(x) = (f_1(x), \ldots, f_r(x)) \in \mathbb{R}^r$ for $x \in X$. By (d), $\phi$ is continuous, so by 2A3Bb $h(f_1, \ldots, f_r) = h\phi$ is continuous. QQ In particular, $f + g$, $f \times g$ and $f - g$ are continuous for all continuous functions $f$, $g : X \to \mathbb{R}$.

(f) If $(X, \mathfrak{T})$ and $(Y, \mathfrak{S})$ are topological spaces and $\phi : X \to Y$ is a continuous function, then $\phi^{-1}[F]$ is closed in $X$ for every closed set $F \subseteq Y$. (For $X \setminus \phi^{-1}[F] = \phi^{-1}[Y \setminus F]$ is open.)

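2A3C Subspace topologies If $(X, \mathfrak{T})$ is a topological space and $D \subseteq X$, then $\mathfrak{T}_D = \{G \cap D : G \in \mathfrak{T}\}$ is a topology on $D$. PP (i) $\emptyset = \emptyset \cap D$ and $D = X \cap D$ belong to $\mathfrak{T}_D$. (ii) If $G$, $H \in \mathfrak{T}_D$ there are $G'$, $H' \in \mathfrak{T}$ such that $G = G' \cap D$, $H = H' \cap D$; now $G \cap H = G' \cap H' \cap D \in \mathfrak{T}_D$. (iii) If $\mathcal{G} \subseteq \mathfrak{T}_D$ set $\mathcal{H} = \{H : H \in \mathfrak{T}, \ H \cap D \in \mathcal{G}\}$; then $\bigcup \mathcal{G} = (\bigcup \mathcal{H}) \cap D \in \mathfrak{T}_D$. QQ
$\mathfrak{T}_D$ is called the subspace topology on $D$, or the topology on $D$ induced by $\mathfrak{T}$. If $(Y, \mathfrak{S})$ is another topological space, and $\phi : X \to Y$ is $(\mathfrak{T}, \mathfrak{S})$-continuous, then $\phi{\restriction}D : D \to Y$ is $(\mathfrak{T}_D, \mathfrak{S})$-continuous. (For if $H \in \mathfrak{S}$ then
$$(\phi{\restriction}D)^{-1}[H] = D \cap \phi^{-1}[H] \in \mathfrak{T}_D.)$$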
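For finite spaces the subspace topology is a one-line computation. The sketch below is illustrative; the name subspace_topology and the encoding of open sets as Python sets are assumptions of the example, not notation from the text.

```python
def subspace_topology(T, D):
    """The family {G & D : G in T} induced on the subset D by the topology T."""
    D = frozenset(D)
    return {frozenset(G) & D for G in T}


T = [set(), {1}, {1, 2}, {1, 2, 3}]
# Both the empty set and {1} trace the empty set on D = {2, 3}.
print(sorted(sorted(G) for G in subspace_topology(T, {2, 3})))
```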

2A3D Closures and interiors (a) In the proof of 2A3Bd I have already used the following idea. Let $(X, \mathfrak{T})$ be any topological space and $A$ any subset of $X$. Write
$$\operatorname{int} A = \bigcup \{G : G \in \mathfrak{T}, \ G \subseteq A\}.$$
Then $\operatorname{int} A$ is an open set, being a union of open sets, and is of course included in $A$; it must be the largest open set included in $A$, and is called the interior of $A$.

(b) Because a set is closed iff its complement is open, we have a complementary notion:
$$\overline{A} = \bigcap \{F : F \text{ is closed}, \ A \subseteq F\}$$
$$= X \setminus \bigcup \{X \setminus F : F \text{ is closed}, \ A \subseteq F\}$$
$$= X \setminus \bigcup \{G : G \text{ is open}, \ A \cap G = \emptyset\}$$
$$= X \setminus \bigcup \{G : G \text{ is open}, \ G \subseteq X \setminus A\} = X \setminus \operatorname{int}(X \setminus A).$$
$\overline{A}$ is closed (being the complement of an open set) and is the smallest closed set including $A$; it is called the closure of $A$. (Compare 2A2A.) Because the union of two closed sets is closed (cf. 1A2F), $\overline{A \cup B} = \overline{A} \cup \overline{B}$ for all $A$, $B \subseteq X$.

(c) There are innumerable ways of looking at these concepts; a useful description of the closure of a set is
$$x \in \overline{A} \iff x \notin \operatorname{int}(X \setminus A)$$
$$\iff \text{there is no open set containing } x \text{ and included in } X \setminus A$$
$$\iff \text{every open set containing } x \text{ meets } A.$$
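The formulas for the interior and closure, and the neighbourhood description of the closure, can be compared directly on a finite space. This is a sketch under the assumption that the topology is supplied as a list of Python sets; the function names are hypothetical.

```python
def interior(T, A):
    """Union of all open sets (members of T) included in A."""
    A = frozenset(A)
    result = frozenset()
    for G in map(frozenset, T):
        if G <= A:
            result |= G
    return result


def closure(X, T, A):
    """Closure of A, computed as X minus int(X minus A)."""
    X = frozenset(X)
    return X - interior(T, X - frozenset(A))


def every_neighbourhood_meets(T, A, x):
    """True if every open set containing x meets A."""
    A = frozenset(A)
    return all(frozenset(G) & A for G in T if x in G)


X = {1, 2, 3}
T = [set(), {1}, {1, 2}, {1, 2, 3}]
for A in ({1}, {2}, {3}):
    cl = closure(X, T, A)
    # The two descriptions of the closure agree point by point.
    assert all((x in cl) == every_neighbourhood_meets(T, A, x) for x in X)
    print(sorted(A), "has closure", sorted(cl))
```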

2A3E Hausdorff topologies (a) The concept of topological space is so widely drawn, and so widely
applicable, that a vast number of different types of topological space have been studied. For this volume we
shall not need much of the (very extensive) vocabulary which has been developed to describe this variety.
But one useful word (and one of the most important concepts) is that of 'Hausdorff space'; a topological space $X$ is Hausdorff if for all distinct $x$, $y \in X$ there are disjoint open sets $G$, $H \subseteq X$ such that $x \in G$ and $y \in H$.

(b) In a Hausdorff space $X$, finite sets are closed. PP If $z \in X$, then for any $x \in X \setminus \{z\}$ there is an open set containing $x$ but not $z$, so $X \setminus \{z\}$ is open and $\{z\}$ is closed. So a finite set is a finite union of closed sets and is therefore closed. QQ
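On a finite set the Hausdorff condition can be tested by brute force. The sketch below is illustrative only (the function name is_hausdorff is hypothetical); it contrasts a two-point space whose open sets form a chain with the discrete topology on the same set.

```python
from itertools import combinations


def is_hausdorff(X, T):
    """True if any two distinct points of X lie in disjoint members of T."""
    T = [frozenset(G) for G in T]
    for x, y in combinations(X, 2):
        separated = any(
            x in G and y in H and not (G & H) for G in T for H in T
        )
        if not separated:
            return False
    return True


chain = [set(), {1}, {1, 2}]          # every open set containing 2 also contains 1
discrete = [set(), {1}, {2}, {1, 2}]  # all subsets are open
print(is_hausdorff({1, 2}, chain), is_hausdorff({1, 2}, discrete))
```

In the first space the set {1} is not closed, in line with (b): its complement {2} is not open.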

2A3F Pseudometrics Many important topologies (not all!) can be defined by families of pseudometrics; it will be useful to have a certain amount of technical skill with these.
(a) Let $X$ be a set. A pseudometric on $X$ is a function $\rho : X \times X \to [0, \infty[$ such that
$\rho(x, z) \le \rho(x, y) + \rho(y, z)$ for all $x$, $y$, $z \in X$ (the 'triangle inequality');
$\rho(x, y) = \rho(y, x)$ for all $x$, $y \in X$;
$\rho(x, x) = 0$ for all $x \in X$.
A metric is a pseudometric $\rho$ satisfying the further condition
if $\rho(x, y) = 0$ then $x = y$.

(b) Examples (i) For $x$, $y \in \mathbb{R}$, set $\rho(x, y) = |x - y|$; then $\rho$ is a metric on $\mathbb{R}$ (the 'usual metric' on $\mathbb{R}$).
(ii) For $x$, $y \in \mathbb{R}^r$, where $r \ge 1$, set $\rho(x, y) = \|x - y\|$, defining $\|z\| = \sqrt{\sum_{i=1}^r \zeta_i^2}$ for $z = (\zeta_1, \ldots, \zeta_r)$, as usual. Then $\rho$ is a metric, the Euclidean metric on $\mathbb{R}^r$. (The triangle inequality for $\rho$ comes from Cauchy's inequality in 1A2C: if $x$, $y$, $z \in \mathbb{R}^r$, then
$$\rho(x, z) = \|x - z\| = \|(x - y) + (y - z)\| \le \|x - y\| + \|y - z\| = \rho(x, y) + \rho(y, z).$$
The other required properties of $\rho$ are elementary. Compare 2A4Bb below.)
(iii) For an example of a pseudometric which is not a metric, take $r \ge 2$ and define $\rho : \mathbb{R}^r \times \mathbb{R}^r \to [0, \infty[$ by setting $\rho(x, y) = |\xi_1 - \eta_1|$ whenever $x = (\xi_1, \ldots, \xi_r)$, $y = (\eta_1, \ldots, \eta_r) \in \mathbb{R}^r$.
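The difference between examples (ii) and (iii) is easy to see numerically. The following is a spot check on a small grid of sample points, not a proof; the function names and the tolerance parameter tol (which only guards against floating-point rounding) are assumptions of the sketch.

```python
import itertools
import math


def rho_first(x, y):
    """Example (iii): compare first coordinates only."""
    return abs(x[0] - y[0])


def rho_euclid(x, y):
    """Example (ii): the Euclidean metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def is_pseudometric_on(rho, points, tol=1e-12):
    """Check the three pseudometric axioms on a finite list of sample points."""
    if any(rho(x, x) != 0 for x in points):
        return False
    for x, y in itertools.product(points, repeat=2):
        if rho(x, y) < 0 or abs(rho(x, y) - rho(y, x)) > tol:
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if rho(x, z) > rho(x, y) + rho(y, z) + tol:
            return False
    return True


def separates_points(rho, points):
    """The extra metric condition: distinct sample points are at distance > 0."""
    return all(rho(x, y) > 0 for x, y in itertools.combinations(points, 2))


grid = list(itertools.product(range(-2, 3), repeat=2))  # 25 points of R^2
print(is_pseudometric_on(rho_first, grid), separates_points(rho_first, grid))
print(is_pseudometric_on(rho_euclid, grid), separates_points(rho_euclid, grid))
```

The first pseudometric passes the axioms but gives distance 0 between (0, 0) and (0, 1), so it is not a metric.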

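(c) Now let $X$ be a set and $\mathrm{P}$ a non-empty family of pseudometrics on $X$. Let $\mathfrak{T}$ be the family of those subsets $G$ of $X$ such that for every $x \in G$ there are $\rho_0, \ldots, \rho_n \in \mathrm{P}$ and $\delta > 0$ such that
$$U(x; \rho_0, \ldots, \rho_n; \delta) = \{y : y \in X, \ \max_{i \le n} \rho_i(y, x) < \delta\} \subseteq G.$$
Then $\mathfrak{T}$ is a topology on $X$.
PP (Compare 1A2B.) (i) $\emptyset \in \mathfrak{T}$ because the condition is vacuously satisfied. $X \in \mathfrak{T}$ because $U(x; \rho; 1) \subseteq X$ for any $x \in X$, $\rho \in \mathrm{P}$. (ii) If $G$, $H \in \mathfrak{T}$ and $x \in G \cap H$, take $\rho_0, \ldots, \rho_m$, $\rho'_0, \ldots, \rho'_n \in \mathrm{P}$, $\delta$, $\delta' > 0$ such that $U(x; \rho_0, \ldots, \rho_m; \delta) \subseteq G$, $U(x; \rho'_0, \ldots, \rho'_n; \delta') \subseteq H$; then
$$U(x; \rho_0, \ldots, \rho_m, \rho'_0, \ldots, \rho'_n; \min(\delta, \delta')) \subseteq G \cap H.$$
As $x$ is arbitrary, $G \cap H \in \mathfrak{T}$. (iii) If $\mathcal{G} \subseteq \mathfrak{T}$ and $x \in \bigcup \mathcal{G}$, there is a $G \in \mathcal{G}$ such that $x \in G$; now there are $\rho_0, \ldots, \rho_n \in \mathrm{P}$ and $\delta > 0$ such that
$$U(x; \rho_0, \ldots, \rho_n; \delta) \subseteq G \subseteq \bigcup \mathcal{G}.$$
As $x$ is arbitrary, $\bigcup \mathcal{G} \in \mathfrak{T}$. QQ
$\mathfrak{T}$ is the topology defined by $\mathrm{P}$.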

(d) You may wish to have a convention to deal with the case in which $\mathrm{P}$ is the empty set; in this case the topology on $X$ defined by $\mathrm{P}$ is $\{\emptyset, X\}$.

(e) In many important cases, $\mathrm{P}$ is upwards-directed in the sense that for any $\rho_1$, $\rho_2 \in \mathrm{P}$ there is a $\rho \in \mathrm{P}$ such that $\rho_i(x, y) \le \rho(x, y)$ for all $x$, $y \in X$ and both $i$. In this case, of course, any set $U(x; \rho_0, \ldots, \rho_n; \delta)$, where $\rho_0, \ldots, \rho_n \in \mathrm{P}$, includes some set of the form $U(x; \rho; \delta)$, where $\rho \in \mathrm{P}$. Consequently, for instance, a set $G \subseteq X$ is open iff for every $x \in G$ there are $\rho \in \mathrm{P}$, $\delta > 0$ such that $U(x; \rho; \delta) \subseteq G$.

(f) A topology $\mathfrak{T}$ is metrizable if it is the topology defined by a family $\mathrm{P}$ consisting of a single metric. Thus the Euclidean topology on $\mathbb{R}^r$ is the metrizable topology defined by $\{\rho\}$, where $\rho$ is the metric of (b-ii) above.

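2A3G Proposition Let $X$ be a set with a topology defined by a non-empty set $\mathrm{P}$ of pseudometrics on $X$. Then $U(x; \rho_0, \ldots, \rho_n; \epsilon)$ is open for all $x \in X$, $\rho_0, \ldots, \rho_n \in \mathrm{P}$ and $\epsilon > 0$.
proof (Compare 1A2D.) Take $y \in U(x; \rho_0, \ldots, \rho_n; \epsilon)$. Set
$$\eta = \max_{i \le n} \rho_i(y, x), \quad \delta = \epsilon - \eta > 0.$$
If $z \in U(y; \rho_0, \ldots, \rho_n; \delta)$ then
$$\rho_i(z, x) \le \rho_i(z, y) + \rho_i(y, x) < \delta + \eta = \epsilon$$
for each $i \le n$, so $U(y; \rho_0, \ldots, \rho_n; \delta) \subseteq U(x; \rho_0, \ldots, \rho_n; \epsilon)$. As $y$ is arbitrary, $U(x; \rho_0, \ldots, \rho_n; \epsilon)$ is open.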

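2A3H Now we have a result corresponding to 2A2Ca, describing continuous functions between topological spaces defined by families of pseudometrics.
Proposition Let $X$ and $Y$ be sets; let $\mathrm{P}$ be a non-empty family of pseudometrics on $X$, and $\Theta$ a non-empty family of pseudometrics on $Y$; let $\mathfrak{T}$ and $\mathfrak{S}$ be the corresponding topologies. Then a function $\phi : X \to Y$ is continuous iff whenever $x \in X$, $\theta \in \Theta$ and $\epsilon > 0$, there are $\rho_0, \ldots, \rho_n \in \mathrm{P}$ and $\delta > 0$ such that $\theta(\phi(y), \phi(x)) \le \epsilon$ whenever $y \in X$ and $\max_{i \le n} \rho_i(y, x) \le \delta$.
proof (a) Suppose that $\phi$ is continuous; take $x \in X$, $\theta \in \Theta$ and $\epsilon > 0$. By 2A3G, $U(\phi(x); \theta; \epsilon) \in \mathfrak{S}$. So $G = \phi^{-1}[U(\phi(x); \theta; \epsilon)] \in \mathfrak{T}$. Now $x \in G$, so there are $\rho_0, \ldots, \rho_n \in \mathrm{P}$ and $\delta > 0$ such that $U(x; \rho_0, \ldots, \rho_n; \delta) \subseteq G$. In this case $\theta(\phi(y), \phi(x)) \le \epsilon$ whenever $y \in X$ and $\max_{i \le n} \rho_i(y, x) \le \frac{1}{2}\delta$. As $x$, $\theta$ and $\epsilon$ are arbitrary, $\phi$ satisfies the condition.
(b) Suppose $\phi$ satisfies the condition. Take $H \in \mathfrak{S}$ and consider $G = \phi^{-1}[H]$. If $x \in G$, then $\phi(x) \in H$, so there are $\theta_0, \ldots, \theta_n \in \Theta$ and $\epsilon > 0$ such that $U(\phi(x); \theta_0, \ldots, \theta_n; \epsilon) \subseteq H$. For each $i \le n$ there are $\rho_{i0}, \ldots, \rho_{i,m_i} \in \mathrm{P}$ and $\delta_i > 0$ such that $\theta_i(\phi(y), \phi(x)) \le \frac{1}{2}\epsilon$ whenever $y \in X$ and $\max_{j \le m_i} \rho_{ij}(y, x) \le \delta_i$. Set $\delta = \min_{i \le n} \delta_i > 0$; then
$$U(x; \rho_{00}, \ldots, \rho_{0,m_0}, \ldots, \rho_{n0}, \ldots, \rho_{n,m_n}; \delta) \subseteq G.$$
As $x$ is arbitrary, $G \in \mathfrak{T}$. As $H$ is arbitrary, $\phi$ is continuous.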

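2A3I Remarks (a) If $\mathrm{P}$ is upwards-directed, the condition simplifies to: for every $x \in X$, $\theta \in \Theta$ and $\epsilon > 0$, there are $\rho \in \mathrm{P}$ and $\delta > 0$ such that $\theta(\phi(y), \phi(x)) \le \epsilon$ whenever $y \in X$ and $\rho(y, x) \le \delta$.

(b) Suppose we have a set $X$ and two non-empty families $\mathrm{P}$, $\Theta$ of pseudometrics on $X$, generating topologies $\mathfrak{T}$ and $\mathfrak{S}$ on $X$. Then $\mathfrak{S} \subseteq \mathfrak{T}$ iff the identity map $\phi$ from $X$ to itself is a continuous function when regarded as a map from $(X, \mathfrak{T})$ to $(X, \mathfrak{S})$, because this will mean that $G = \phi^{-1}[G]$ belongs to $\mathfrak{T}$ whenever $G \in \mathfrak{S}$. Applying the proposition above to $\phi$, we see that this happens iff for every $\theta \in \Theta$, $x \in X$ and $\epsilon > 0$ there are $\rho_0, \ldots, \rho_n \in \mathrm{P}$ and $\delta > 0$ such that $\theta(y, x) \le \epsilon$ whenever $y \in X$ and $\max_{i \le n} \rho_i(y, x) \le \delta$. Similarly, reversing the roles of $\mathrm{P}$ and $\Theta$, we get a criterion for when $\mathfrak{T} \subseteq \mathfrak{S}$, and putting the two together we obtain a criterion to determine when $\mathfrak{T} = \mathfrak{S}$.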

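2A3J Subspaces: Proposition If $X$ is a set, $\mathrm{P}$ a non-empty family of pseudometrics on $X$ defining a topology $\mathfrak{T}$ on $X$, and $D \subseteq X$, then
(a) for every $\rho \in \mathrm{P}$, the restriction $\rho^{(D)}$ of $\rho$ to $D \times D$ is a pseudometric on $D$;
(b) the topology defined by $\mathrm{P}_D = \{\rho^{(D)} : \rho \in \mathrm{P}\}$ on $D$ is precisely the subspace topology $\mathfrak{T}_D$ described in 2A3C.
proof (a) is just a matter of reading through the definition in 2A3Fa. For (b), we have to think for a moment.
(i) Suppose that $G$ belongs to the topology defined by $\mathrm{P}_D$. Set
$$\mathcal{H} = \{H : H \in \mathfrak{T}, \ H \cap D \subseteq G\},$$
$$H^* = \bigcup \mathcal{H} \in \mathfrak{T}, \quad G^* = H^* \cap D \in \mathfrak{T}_D;$$
then $G^* \subseteq G$. On the other hand, if $x \in G$, then there are $\rho_0, \ldots, \rho_n \in \mathrm{P}$ and $\delta > 0$ such that
$$U(x; \rho_0^{(D)}, \ldots, \rho_n^{(D)}; \delta) = \{y : y \in D, \ \max_{i \le n} \rho_i^{(D)}(y, x) < \delta\} \subseteq G.$$
Consider
$$H = U(x; \rho_0, \ldots, \rho_n; \delta) = \{y : y \in X, \ \max_{i \le n} \rho_i(y, x) < \delta\} \subseteq X.$$
Evidently
$$H \cap D = U(x; \rho_0^{(D)}, \ldots, \rho_n^{(D)}; \delta) \subseteq G.$$
Also $H \in \mathfrak{T}$ (2A3G). So $H \in \mathcal{H}$ and
$$x \in H \cap D \subseteq H^* \cap D = G^*.$$
Thus $G = G^* \in \mathfrak{T}_D$.
(ii) Now suppose that $G \in \mathfrak{T}_D$. Then there is an $H \in \mathfrak{T}$ such that $G = H \cap D$. Consider the identity map $\phi : D \to X$, defined by saying that $\phi(x) = x$ for every $x \in D$. $\phi$ obviously satisfies the criterion of 2A3H, if we endow $D$ with $\mathrm{P}_D$ and $X$ with $\mathrm{P}$, because $\rho(\phi(x), \phi(y)) = \rho^{(D)}(x, y)$ whenever $x$, $y \in D$ and $\rho \in \mathrm{P}$; so $\phi$ must be continuous for the associated topologies, and $\phi^{-1}[H]$ must belong to the topology defined by $\mathrm{P}_D$. But $\phi^{-1}[H] = H \cap D = G$. Thus every set in $\mathfrak{T}_D$ belongs to the topology defined by $\mathrm{P}_D$, and the two topologies are the same, as claimed.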

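2A3K Closures and interiors Let $X$ be a set, $\mathrm{P}$ a non-empty family of pseudometrics on $X$ and $\mathfrak{T}$ the topology defined by $\mathrm{P}$.
(a) For any $A \subseteq X$, $x \in X$,
$$x \in \operatorname{int} A \iff \text{there is an open set included in } A \text{ containing } x$$
$$\iff \text{there are } \rho_0, \ldots, \rho_n \in \mathrm{P}, \ \delta > 0 \text{ such that } U(x; \rho_0, \ldots, \rho_n; \delta) \subseteq A.$$

(b) For any $A \subseteq X$, $x \in X$, $x \in \overline{A}$ iff $U(x; \rho_0, \ldots, \rho_n; \delta) \cap A \ne \emptyset$ for every $\rho_0, \ldots, \rho_n \in \mathrm{P}$, $\delta > 0$. (Compare 2A2B(ii), 2A3Dc.)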

2A3L Hausdorff topologies Recall that a topology $\mathfrak{T}$ is Hausdorff if any two points can be separated by open sets (2A3E). Now a topology defined on a set $X$ by a non-empty family $\mathrm{P}$ of pseudometrics is Hausdorff iff for any two different points $x$, $y$ of $X$ there is a $\rho \in \mathrm{P}$ such that $\rho(x, y) > 0$. PP (i) Suppose that the topology is Hausdorff and that $x$, $y$ are distinct points in $X$. Then there is an open set $G$ containing $x$ but not containing $y$. Now there are $\rho_0, \ldots, \rho_n \in \mathrm{P}$ and $\delta > 0$ such that $U(x; \rho_0, \ldots, \rho_n; \delta) \subseteq G$, in which case $\rho_i(y, x) \ge \delta > 0$ for some $i \le n$. (ii) If $\mathrm{P}$ satisfies the condition, and $x$, $y$ are distinct points of $X$, take $\rho \in \mathrm{P}$ such that $\rho(x, y) > 0$, and set $\delta = \frac{1}{2}\rho(x, y)$. Then $U(x; \rho; \delta)$ and $U(y; \rho; \delta)$ are disjoint (because if $z \in X$, then
$$\rho(z, x) + \rho(z, y) \ge \rho(x, y) = 2\delta,$$
so at least one of $\rho(z, x)$, $\rho(z, y)$ is greater than or equal to $\delta$), and they are open sets containing $x$, $y$ respectively. As $x$ and $y$ are arbitrary, the topology is Hausdorff. QQ
In particular, metrizable topologies are Hausdorff.

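2A3M Convergence of sequences (a) If $(X, \mathfrak{T})$ is any topological space, and $\langle x_n \rangle_{n \in \mathbb{N}}$ is a sequence in $X$, we say that $\langle x_n \rangle_{n \in \mathbb{N}}$ converges to $x \in X$, or that $x$ is a limit of $\langle x_n \rangle_{n \in \mathbb{N}}$, or $\langle x_n \rangle_{n \in \mathbb{N}} \to x$, if for every open set $G$ containing $x$ there is an $n_0 \in \mathbb{N}$ such that $x_n \in G$ for every $n \ge n_0$.
(b) Warning In general topological spaces, it is possible for a sequence to have more than one limit, and we cannot safely write $x = \lim_{n \to \infty} x_n$. But in Hausdorff spaces, this does not occur. PP If $\mathfrak{T}$ is Hausdorff, and $x$, $y$ are distinct points of $X$, there are disjoint open sets $G$, $H$ such that $x \in G$ and $y \in H$. If now $\langle x_n \rangle_{n \in \mathbb{N}}$ converges to $x$, there is an $n_0$ such that $x_n \in G$ for every $n \ge n_0$, so $x_n \notin H$ for every $n \ge n_0$, and $\langle x_n \rangle_{n \in \mathbb{N}}$ cannot converge to $y$. QQ In particular, a sequence in a metric space can have at most one limit.
(c) Let $X$ be a set, and $\mathrm{P}$ a non-empty family of pseudometrics on $X$, generating a topology $\mathfrak{T}$; let $\langle x_n \rangle_{n \in \mathbb{N}}$ be a sequence in $X$ and $x \in X$. Then $\langle x_n \rangle_{n \in \mathbb{N}}$ converges to $x$ iff $\lim_{n \to \infty} \rho(x_n, x) = 0$ for every $\rho \in \mathrm{P}$. PP (i) Suppose that $\langle x_n \rangle_{n \in \mathbb{N}} \to x$ and that $\rho \in \mathrm{P}$. Then for any $\epsilon > 0$ the set $G = U(x; \rho; \epsilon)$ is an open set containing $x$, so there is an $n_0$ such that $x_n \in G$ for every $n \ge n_0$, that is, $\rho(x_n, x) < \epsilon$ for every $n \ge n_0$. As $\epsilon$ is arbitrary, $\lim_{n \to \infty} \rho(x_n, x) = 0$. (ii) If the condition is satisfied, take any open set $G$ containing $x$. Then there are $\rho_0, \ldots, \rho_k \in \mathrm{P}$ and $\delta > 0$ such that $U(x; \rho_0, \ldots, \rho_k; \delta) \subseteq G$. For each $i \le k$ there is an $n_i \in \mathbb{N}$ such that $\rho_i(x_n, x) < \delta$ for every $n \ge n_i$. Set $n^* = \max(n_0, \ldots, n_k)$; then $x_n \in U(x; \rho_0, \ldots, \rho_k; \delta) \subseteq G$ for every $n \ge n^*$. As $G$ is arbitrary, $\langle x_n \rangle_{n \in \mathbb{N}} \to x$. QQ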

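(d) Let $(X, \rho)$ be a metric space, $A$ a subset of $X$ and $x \in X$. Then $x \in \overline{A}$ iff there is a sequence in $A$ converging to $x$. PP (i) If $x \in \overline{A}$, then for every $n \in \mathbb{N}$ there is a point $x_n \in A \cap U(x; \rho; 2^{-n})$ (2A3Kb); now $\langle x_n \rangle_{n \in \mathbb{N}} \to x$. (ii) If $\langle x_n \rangle_{n \in \mathbb{N}}$ is a sequence in $A$ converging to $x$, then for every open set $G$ containing $x$ there is an $n$ such that $x_n \in G$, so that $A \cap G \ne \emptyset$; by 2A3Dc, $x \in \overline{A}$. QQ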

2A3N Compactness The next concept we need is the idea of 'compactness' in general topological spaces.

(a) If $(X, \mathfrak{T})$ is any topological space, a subset $K$ of $X$ is compact if whenever $\mathcal{G}$ is a family in $\mathfrak{T}$ covering $K$, then there is a finite $\mathcal{G}_0 \subseteq \mathcal{G}$ covering $K$. (Cf. 2A2D. A warning: many authors reserve the term 'compact' for Hausdorff spaces.) A set $A \subseteq X$ is relatively compact in $X$ if there is a compact subset of $X$ including $A$.

(b) Just as in 2A2E-2A2G (and the proofs are the same in the general case), we have the following results.
(i) If $K$ is compact and $E$ is closed, then $K \cap E$ is compact.
(ii) If $K \subseteq X$ is compact and $\phi : K \to Y$ is continuous, where $(Y, \mathfrak{S})$ is another topological space, then $\phi[K]$ is a compact subset of $Y$.
(iii) If $K \subseteq X$ is compact and $\phi : K \to \mathbb{R}$ is continuous, then $\phi$ is bounded and attains its bounds.

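2A3O Cluster points (a) If $(X, \mathfrak{T})$ is a topological space, and $\langle x_n \rangle_{n \in \mathbb{N}}$ is a sequence in $X$, then a cluster point of $\langle x_n \rangle_{n \in \mathbb{N}}$ is an $x \in X$ such that whenever $G$ is an open set containing $x$ and $n \in \mathbb{N}$ then there is a $k \ge n$ such that $x_k \in G$.

(b) Now if $(X, \mathfrak{T})$ is a topological space and $A \subseteq X$ is relatively compact, every sequence $\langle x_n \rangle_{n \in \mathbb{N}}$ in $A$ has a cluster point in $X$. PP Let $K$ be a compact subset of $X$ including $A$. Set
$$\mathcal{G} = \{G : G \in \mathfrak{T}, \ \{n : x_n \in G\} \text{ is finite}\}.$$
?? If $\mathcal{G}$ covers $K$, then there is a finite $\mathcal{G}_0 \subseteq \mathcal{G}$ covering $K$. Now
$$\mathbb{N} = \{n : x_n \in A\} = \{n : x_n \in \bigcup \mathcal{G}_0\} = \bigcup_{G \in \mathcal{G}_0} \{n : x_n \in G\}$$
is a finite union of finite sets, which is absurd. XX Thus $\mathcal{G}$ does not cover $K$. Take any $x \in K \setminus \bigcup \mathcal{G}$. If $G \in \mathfrak{T}$ and $x \in G$ and $n \in \mathbb{N}$, then $G \notin \mathcal{G}$ so $\{k : x_k \in G\}$ is infinite and there is a $k \ge n$ such that $x_k \in G$. Thus $x$ is a cluster point of $\langle x_n \rangle_{n \in \mathbb{N}}$, as required. QQ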

2A3P Filters In $\mathbb{R}^r$, and more generally in all metrizable spaces, topological ideas can be effectively discussed in terms of convergent sequences. (To be sure, this occasionally necessitates the use of a weak form of the axiom of choice, in order to choose a sequence; but as measure theory without such choices is eviscerated, there is no point in fussing about them here.) For topological spaces in general, however, sequences are quite inadequate, for very interesting reasons which I shall not enlarge upon. Instead we need to use 'nets' or 'filters'. The latter take a moment's more effort at the beginning, but are then (in my view) much easier to work with, so I describe this method now.

2A3Q Convergent filters (a) Let $(X, \mathfrak{T})$ be a topological space, $\mathcal{F}$ a filter on $X$ (see 2A1I), $x \in X$. We say that $\mathcal{F}$ is convergent to $x$, or that $x$ is a limit of $\mathcal{F}$, and write $\mathcal{F} \to x$, if every open set containing $x$ belongs to $\mathcal{F}$.

(b) Let $(X, \mathfrak{T})$ and $(Y, \mathfrak{S})$ be topological spaces, $\phi : X \to Y$ a continuous function, $x \in X$ and $\mathcal{F}$ a filter on $X$ converging to $x$. Then $\phi[[\mathcal{F}]]$ (as defined in 2A1Ib) converges to $\phi(x)$ (because $\phi^{-1}[G]$ is an open set containing $x$ whenever $G$ is an open set containing $\phi(x)$).

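2A3R Now we have the following characterization of compactness.

Theorem Let $X$ be a topological space, and $K$ a subset of $X$. Then $K$ is compact iff every ultrafilter on $X$ containing $K$ has a limit in $K$.
proof (a) Suppose that $K$ is compact and that $\mathcal{F}$ is an ultrafilter on $X$ containing $K$. Set
$$\mathcal{G} = \{G : G \subseteq X \text{ is open}, \ X \setminus G \in \mathcal{F}\}.$$
Then the union of any two members of $\mathcal{G}$ belongs to $\mathcal{G}$, so the union of any finite number of members of $\mathcal{G}$ belongs to $\mathcal{G}$; also no member of $\mathcal{G}$ can include $K$, because $X \setminus K \notin \mathcal{F}$. Because $K$ is compact, it follows that $\mathcal{G}$ cannot cover $K$. Let $x$ be any point of $K \setminus \bigcup \mathcal{G}$. If $G$ is any open set containing $x$, then $G \notin \mathcal{G}$ so $X \setminus G \notin \mathcal{F}$; but this means that $G$ must belong to $\mathcal{F}$, because $\mathcal{F}$ is an ultrafilter. As $G$ is arbitrary, $\mathcal{F} \to x$. Thus every ultrafilter on $X$ containing $K$ has a limit in $K$.
(b) Now suppose that every ultrafilter on $X$ containing $K$ has a limit in $K$. Let $\mathcal{G}$ be a cover of $K$ by open sets in $X$. ?? Suppose, if possible, that $\mathcal{G}$ has no finite subcover. Set
$$\mathcal{F} = \{F : F \subseteq X, \text{ there is a finite } \mathcal{G}_0 \subseteq \mathcal{G} \text{ such that } F \cup \bigcup \mathcal{G}_0 \supseteq K\}.$$
Then $\mathcal{F}$ is a filter on $X$. PP (i) $X \supseteq K$ so $X \in \mathcal{F}$.
$$\emptyset \cup \bigcup \mathcal{G}_0 = \bigcup \mathcal{G}_0 \not\supseteq K$$
for any finite $\mathcal{G}_0 \subseteq \mathcal{G}$, by hypothesis, so $\emptyset \notin \mathcal{F}$. (ii) If $E$, $F \in \mathcal{F}$ there are finite sets $\mathcal{G}_1$, $\mathcal{G}_2 \subseteq \mathcal{G}$ such that $E \cup \bigcup \mathcal{G}_1$ and $F \cup \bigcup \mathcal{G}_2$ both include $K$; now $(E \cap F) \cup \bigcup (\mathcal{G}_1 \cup \mathcal{G}_2) \supseteq K$ so $E \cap F \in \mathcal{F}$. (iii) If $X \supseteq E \supseteq F \in \mathcal{F}$ then there is a finite $\mathcal{G}_0 \subseteq \mathcal{G}$ such that $F \cup \bigcup \mathcal{G}_0 \supseteq K$; now $E \cup \bigcup \mathcal{G}_0 \supseteq K$ and $E \in \mathcal{F}$. QQ
By the Ultrafilter Theorem (2A1O), there is an ultrafilter $\mathcal{F}^*$ on $X$ including $\mathcal{F}$. Of course $K$ itself belongs to $\mathcal{F}$, so $K \in \mathcal{F}^*$. By hypothesis, $\mathcal{F}^*$ has a limit $x \in K$. But now there is a set $G \in \mathcal{G}$ containing $x$, and $(X \setminus G) \cup G \supseteq K$, so $X \setminus G \in \mathcal{F} \subseteq \mathcal{F}^*$; which means that $G$ cannot belong to $\mathcal{F}^*$, and $x$ cannot be a limit of $\mathcal{F}^*$. XX
So $\mathcal{G}$ has a finite subcover. As $\mathcal{G}$ is arbitrary, $K$ must be compact.
Remark Note that this theorem depends vitally on the Ultrafilter Theorem and therefore on the axiom of choice.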

2A3S Further calculations with filters (a) In general, it is possible for a filter to have more than one limit; but in Hausdorff spaces this does not occur. PP (Compare 2A3Mb.) If $(X, \mathfrak{T})$ is Hausdorff, and $x$, $y$ are distinct points of $X$, there are disjoint open sets $G$, $H$ such that $x \in G$ and $y \in H$. If now a filter $\mathcal{F}$ on $X$ converges to $x$, $G \in \mathcal{F}$ so $H \notin \mathcal{F}$ and $\mathcal{F}$ does not converge to $y$. QQ
Accordingly we can safely write $x = \lim \mathcal{F}$ when $\mathcal{F} \to x$ in a Hausdorff space.

(b) Now suppose that $X$ is a set, $\mathcal{F}$ is a filter on $X$, $(Y, \mathfrak{S})$ is a Hausdorff space, $D \in \mathcal{F}$ and $\phi : D \to Y$ is a function. Then we write $\lim_{x \to \mathcal{F}} \phi(x)$ for $\lim \phi[[\mathcal{F}]]$ if this is defined in $Y$; that is, $\lim_{x \to \mathcal{F}} \phi(x) = y$ iff $\phi^{-1}[H] \in \mathcal{F}$ for every open set $H$ containing $y$.
In the special case $Y = \mathbb{R}$, $\lim_{x \to \mathcal{F}} \phi(x) = a$ iff $\{x : |\phi(x) - a| \le \epsilon\} \in \mathcal{F}$ for every $\epsilon > 0$ (because every open set containing $a$ includes a set of the form $[a - \epsilon, a + \epsilon]$, which in turn includes the open set $]a - \epsilon, a + \epsilon[$).

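(c) Suppose that $X$ and $Y$ are sets, $\mathcal{F}$ is a filter on $X$, $\Theta$ is a non-empty family of pseudometrics on $Y$ defining a topology $\mathfrak{S}$ on $Y$, and $\phi : X \to Y$ is a function. Then the image filter $\phi[[\mathcal{F}]]$ converges to $y \in Y$ iff $\lim_{x \to \mathcal{F}} \theta(\phi(x), y) = 0$ in $\mathbb{R}$ for every $\theta \in \Theta$. PP (i) Suppose that $\phi[[\mathcal{F}]] \to y$. For every $\theta \in \Theta$ and $\epsilon > 0$, $U(y; \theta; \epsilon) = \{z : \theta(z, y) < \epsilon\}$ is an open set containing $y$ (2A3G), so belongs to $\phi[[\mathcal{F}]]$, and its inverse image $\{x : 0 \le \theta(\phi(x), y) < \epsilon\}$ belongs to $\mathcal{F}$. As $\epsilon$ is arbitrary, $\lim_{x \to \mathcal{F}} \theta(\phi(x), y) = 0$. As $\theta$ is arbitrary, $\phi$ satisfies the condition. (ii) Now suppose that $\lim_{x \to \mathcal{F}} \theta(\phi(x), y) = 0$ for every $\theta \in \Theta$. Let $G$ be any open set in $Y$ containing $y$. Then there are $\theta_0, \ldots, \theta_n \in \Theta$ and $\epsilon > 0$ such that
$$U(y; \theta_0, \ldots, \theta_n; \epsilon) = \bigcap_{i \le n} U(y; \theta_i; \epsilon) \subseteq G.$$
For each $i \le n$,
$$\phi^{-1}[U(y; \theta_i; \epsilon)] = \{x : \theta_i(\phi(x), y) < \epsilon\}$$
belongs to $\mathcal{F}$; because $\mathcal{F}$ is closed under finite intersections, so do $\phi^{-1}[U(y; \theta_0, \ldots, \theta_n; \epsilon)]$ and its superset $\phi^{-1}[G]$. Thus $G \in \phi[[\mathcal{F}]]$. As $G$ is arbitrary, $\phi[[\mathcal{F}]] \to y$. QQ

(d) In particular, taking $X = Y$ and $\phi$ the identity map, if $X$ has a topology $\mathfrak{T}$ defined by a non-empty family $\mathrm{P}$ of pseudometrics, then a filter $\mathcal{F}$ on $X$ converges to $x \in X$ iff $\lim_{y \to \mathcal{F}} \rho(y, x) = 0$ for every $\rho \in \mathrm{P}$.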

(e)(i) If $X$ is any set, $\mathcal{F}$ is an ultrafilter on $X$, $(Y, \mathfrak{S})$ is a topological space, and $h : X \to Y$ is a function such that $h[F]$ is relatively compact in $Y$ for some $F \in \mathcal{F}$, then $\lim_{x \to \mathcal{F}} h(x)$ is defined in $Y$. PP Let $K \subseteq Y$ be a compact set including $h[F]$. Then $K \in h[[\mathcal{F}]]$, which is an ultrafilter (2A1N), so $h[[\mathcal{F}]]$ has a limit in $Y$ (2A3R), which is $\lim_{x \to \mathcal{F}} h(x)$. QQ
(ii) If $X$ is any set, $\mathcal{F}$ is an ultrafilter on $X$, and $h : X \to \mathbb{R}$ is a function such that $h[F]$ is bounded in $\mathbb{R}$ for some set $F \in \mathcal{F}$, then $\lim_{x \to \mathcal{F}} h(x)$ exists in $\mathbb{R}$. PP $\overline{h[F]}$ is closed and bounded, therefore compact (2A2F), so $h[F]$ is relatively compact and we can use (i). QQ

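(f) The concepts of lim sup, lim inf can be applied to filters. Suppose that $\mathcal{F}$ is a filter on a set $X$, and that $f : X \to \mathbb{R}$ is any function. Then
$$\limsup_{x \to \mathcal{F}} f(x) = \inf\{u : u \in [-\infty, \infty], \ \{x : f(x) \le u\} \in \mathcal{F}\} = \inf_{F \in \mathcal{F}} \sup_{x \in F} f(x) \in [-\infty, \infty],$$
$$\liminf_{x \to \mathcal{F}} f(x) = \sup\{u : u \in [-\infty, \infty], \ \{x : f(x) \ge u\} \in \mathcal{F}\} = \sup_{F \in \mathcal{F}} \inf_{x \in F} f(x).$$
It is easy to see that, for any two functions $f$, $g : X \to \mathbb{R}$,
$$\lim_{x \to \mathcal{F}} f(x) = a \text{ iff } a = \limsup_{x \to \mathcal{F}} f(x) = \liminf_{x \to \mathcal{F}} f(x),$$
and
$$\limsup_{x \to \mathcal{F}} (f(x) + g(x)) \le \limsup_{x \to \mathcal{F}} f(x) + \limsup_{x \to \mathcal{F}} g(x),$$
$$\liminf_{x \to \mathcal{F}} (f(x) + g(x)) \ge \liminf_{x \to \mathcal{F}} f(x) + \liminf_{x \to \mathcal{F}} g(x),$$
$$\liminf_{x \to \mathcal{F}} (-f(x)) = -\limsup_{x \to \mathcal{F}} f(x), \quad \limsup_{x \to \mathcal{F}} (-f(x)) = -\liminf_{x \to \mathcal{F}} f(x),$$
$$\liminf_{x \to \mathcal{F}} c f(x) = c \liminf_{x \to \mathcal{F}} f(x), \quad \limsup_{x \to \mathcal{F}} c f(x) = c \limsup_{x \to \mathcal{F}} f(x),$$
whenever the right-hand sides are defined in $[-\infty, \infty]$ and $c \ge 0$. So if $a = \lim_{x \to \mathcal{F}} f(x)$ and $b = \lim_{x \to \mathcal{F}} g(x)$ exist in $\mathbb{R}$, $\lim_{x \to \mathcal{F}} (f(x) + g(x))$ exists and is equal to $a + b$ and $\lim_{x \to \mathcal{F}} c f(x)$ exists and is equal to $c \lim_{x \to \mathcal{F}} f(x)$ for every $c \in \mathbb{R}$.
We also see that if $f : X \to \mathbb{R}$ is such that
for every $\epsilon > 0$ there is an $F \in \mathcal{F}$ such that $\sup_{x \in F} f(x) \le \epsilon + \inf_{x \in F} f(x)$,
then $\limsup_{x \to \mathcal{F}} f(x) \le \epsilon + \liminf_{x \to \mathcal{F}} f(x)$ for every $\epsilon > 0$, so that $\lim_{x \to \mathcal{F}} f(x)$ is defined in $[-\infty, \infty]$.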
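The two formulas for the lim sup along a filter can be compared on a toy case. The sketch below makes a strong simplifying assumption: X is finite, so every filter on X is principal, that is, of the form {F : A is a subset of F} for some non-empty A. It therefore says nothing about filters such as the Frechet filter on N. All function names are hypothetical.

```python
from itertools import combinations


def principal_filter(X, A):
    """All subsets of the finite set X that include the non-empty set A."""
    X, A = frozenset(X), frozenset(A)
    rest = sorted(X - A)
    return {
        A | frozenset(extra)
        for k in range(len(rest) + 1)
        for extra in combinations(rest, k)
    }


def limsup_levels(f, X, filt):
    """inf{u : {x : f(x) <= u} belongs to the filter}, u ranging over values of f."""
    levels = [
        u
        for u in sorted({f(x) for x in X})
        if frozenset(x for x in X if f(x) <= u) in filt
    ]
    return min(levels)


def limsup_infsup(f, filt):
    """inf over F in the filter of sup of f on F."""
    return min(max(f(x) for x in F) for F in filt)


X = range(5)
filt = principal_filter(X, {1, 3})
f = lambda x: (x - 2) ** 2          # values 4, 1, 0, 1, 4 on 0, ..., 4
print(limsup_levels(f, X, filt), limsup_infsup(f, filt))
# lim inf via the identity lim inf f = -lim sup (-f)
print(-limsup_infsup(lambda x: -f(x), filt))
```

Restricting u to the values taken by f loses nothing here, because the set {x : f(x) <= u} only changes when u passes such a value.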

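(g) Note that the standard limits of real analysis can be represented in the form described here. For instance, $\lim_{n \to \infty}$, $\limsup_{n \to \infty}$, $\liminf_{n \to \infty}$ correspond to $\lim_{n \to \mathcal{F}_0}$, $\limsup_{n \to \mathcal{F}_0}$, $\liminf_{n \to \mathcal{F}_0}$ where $\mathcal{F}_0$ is the Fréchet filter on $\mathbb{N}$, the filter $\{\mathbb{N} \setminus A : A \subseteq \mathbb{N} \text{ is finite}\}$ of cofinite subsets of $\mathbb{N}$. Similarly, $\lim_{\delta \downarrow a}$, $\limsup_{\delta \downarrow a}$, $\liminf_{\delta \downarrow a}$ correspond to $\lim_{\delta \to \mathcal{F}}$, $\limsup_{\delta \to \mathcal{F}}$, $\liminf_{\delta \to \mathcal{F}}$ where
$$\mathcal{F} = \{A : A \subseteq \mathbb{R}, \ \exists\, h > 0 \text{ such that } ]a, a + h] \subseteq A\}.$$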

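2A3T Product topologies We need some brief remarks concerning topologies on product spaces.

(a) Let $(X, \mathfrak{T})$ and $(Y, \mathfrak{S})$ be topological spaces. Let $\mathfrak{U}$ be the set of subsets $U$ of $X \times Y$ such that for every $(x, y) \in U$ there are $G \in \mathfrak{T}$, $H \in \mathfrak{S}$ such that $(x, y) \in G \times H \subseteq U$. Then $\mathfrak{U}$ is a topology on $X \times Y$.
PP (i) $\emptyset \in \mathfrak{U}$ because the condition for membership of $\mathfrak{U}$ is vacuously satisfied. $X \times Y \in \mathfrak{U}$ because $X \in \mathfrak{T}$, $Y \in \mathfrak{S}$ and $(x, y) \in X \times Y \subseteq X \times Y$ for every $(x, y) \in X \times Y$. (ii) If $U$, $V \in \mathfrak{U}$ and $(x, y) \in U \cap V$, then there are $G$, $G' \in \mathfrak{T}$, $H$, $H' \in \mathfrak{S}$ such that
$$(x, y) \in G \times H \subseteq U, \quad (x, y) \in G' \times H' \subseteq V;$$
now $G \cap G' \in \mathfrak{T}$, $H \cap H' \in \mathfrak{S}$ and
$$(x, y) \in (G \cap G') \times (H \cap H') \subseteq U \cap V.$$
As $(x, y)$ is arbitrary, $U \cap V \in \mathfrak{U}$. (iii) If $\mathcal{U} \subseteq \mathfrak{U}$ and $(x, y) \in \bigcup \mathcal{U}$, then there is a $U \in \mathcal{U}$ such that $(x, y) \in U$; now there are $G \in \mathfrak{T}$, $H \in \mathfrak{S}$ such that $(x, y) \in G \times H \subseteq U \subseteq \bigcup \mathcal{U}$. As $(x, y)$ is arbitrary, $\bigcup \mathcal{U} \in \mathfrak{U}$. QQ
$\mathfrak{U}$ is called the product topology on $X \times Y$.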
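(b) Suppose, in (a), that $\mathfrak{T}$ and $\mathfrak{S}$ are defined by non-empty families $\mathrm{P}$, $\Theta$ of pseudometrics in the manner of 2A3F. Then $\mathfrak{U}$ is defined by the family $\Upsilon = \{\tilde\rho : \rho \in \mathrm{P}\} \cup \{\bar\theta : \theta \in \Theta\}$ of pseudometrics on $X \times Y$, where
$$\tilde\rho((x, y), (x', y')) = \rho(x, x'), \quad \bar\theta((x, y), (x', y')) = \theta(y, y')$$
whenever $x$, $x' \in X$, $y$, $y' \in Y$, $\rho \in \mathrm{P}$ and $\theta \in \Theta$.
PP (i) Of course you should check that every $\tilde\rho$, $\bar\theta$ is a pseudometric on $X \times Y$.
(ii) If $U \in \mathfrak{U}$ and $(x, y) \in U$, then there are $G \in \mathfrak{T}$, $H \in \mathfrak{S}$ such that $(x, y) \in G \times H \subseteq U$. There are $\rho_0, \ldots, \rho_m \in \mathrm{P}$, $\theta_0, \ldots, \theta_n \in \Theta$, $\delta$, $\delta' > 0$ such that (in the language of 2A3Fc) $U(x; \rho_0, \ldots, \rho_m; \delta) \subseteq G$, $U(y; \theta_0, \ldots, \theta_n; \delta') \subseteq H$. Now
$$U((x, y); \tilde\rho_0, \ldots, \tilde\rho_m, \bar\theta_0, \ldots, \bar\theta_n; \min(\delta, \delta')) \subseteq U.$$
As $(x, y)$ is arbitrary, $U$ is open for the topology generated by $\Upsilon$.
(iii) If $U \subseteq X \times Y$ is open for the topology defined by $\Upsilon$, take any $(x, y) \in U$. Then there are $\upsilon_0, \ldots, \upsilon_k \in \Upsilon$ and $\delta > 0$ such that $U((x, y); \upsilon_0, \ldots, \upsilon_k; \delta) \subseteq U$. Take $\rho_0, \ldots, \rho_m \in \mathrm{P}$ and $\theta_0, \ldots, \theta_n \in \Theta$ such that $\{\upsilon_0, \ldots, \upsilon_k\} \subseteq \{\tilde\rho_0, \ldots, \tilde\rho_m, \bar\theta_0, \ldots, \bar\theta_n\}$; then $G = U(x; \rho_0, \ldots, \rho_m; \delta) \in \mathfrak{T}$ (2A3G), $H = U(y; \theta_0, \ldots, \theta_n; \delta) \in \mathfrak{S}$, and
$$G \times H = U((x, y); \tilde\rho_0, \ldots, \tilde\rho_m, \bar\theta_0, \ldots, \bar\theta_n; \delta) \subseteq U((x, y); \upsilon_0, \ldots, \upsilon_k; \delta) \subseteq U.$$
As $(x, y)$ is arbitrary, $U \in \mathfrak{U}$. This completes the proof that $\mathfrak{U}$ is the topology defined by $\Upsilon$. QQ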
(c) In particular, the product topology on $\mathbb R^r\times\mathbb R^s$ is the Euclidean topology if we identify $\mathbb R^r\times\mathbb R^s$ with $\mathbb R^{r+s}$. PP The product topology is defined by the two pseudometrics $\upsilon_1$, $\upsilon_2$, where for $x$, $x'\in\mathbb R^r$ and $y$, $y'\in\mathbb R^s$ I write
$$\upsilon_1((x,y),(x',y'))=\|x-x'\|,\quad \upsilon_2((x,y),(x',y'))=\|y-y'\|$$
(2A3F(b-ii)). Similarly, the Euclidean topology on $\mathbb R^r\times\mathbb R^s\cong\mathbb R^{r+s}$ is defined by the metric $\rho$, where
$$\rho((x,y),(x',y'))=\|(x,y)-(x',y')\|=\sqrt{\|x-x'\|^2+\|y-y'\|^2}.$$
Now if $(x,y)\in\mathbb R^r\times\mathbb R^s$ and $\epsilon>0$, then
$$U((x,y);\rho;\epsilon)\subseteq U((x,y);\upsilon_j;\epsilon)$$
for both $j$, while
$$U((x,y);\upsilon_1,\upsilon_2;\tfrac{\epsilon}{\sqrt2})\subseteq U((x,y);\rho;\epsilon).$$
Thus, as remarked in 2A3Ib, each topology is included in the other, and they are the same. QQ
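The two inclusions in (c) amount to the inequalities $\max(\upsilon_1,\upsilon_2)\le\rho\le\sqrt2\max(\upsilon_1,\upsilon_2)$. The following is a minimal numerical sketch of that comparison, not part of the original argument; the function names, the representation of a point of $\mathbb R^r\times\mathbb R^s$ as a pair of lists, and the random sampling are all assumptions made for illustration.

```python
import math
import random

def norm(v):
    """Euclidean norm of a finite list of reals."""
    return math.sqrt(sum(t * t for t in v))

def upsilon1(p, q):
    """Pseudometric on R^r x R^s that looks only at the first coordinate."""
    return norm([a - b for a, b in zip(p[0], q[0])])

def upsilon2(p, q):
    """Pseudometric on R^r x R^s that looks only at the second coordinate."""
    return norm([a - b for a, b in zip(p[1], q[1])])

def rho(p, q):
    """Euclidean metric on R^(r+s), with points stored as pairs (x, y)."""
    return math.hypot(upsilon1(p, q), upsilon2(p, q))

def check_comparison(r=2, s=3, trials=1000, seed=0):
    """Test max(upsilon_1, upsilon_2) <= rho <= sqrt(2) * max(...) on random pairs."""
    rng = random.Random(seed)

    def point():
        return ([rng.uniform(-5, 5) for _ in range(r)],
                [rng.uniform(-5, 5) for _ in range(s)])

    for _ in range(trials):
        p, q = point(), point()
        m = max(upsilon1(p, q), upsilon2(p, q))
        d = rho(p, q)
        # First inequality: every rho-ball lies inside the upsilon_j-balls.
        # Second inequality: the (upsilon_1, upsilon_2)-ball of radius
        # eps / sqrt(2) lies inside the rho-ball of radius eps.
        if not (m <= d + 1e-12 and d <= math.sqrt(2) * m + 1e-12):
            return False
    return True
```

A call such as `check_comparison()` only samples finitely many pairs, so it illustrates the inequalities rather than proving them.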

2A3U Dense sets (a) If $X$ is a topological space, a set $D\subseteq X$ is dense in $X$ if $\overline D=X$, that is, if every non-empty open set meets $D$. More generally, if $D\subseteq A\subseteq X$, then $D$ is dense in $A$ if it is dense for the subspace topology of $A$ (2A3C), that is, if $A\subseteq\overline D$.
(b) If $\mathfrak T$ is defined by a non-empty family $\mathrm P$ of pseudometrics on $X$, then $D\subseteq X$ is dense iff
$$U(x;\rho_0,\ldots,\rho_n;\delta)\cap D\ne\emptyset$$
whenever $x\in X$, $\rho_0,\ldots,\rho_n\in\mathrm P$ and $\delta>0$.
(c) If $(X,\mathfrak T)$, $(Y,\mathfrak S)$ are topological spaces, of which $Y$ is Hausdorff, and $f$, $g:X\to Y$ are continuous functions which agree on some dense subset $D$ of $X$, then $f=g$. PP?? Suppose, if possible, that there is an $x\in X$ such that $f(x)\ne g(x)$. Then there are open sets $G$, $H\subseteq Y$ such that $f(x)\in G$, $g(x)\in H$ and $G\cap H=\emptyset$. Now $f^{-1}[G]\cap g^{-1}[H]$ is an open set, containing $x$ and therefore not empty, but it cannot meet $D$, so $x\notin\overline D$ and $D$ is not dense. XXQQ In particular, this is the case if $(X,\rho)$ and $(Y,\theta)$ are metric spaces.

(d) A topological space is called separable if it has a countable dense subset. For instance, $\mathbb R^r$ is separable for every $r\ge1$, since $\mathbb Q^r$ is dense.

2A4 Normed spaces


In Chapter 24 I discuss the spaces $L^p$, for $1\le p\le\infty$, and describe their most basic properties. These
spaces form a group of fundamental examples for the general theory of normed spaces, the basis of functional
analysis. This is not the book from which you should learn that theory, but once again it may save you
trouble if I briefly outline those parts of the general theory which are essential if you are to make sense of
the ideas here.

2A4A The real and complex fields While the most important parts of the theory, from the point
of view of measure theory, are most effectively dealt with in terms of real linear spaces, there are many
applications in which complex linear spaces are essential. I will therefore use the phrase
"$U$ is a linear space over ${\mathbb R\atop\mathbb C}$"
to mean that $U$ is either a linear space over the field $\mathbb R$ or a linear space over the field $\mathbb C$; it being understood that in any particular context all linear spaces considered will be over the same field. In the same way, I will write "$\alpha\in{\mathbb R\atop\mathbb C}$" to mean that $\alpha$ belongs to whichever is the current underlying field.

2A4B Definitions (a) A normed space is a linear space $U$ over ${\mathbb R\atop\mathbb C}$ together with a norm, that is, a functional $\|\ \|:U\to[0,\infty[$ such that
$\|u+v\|\le\|u\|+\|v\|$ for all $u$, $v\in U$,
$\|\alpha u\|=|\alpha|\|u\|$ for $u\in U$, $\alpha\in{\mathbb R\atop\mathbb C}$,
$\|u\|=0$ only when $u=0$, the zero vector of $U$.
(Observe that if $u=0$ (the zero vector) then $0u=u$ (where this $0$ is the zero scalar) so that $\|u\|=|0|\|u\|=0$.)

(b) If $U$ is a normed space, then we have a metric $\rho$ on $U$ defined by saying that $\rho(u,v)=\|u-v\|$ for $u$, $v\in U$. PP $\rho(u,v)\in[0,\infty[$ for all $u$, $v$ because $\|u\|\in[0,\infty[$ for every $u$. $\rho(u,v)=\rho(v,u)$ for all $u$, $v$ because $\|v-u\|=|-1|\|u-v\|=\|u-v\|$ for all $u$, $v$. If $u$, $v$, $w\in U$ then
$$\rho(u,w)=\|u-w\|=\|(u-v)+(v-w)\|\le\|u-v\|+\|v-w\|=\rho(u,v)+\rho(v,w).$$
If $\rho(u,v)=0$ then $\|u-v\|=0$ so $u-v=0$ and $u=v$. QQ
We therefore have a corresponding topology, with open and closed sets, closures, convergent sequences
and so on.

(c) If $U$ is a normed space, a set $A\subseteq U$ is bounded (for the norm) if $\{\|u\|:u\in A\}$ is bounded in $\mathbb R$; that is, there is some $M\ge0$ such that $\|u\|\le M$ for every $u\in A$.

2A4C Linear subspaces (a) If U is any normed space and V is a linear subspace of U , then V is also
a normed space, if we take the norm of V to be just the restriction to V of the norm of U ; the verification
is trivial.

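(b) If $V$ is a linear subspace of $U$, so is its closure $\overline V$. PP Take $u$, $u'\in\overline V$ and $\alpha\in{\mathbb R\atop\mathbb C}$. If $\epsilon>0$, set $\delta=\epsilon/(2+|\alpha|)>0$; then there are $v$, $v'\in V$ such that $\|u-v\|\le\delta$, $\|u'-v'\|\le\delta$. Now $v+v'$, $\alpha v\in V$ and
$$\|(u+u')-(v+v')\|\le\|u-v\|+\|u'-v'\|\le\epsilon,\quad \|\alpha u-\alpha v\|\le|\alpha|\|u-v\|\le\epsilon.$$
As $\epsilon$ is arbitrary, $u+u'$ and $\alpha u$ belong to $\overline V$; as $u$, $u'$ and $\alpha$ are arbitrary, and $0$ surely belongs to $V\subseteq\overline V$, $\overline V$ is a linear subspace of $U$. QQ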

2A4D Banach spaces (a) If $U$ is a normed space, a sequence $\langle u_n\rangle_{n\in\mathbb N}$ in $U$ is Cauchy if $\|u_m-u_n\|\to0$ as $m$, $n\to\infty$, that is, for every $\epsilon>0$ there is an $n_0\in\mathbb N$ such that $\|u_m-u_n\|\le\epsilon$ for all $m$, $n\ge n_0$.

(b) A normed space U is complete if every Cauchy sequence has a limit; a complete normed space is
called a Banach space.

2A4E It is helpful to know the following result.


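Lemma Let $U$ be a normed space such that $\langle u_n\rangle_{n\in\mathbb N}$ is convergent (that is, has a limit) in $U$ whenever $\langle u_n\rangle_{n\in\mathbb N}$ is a sequence in $U$ such that $\|u_{n+1}-u_n\|\le4^{-n}$ for every $n\in\mathbb N$. Then $U$ is complete.
proof Let $\langle u_n\rangle_{n\in\mathbb N}$ be any Cauchy sequence in $U$. For each $k\in\mathbb N$, let $n_k\in\mathbb N$ be such that $\|u_m-u_n\|\le4^{-k}$ whenever $m$, $n\ge n_k$. Set $v_k=u_{n_k}$ for each $k$. Then $\|v_{k+1}-v_k\|\le4^{-k}$ (whether $n_k\le n_{k+1}$ or $n_{k+1}\le n_k$). So $\langle v_k\rangle_{k\in\mathbb N}$ has a limit $v\in U$. I seek to show that $v$ is the required limit of $\langle u_n\rangle_{n\in\mathbb N}$. Given $\epsilon>0$, let $l\in\mathbb N$ be such that $\|v_k-v\|\le\epsilon$ for every $k\ge l$; let $k\ge l$ be such that $4^{-k}\le\epsilon$; then if $n\ge n_k$,
$$\|u_n-v\|=\|(u_n-v_k)+(v_k-v)\|\le\|u_n-v_k\|+\|v_k-v\|\le\|u_n-u_{n_k}\|+\epsilon\le2\epsilon.$$
As $\epsilon$ is arbitrary, $v$ is a limit of $\langle u_n\rangle_{n\in\mathbb N}$. As $\langle u_n\rangle_{n\in\mathbb N}$ is arbitrary, $U$ is complete.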

2A4F Bounded linear operators (a) Let $U$, $V$ be two normed spaces. A linear operator $T:U\to V$ is bounded if $\{\|Tu\|:u\in U,\ \|u\|\le1\}$ is bounded. (Warning! in this context, we do not ask for the whole set of values $T[U]$ to be bounded; a bounded linear operator need not be what we ordinarily call a bounded function.) Write $B(U;V)$ for the space of all bounded linear operators from $U$ to $V$, and for $T\in B(U;V)$ write $\|T\|=\sup\{\|Tu\|:u\in U,\ \|u\|\le1\}$.

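(b) A useful fact: $\|Tu\|\le\|T\|\|u\|$ for every $T\in B(U;V)$, $u\in U$. PP If $|\alpha|>\|u\|$ then
$$\|\tfrac1\alpha u\|=\tfrac1{|\alpha|}\|u\|\le1,$$
so
$$\|Tu\|=\|\alpha T(\tfrac1\alpha u)\|=|\alpha|\|T(\tfrac1\alpha u)\|\le|\alpha|\|T\|;$$
as $\alpha$ is arbitrary, $\|Tu\|\le\|T\|\|u\|$. QQ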

(c) A linear operator $T:U\to V$ is bounded iff it is continuous for the norm topologies on $U$ and $V$. PP (i) If $T$ is bounded, $u_0\in U$ and $\epsilon>0$, then
$$\|Tu-Tu_0\|=\|T(u-u_0)\|\le\|T\|\|u-u_0\|\le\epsilon$$
whenever $\|u-u_0\|\le\frac{\epsilon}{1+\|T\|}$; by 2A3H, $T$ is continuous. (ii) If $T$ is continuous, then there is some $\delta>0$ such that $\|Tu\|=\|Tu-T0\|\le1$ whenever $\|u\|=\|u-0\|\le\delta$. If now $\|u\|\le1$,
$$\|Tu\|=\tfrac1\delta\|T(\delta u)\|\le\tfrac1\delta,$$
so $T$ is a bounded operator. QQ
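To make the warning in (a) and the inequality in (b) concrete, here is a small sketch for a matrix acting on $\mathbb R^n$ with the $\ell^1$ norm $\|u\|_1=\sum_i|u_i|$. This choice of norm is an assumption made for illustration and does not appear in the text; it is used because the operator norm then has the standard closed form "largest absolute column sum", so that no optimization is needed. The function names are hypothetical.

```python
import random

def norm1(v):
    """The l^1 norm on R^n: sum of absolute values of the coordinates."""
    return sum(abs(t) for t in v)

def apply(T, u):
    """Apply the matrix T (a list of rows) to the vector u."""
    return [sum(row[j] * u[j] for j in range(len(u))) for row in T]

def op_norm1(T):
    """sup{ ||Tu||_1 : ||u||_1 <= 1 }; for the l^1 norm this equals the
    largest absolute column sum (a standard fact, assumed here)."""
    return max(sum(abs(row[j]) for row in T) for j in range(len(T[0])))

def check_bound(T, trials=1000, seed=0):
    """Test ||Tu|| <= ||T|| ||u|| (2A4Fb) on randomly chosen vectors u."""
    rng = random.Random(seed)
    k = op_norm1(T)
    for _ in range(trials):
        u = [rng.uniform(-10, 10) for _ in range(len(T[0]))]
        if norm1(apply(T, u)) > k * norm1(u) + 1e-9:
            return False
    return True
```

For `T = [[1, -2], [3, 4]]` the bound is attained at the unit vector `[0, 1]`, while `norm1(apply(T, [0, 100]))` is as large as one likes by scaling: the operator is bounded in the sense of (a) without being a bounded function.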

2A4G Theorem $B(U;V)$ is a linear space over ${\mathbb R\atop\mathbb C}$, and $\|\ \|$ is a norm on $B(U;V)$.
proof I am rather supposing that you are aware, but in any case you will find it easy to check, that if $S:U\to V$ and $T:U\to V$ are linear operators, and $\alpha\in{\mathbb R\atop\mathbb C}$, then we have linear operators $S+T$ and $\alpha T$ from $U$ to $V$ defined by the formulae
$$(S+T)(u)=Su+Tu,\quad(\alpha T)(u)=\alpha(Tu)$$
for every $u\in U$; moreover, that under these definitions of addition and scalar multiplication the space of all linear operators from $U$ to $V$ is a linear space. Now we see that whenever $S$, $T\in B(U;V)$, $\alpha\in{\mathbb R\atop\mathbb C}$, $u\in U$ and $\|u\|\le1$,
$$\|(S+T)(u)\|=\|Su+Tu\|\le\|Su\|+\|Tu\|\le\|S\|+\|T\|,$$
$$\|(\alpha T)u\|=\|\alpha(Tu)\|=|\alpha|\|Tu\|\le|\alpha|\|T\|;$$
so that $S+T$ and $\alpha T$ belong to $B(U;V)$, with $\|S+T\|\le\|S\|+\|T\|$ and $\|\alpha T\|\le|\alpha|\|T\|$. This shows that $B(U;V)$ is a linear subspace of the space of all linear operators and is therefore a linear space over ${\mathbb R\atop\mathbb C}$ in its own right. To check that the given formula for $\|T\|$ defines a norm, most of the work has just been done; I suppose I should remark, for the sake of form, that $\|T\|\in[0,\infty[$ for every $T$; if $\alpha=0$, then of course $\|\alpha T\|=0=|\alpha|\|T\|$; for other $\alpha$,
$$|\alpha|\|T\|=|\alpha|\|\alpha^{-1}\alpha T\|\le|\alpha||\alpha^{-1}|\|\alpha T\|=\|\alpha T\|,$$
so $\|\alpha T\|=|\alpha|\|T\|$. Finally, if $\|T\|=0$ then $\|Tu\|\le\|T\|\|u\|=0$ for every $u\in U$, so $Tu=0$ for every $u$ and $T$ is the zero operator (in the space of all linear operators, and therefore in its subspace $B(U;V)$).

2A4H Dual spaces The most important case of $B(U;V)$ is when $V$ is the scalar field ${\mathbb R\atop\mathbb C}$ itself (of course we can think of ${\mathbb R\atop\mathbb C}$ as a normed space over itself, writing $\|\alpha\|=|\alpha|$ for each scalar $\alpha$). In this case we call $B(U;{\mathbb R\atop\mathbb C})$ the dual of $U$; it is commonly denoted $U'$ or $U^*$; I use the latter.

2A4I Extensions of bounded operators: Theorem Let $U$ be a normed space and $V\subseteq U$ a dense linear subspace. Let $W$ be a Banach space and $T_0:V\to W$ a bounded linear operator; then there is a unique bounded linear operator $T:U\to W$ extending $T_0$, and $\|T\|=\|T_0\|$.
proof (a) For any $u\in U$, there is a sequence $\langle v_n\rangle_{n\in\mathbb N}$ in $V$ converging to $u$. Now
$$\|T_0v_m-T_0v_n\|=\|T_0(v_m-v_n)\|\le\|T_0\|\|v_m-v_n\|\le\|T_0\|(\|v_m-u\|+\|u-v_n\|)\to0$$
as $m$, $n\to\infty$, so $\langle T_0v_n\rangle_{n\in\mathbb N}$ is Cauchy and $w=\lim_{n\to\infty}T_0v_n$ is defined in $W$. If $\langle v'_n\rangle_{n\in\mathbb N}$ is another sequence in $V$ converging to $u$, then
$$\|w-T_0v'_n\|\le\|w-T_0v_n\|+\|T_0(v_n-v'_n)\|\le\|w-T_0v_n\|+\|T_0\|(\|v_n-u\|+\|u-v'_n\|)\to0$$
as $n\to\infty$, so $w$ is also the limit of $\langle T_0v'_n\rangle_{n\in\mathbb N}$.
(b) We may therefore define $T:U\to W$ by setting $Tu=\lim_{n\to\infty}T_0v_n$ whenever $\langle v_n\rangle_{n\in\mathbb N}$ is a sequence in $V$ converging to $u$. If $v\in V$, then we can set $v_n=v$ for every $n$ to see that $Tv=T_0v$; thus $T$ extends $T_0$. If $u$, $u'\in U$ and $\alpha\in{\mathbb R\atop\mathbb C}$, take sequences $\langle v_n\rangle_{n\in\mathbb N}$, $\langle v'_n\rangle_{n\in\mathbb N}$ in $V$ converging to $u$, $u'$ respectively; in this case
$$\|(u+u')-(v_n+v'_n)\|\le\|u-v_n\|+\|u'-v'_n\|\to0,\quad\|\alpha u-\alpha v_n\|=|\alpha|\|u-v_n\|\to0$$
as $n\to\infty$, so that $T(u+u')=\lim_{n\to\infty}T_0(v_n+v'_n)$, $T(\alpha u)=\lim_{n\to\infty}T_0(\alpha v_n)$, and
$$\|T(u+u')-Tu-Tu'\|\le\|T(u+u')-T_0(v_n+v'_n)\|+\|T_0v_n-Tu\|+\|T_0v'_n-Tu'\|\to0,$$
$$\|T(\alpha u)-\alpha Tu\|\le\|T(\alpha u)-T_0(\alpha v_n)\|+|\alpha|\|T_0v_n-Tu\|\to0$$
as $n\to\infty$. This means that $\|T(u+u')-Tu-Tu'\|=0$, $\|T(\alpha u)-\alpha Tu\|=0$ so $T(u+u')=Tu+Tu'$, $T(\alpha u)=\alpha Tu$; as $u$, $u'$ and $\alpha$ are arbitrary, $T$ is linear.
(c) For any $u\in U$, let $\langle v_n\rangle_{n\in\mathbb N}$ be a sequence in $V$ converging to $u$. Then
$$\|Tu\|\le\|T_0v_n\|+\|Tu-T_0v_n\|\le\|T_0\|\|v_n\|+\|Tu-T_0v_n\|\le\|T_0\|(\|u\|+\|v_n-u\|)+\|Tu-T_0v_n\|\to\|T_0\|\|u\|$$
as $n\to\infty$, so $\|Tu\|\le\|T_0\|\|u\|$. As $u$ is arbitrary, $T$ is bounded and $\|T\|\le\|T_0\|$. Of course $\|T\|\ge\|T_0\|$ just because $T$ extends $T_0$.
(d) Finally, let $\tilde T$ be any other bounded linear operator from $U$ to $W$ extending $T_0$. If $u\in U$, there is a sequence $\langle v_n\rangle_{n\in\mathbb N}$ in $V$ converging to $u$; now $\tilde Tv_n=T_0v_n=Tv_n$ for every $n$, so
$$\|\tilde Tu-Tu\|\le\|\tilde T(u-v_n)\|+\|T(v_n-u)\|\le(\|\tilde T\|+\|T\|)\|u-v_n\|\to0$$
as $n\to\infty$, so $\|\tilde Tu-Tu\|=0$ and $\tilde Tu=Tu$. As $u$ is arbitrary, $\tilde T=T$. Thus $T$ is unique.

2A4J Normed algebras (a) A normed algebra is a normed space $(U,\|\ \|)$ together with a multiplication, a binary operator $\times$ on $U$, such that
$$u\times(v\times w)=(u\times v)\times w,$$
$$u\times(v+w)=(u\times v)+(u\times w),\quad(u+v)\times w=(u\times w)+(v\times w),$$
$$(\alpha u)\times v=u\times(\alpha v)=\alpha(u\times v),$$
$$\|u\times v\|\le\|u\|\|v\|$$
for all $u$, $v$, $w\in U$ and $\alpha\in{\mathbb R\atop\mathbb C}$.

(b) A Banach algebra is a normed algebra which is a Banach space. A normed algebra $U$ is commutative if its multiplication is commutative, that is, $u\times v=v\times u$ for all $u$, $v\in U$.

2A5 Linear topological spaces


The principal objective of §2A3 is in fact the study of certain topologies on the linear spaces of Chapter 24. I give some fragments of the general theory.

2A5A Linear space topologies Something which is not covered in detail by every introduction to functional analysis is the general concept of linear topological space. The ideas needed for the work of §245 are reasonably briefly expressed.
Definition A linear topological space or topological vector space over ${\mathbb R\atop\mathbb C}$ is a linear space $U$ over ${\mathbb R\atop\mathbb C}$ together with a topology $\mathfrak T$ such that the maps
$$(u,v)\mapsto u+v:U\times U\to U,$$
$$(\alpha,u)\mapsto\alpha u:{\mathbb R\atop\mathbb C}\times U\to U$$
are both continuous, where the product spaces $U\times U$ and ${\mathbb R\atop\mathbb C}\times U$ are given their product topologies (2A3T). Given a linear space $U$, a topology on $U$ satisfying the conditions above is a linear space topology. Note that
$$(u,v)\mapsto u-v=u+(-1)v:U\times U\to U$$
will also be continuous.

2A5B All the linear topological spaces we need turn out to be readily presentable in the following
terms.
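Proposition Suppose that $U$ is a linear space over ${\mathbb R\atop\mathbb C}$, and $\mathrm T$ is a family of functionals $\tau:U\to[0,\infty[$ such that
(i) $\tau(u+v)\le\tau(u)+\tau(v)$ for all $u$, $v\in U$, $\tau\in\mathrm T$;
(ii) $\tau(\alpha u)\le\tau(u)$ if $u\in U$, $|\alpha|\le1$, $\tau\in\mathrm T$;
(iii) $\lim_{\alpha\to0}\tau(\alpha u)=0$ for every $u\in U$, $\tau\in\mathrm T$.
For $\tau\in\mathrm T$, define $\rho_\tau:U\times U\to[0,\infty[$ by setting $\rho_\tau(u,v)=\tau(u-v)$ for all $u$, $v\in U$. Then each $\rho_\tau$ is a pseudometric on $U$, and the topology defined by $\mathrm P=\{\rho_\tau:\tau\in\mathrm T\}$ renders $U$ a linear topological space.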
proof (a) It is worth noting immediately that
$$\tau(0)=\lim_{\alpha\to0}\tau(\alpha0)=0$$
for every $\tau\in\mathrm T$.
(b) To see that every $\rho_\tau$ is a pseudometric, argue as follows.
(i) $\rho_\tau$ takes values in $[0,\infty[$ because $\tau$ does.
(ii) If $u$, $v$, $w\in U$ then
$$\rho_\tau(u,w)=\tau(u-w)=\tau((u-v)+(v-w))\le\tau(u-v)+\tau(v-w)=\rho_\tau(u,v)+\rho_\tau(v,w).$$
(iii) If $u$, $v\in U$, then
$$\rho_\tau(v,u)=\tau(v-u)=\tau(-1(u-v))\le\tau(u-v)=\rho_\tau(u,v),$$
and similarly $\rho_\tau(u,v)\le\rho_\tau(v,u)$, so the two are equal.
(iv) If $u\in U$ then $\rho_\tau(u,u)=\tau(0)=0$.
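(c) Let $\mathfrak T$ be the topology on $U$ defined by $\{\rho_\tau:\tau\in\mathrm T\}$ (2A3F).
(i) Addition is continuous because, given $\tau\in\mathrm T$, we have
$$\rho_\tau(u'+v',u+v)=\tau((u'+v')-(u+v))\le\tau(u'-u)+\tau(v'-v)=\rho_\tau(u',u)+\rho_\tau(v',v)$$
for all $u$, $v$, $u'$, $v'\in U$. This means that, given $\epsilon>0$ and $(u,v)\in U\times U$, we shall have
$$\rho_\tau(u'+v',u+v)\le\epsilon\text{ whenever }(u',v')\in U((u,v);\tilde\rho_\tau,\bar\rho_\tau;\tfrac\epsilon2),$$
using the language of 2A3Tb. Because $\tilde\rho_\tau$, $\bar\rho_\tau$ are two of the pseudometrics defining the product topology of $U\times U$ (2A3Tb), $(u,v)\mapsto u+v$ is continuous, by the criterion of 2A3H.
(ii) Scalar multiplication is continuous because if $u\in U$, $n\in\mathbb N$ then $\tau(nu)\le n\tau(u)$ for every $\tau\in\mathrm T$ (induce on $n$). Consequently, if $\tau\in\mathrm T$,
$$\tau(\alpha u)\le n\tau(\tfrac\alpha nu)\le n\tau(u)$$
whenever $|\alpha|<n\in\mathbb N$, $\tau\in\mathrm T$. Now, given $(\alpha,u)\in{\mathbb R\atop\mathbb C}\times U$ and $\epsilon>0$, take $n>|\alpha|$ and $\delta>0$ such that $\delta\le\min(n-|\alpha|,\frac\epsilon{2n})$ and $\tau(\gamma u)\le\frac\epsilon2$ whenever $|\gamma|\le\delta$; then
$$\rho_\tau(\alpha'u',\alpha u)=\tau(\alpha'u'-\alpha u)\le\tau((\alpha'-\alpha)u)+\tau(\alpha'(u'-u))\le\tau((\alpha'-\alpha)u)+n\tau(u'-u)$$
whenever $u'\in U$ and $\alpha'\in{\mathbb R\atop\mathbb C}$ and $|\alpha'|<n\in\mathbb N$. Accordingly, setting $\theta(\alpha',\alpha)=|\alpha'-\alpha|$ for $\alpha'$, $\alpha\in{\mathbb R\atop\mathbb C}$,
$$\rho_\tau(\alpha'u',\alpha u)\le\tfrac\epsilon2+n\delta\le\epsilon$$
whenever
$$(\alpha',u')\in U((\alpha,u);\tilde\theta,\bar\rho_\tau;\delta).$$
Because $\tilde\theta$ and $\bar\rho_\tau$ are among the pseudometrics defining the topology of ${\mathbb R\atop\mathbb C}\times U$, the map $(\alpha,u)\mapsto\alpha u$ satisfies the criterion of 2A3H and is continuous.
Thus $\mathfrak T$ is a linear space topology on $U$.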
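The conditions (i)-(iii) of 2A5B are weaker than those for a seminorm. As an illustration only, consider on $U=\mathbb R$ the functional $\tau(u)=\min(1,|u|)$; this example and the function names below are assumptions introduced here, not taken from the text. The sketch spot-checks (i) and (ii) on random inputs, checks the bound $\tau(\alpha u)\le|\alpha||u|$ that forces (iii), and shows that $\tau$ is not homogeneous.

```python
import random

def tau(u):
    """A functional on R satisfying (i)-(iii) of 2A5B without being a seminorm."""
    return min(1.0, abs(u))

def rho_tau(u, v):
    """The pseudometric associated with tau as in 2A5B: rho_tau(u, v) = tau(u - v)."""
    return tau(u - v)

def check_conditions(trials=2000, seed=0):
    """Spot-check conditions (i)-(iii) of 2A5B for tau on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        u, v = rng.uniform(-3, 3), rng.uniform(-3, 3)
        a = rng.uniform(-1, 1)                      # a scalar with |a| <= 1
        if tau(u + v) > tau(u) + tau(v) + 1e-12:    # (i) subadditivity
            return False
        if tau(a * u) > tau(u) + 1e-12:             # (ii) monotone under shrinking
            return False
    # (iii): tau(alpha * u) <= |alpha| * |u|, which tends to 0 as alpha -> 0.
    return all(tau(2.0 ** -k * 3.0) <= 3.0 * 2.0 ** -k for k in range(60))
```

Here `tau(2.0)` is `1.0` while `2 * tau(1.0)` is `2.0`, so $\tau$ fails the homogeneity required of a seminorm in 2A5D, yet `rho_tau` is still a (bounded) metric defining the usual topology of $\mathbb R$.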

*2A5C We do not need it for Chapter 24, but the following is worth knowing.
Theorem Let $U$ be a linear space and $\mathfrak T$ a linear space topology on $U$.
(a) There is a family $\mathrm T$ of functionals satisfying the conditions (i)-(iii) of 2A5B and defining $\mathfrak T$.
(b) If $\mathfrak T$ is metrizable, we can take $\mathrm T$ to consist of a single functional.
proof (a) Kelley & Namioka 76, p. 50.
(b) Köthe 69, §15.11.

2A5D Definition Let $U$ be a linear space over ${\mathbb R\atop\mathbb C}$. Then a seminorm on $U$ is a functional $\tau:U\to[0,\infty[$ such that
(i) $\tau(u+v)\le\tau(u)+\tau(v)$ for all $u$, $v\in U$;
(ii) $\tau(\alpha u)=|\alpha|\tau(u)$ if $u\in U$, $\alpha\in{\mathbb R\atop\mathbb C}$.
Observe that a norm is always a seminorm, and that a seminorm is always a functional of the type described in 2A5B. In particular, the association of a metric with a norm (2A4Bb) is a special case of 2A5B.

2A5E Convex sets (a) Let $U$ be a linear space over ${\mathbb R\atop\mathbb C}$. A subset $C$ of $U$ is convex if $\alpha u+(1-\alpha)v\in C$ whenever $u$, $v\in C$ and $\alpha\in[0,1]$. The intersection of any family of convex sets is convex, so for every set $A\subseteq U$ there is a smallest convex set including $A$; this is just the set of vectors expressible as $\sum_{i=0}^n\alpha_iu_i$ where $u_0,\ldots,u_n\in A$, $\alpha_0,\ldots,\alpha_n\in[0,1]$ and $\sum_{i=0}^n\alpha_i=1$ (Bourbaki 87, II.2.3); it is the convex hull of $A$.

(b) If U is a linear topological space, the closure of any convex set is convex (Bourbaki 87, II.2.6). It
follows that, for any A U , the closure of the convex hull of A is the smallest closed convex set including
A; this is the closed convex hull of A.

(c) I note for future reference that in a linear topological space, the closure of any linear subspace is a
linear subspace. (Bourbaki 87, I.1.3; Köthe 69, §15.2. Compare 2A4Cb.)

2A5F Completeness in linear topological spaces In normed spaces, completeness can be described in terms of Cauchy sequences (2A4D). In general linear topological spaces this is inadequate. The true theory of completeness demands the concept of uniform space (see §3A4 in the next volume, or Kelley 55, chap. 6; Engelking 89, §8.1; Bourbaki 66, chap. II); I shall not describe this here, but will give a version adapted to linear spaces. I mention this only because you will, I hope, some day come to the general theory (in Volume 3 of this treatise, if not before), and you should be aware that the special case described here gives a misleading emphasis at some points.
Definitions Let $U$ be a linear space over ${\mathbb R\atop\mathbb C}$, and $\mathfrak T$ a linear space topology on $U$. A filter $\mathcal F$ on $U$ is Cauchy if for every open set $G$ in $U$ containing $0$ there is an $F\in\mathcal F$ such that $F-F=\{u-v:u,v\in F\}\subseteq G$. $U$ is complete if every Cauchy filter on $U$ is convergent.

2A5G Cauchy filters have a simple description when a linear space topology is defined by the method
of 2A5B.
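Lemma Let $U$ be a linear space over ${\mathbb R\atop\mathbb C}$, and let $\mathrm T$ be a family of functionals defining a linear space topology on $U$, as in 2A5B. Then a filter $\mathcal F$ on $U$ is Cauchy iff for every $\tau\in\mathrm T$, $\epsilon>0$ there is an $F\in\mathcal F$ such that $\tau(u-v)\le\epsilon$ for all $u$, $v\in F$.
proof (a) Suppose that $\mathcal F$ is Cauchy, $\tau\in\mathrm T$ and $\epsilon>0$. Then $G=U(0;\rho_\tau;\epsilon)$ is open (using the language of 2A3F-2A3G), so there is an $F\in\mathcal F$ such that $F-F\subseteq G$; but this just means that $\tau(u-v)<\epsilon$ for all $u$, $v\in F$.
(b) Suppose that $\mathcal F$ satisfies the criterion, and that $G$ is an open set containing $0$. Then there are $\tau_0,\ldots,\tau_n\in\mathrm T$ and $\epsilon>0$ such that $U(0;\rho_{\tau_0},\ldots,\rho_{\tau_n};\epsilon)\subseteq G$. For each $i\le n$ there is an $F_i\in\mathcal F$ such that $\tau_i(u-v)<\frac\epsilon2$ for all $u$, $v\in F_i$; now $F=\bigcap_{i\le n}F_i\in\mathcal F$ and $u-v\in G$ for all $u$, $v\in F$.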

2A5H Normed spaces and sequential completeness I had better point out that for normed spaces
the definition of 2A5F agrees with that of 2A4D.
Proposition Let $(U,\|\ \|)$ be a normed space over ${\mathbb R\atop\mathbb C}$, and let $\mathfrak T$ be the linear space topology on $U$ defined by the method of 2A5B from the set $\mathrm T=\{\|\ \|\}$. Then $U$ is complete in the sense of 2A5F iff it is complete in the sense of 2A4D.
proof (a) Suppose first that $U$ is complete in the sense of 2A5F. Let $\langle u_n\rangle_{n\in\mathbb N}$ be a sequence in $U$ which is Cauchy in the sense of 2A4Da. Set
$$\mathcal F=\{F:F\subseteq U,\ \{n:u_n\notin F\}\text{ is finite}\}.$$
Then it is easy to check that $\mathcal F$ is a filter on $U$, the image of the Fréchet filter under the map $n\mapsto u_n:\mathbb N\to U$. If $\epsilon>0$, take $m\in\mathbb N$ such that $\|u_j-u_k\|\le\epsilon$ whenever $j$, $k\ge m$; then $F=\{u_j:j\ge m\}$ belongs to $\mathcal F$, and $\|u-v\|\le\epsilon$ for all $u$, $v\in F$. So $\mathcal F$ is Cauchy in the sense of 2A5F, and has a limit $u$ say. Now, for any $\epsilon>0$, the set $\{v:\|v-u\|<\epsilon\}=U(u;\rho_{\|\ \|};\epsilon)$ is an open set containing $u$, so belongs to $\mathcal F$, and $\{n:\|u_n-u\|\ge\epsilon\}$ is finite, that is, there is an $m\in\mathbb N$ such that $\|u_n-u\|<\epsilon$ whenever $n\ge m$. As $\epsilon$ is arbitrary, $u=\lim_{n\to\infty}u_n$ in the sense of 2A3M. As $\langle u_n\rangle_{n\in\mathbb N}$ is arbitrary, $U$ is complete in the sense of 2A4D.
(b) Now suppose that $U$ is complete in the sense of 2A4D. Let $\mathcal F$ be a Cauchy filter on $U$. For each $n\in\mathbb N$, choose a set $F_n\in\mathcal F$ such that $\|u-v\|\le2^{-n}$ for all $u$, $v\in F_n$. For each $n\in\mathbb N$, $F'_n=\bigcap_{i\le n}F_i$ belongs to $\mathcal F$, so is not empty; choose $u_n\in F'_n$. If $m\in\mathbb N$ and $j$, $k\ge m$, then both $u_j$ and $u_k$ belong to $F_m$, so $\|u_j-u_k\|\le2^{-m}$; thus $\langle u_n\rangle_{n\in\mathbb N}$ is a Cauchy sequence in the sense of 2A4Da, and has a limit $u$ say. Now take any $\epsilon>0$ and $m\in\mathbb N$ such that $2^{-m+1}\le\epsilon$. There is surely a $k\ge m$ such that $\|u_k-u\|\le2^{-m}$; now $u_k\in F_m$, so
$$F_m\subseteq\{v:\|v-u_k\|\le2^{-m}\}\subseteq\{v:\|v-u\|\le2^{-m+1}\}\subseteq\{v:\rho_{\|\ \|}(v,u)\le\epsilon\},$$
and $\{v:\rho_{\|\ \|}(v,u)\le\epsilon\}\in\mathcal F$. As $\epsilon$ is arbitrary, $\mathcal F$ converges to $u$, by 2A3Sd. As $\mathcal F$ is arbitrary, $U$ is complete.
Thus the two definitions coincide, provided at least that we allow the countably many simultaneous
choices of the un in part (b) of the proof.

2A5I Weak topologies I come now to brief notes on weak topologies on normed spaces; from the point of view of this volume, these are in fact the primary examples of linear space topologies. Let $U$ be a normed linear space over ${\mathbb R\atop\mathbb C}$.

(a) Write $U^*$ for its dual $B(U;{\mathbb R\atop\mathbb C})$ (2A4H). Let $\mathrm T$ be the set $\{|h|:h\in U^*\}$; then $\mathrm T$ satisfies the conditions of 2A5B, so defines a linear space topology on $U$; this is called the weak topology of $U$.

(b) A filter $\mathcal F$ on $U$ converges to $u\in U$ for the weak topology of $U$ iff $\lim_{v\to\mathcal F}\rho_{|h|}(v,u)=0$ for every $h\in U^*$ (2A3Sd), that is, iff $\lim_{v\to\mathcal F}|h(v-u)|=0$ for every $h\in U^*$, that is, iff $\lim_{v\to\mathcal F}h(v)=h(u)$ for every $h\in U^*$.

(c) A set $C\subseteq U$ is called weakly compact if it is compact for the weak topology of $U$. So (subject to the axiom of choice) a set $C\subseteq U$ is weakly compact iff for every ultrafilter $\mathcal F$ on $U$ containing $C$ there is a $u\in C$ such that $\lim_{v\to\mathcal F}h(v)=h(u)$ for every $h\in U^*$ (put 2A3R together with (b) above).

(d) A subset $A$ of $U$ is called relatively weakly compact if it is a subset of some weakly compact subset of $U$.

(e) If $h\in U^*$, then $h:U\to{\mathbb R\atop\mathbb C}$ is continuous for the weak topology on $U$ and the usual topology of ${\mathbb R\atop\mathbb C}$; this is obvious if we apply the criterion of 2A3H. So if $A\subseteq U$ is relatively weakly compact, $h[A]$ must be bounded in ${\mathbb R\atop\mathbb C}$. PP Let $C\supseteq A$ be a weakly compact set. Then $h[C]$ is compact in ${\mathbb R\atop\mathbb C}$, by 2A3Nb, so is bounded, by 2A2F (noting that if the underlying field is $\mathbb C$, then it can be identified, as metric space, with $\mathbb R^2$). Accordingly $h[A]$ is also bounded. QQ

(f) If $V$ is another normed space and $T:U\to V$ is a bounded linear operator, then $T$ is continuous for the respective weak topologies. PP If $h\in V^*$ then the composition $hT$ belongs to $U^*$. Now, for any $u$, $v\in U$,
$$\rho_{|h|}(Tu,Tv)=|h(Tu-Tv)|=|hT(u-v)|=\rho_{|hT|}(u,v),$$
taking $\rho_{|h|}$, $\rho_{|hT|}$ to be the pseudometrics on $V$, $U$ respectively defined by the formula of 2A5B. By 2A3H, $T$ is continuous. QQ

(g) Corresponding to the weak topology on a normed space $U$, we have the weak* or w*-topology on its dual $U^*$, defined by the set $\mathrm T=\{|\hat u|:u\in U\}$, where I write $\hat u(f)=f(u)$ for every $f\in U^*$, $u\in U$. As in 2A5Ia, this is a linear space topology on $U^*$. (It is essential to distinguish between the weak* topology and the weak topology on $U^*$. The former depends only on the action of $U$ on $U^*$, the latter on the action of $U^{**}=(U^*)^*$. You will have no difficulty in checking that $\hat u\in U^{**}$ for every $u\in U$, but the point is that there may be members of $U^{**}$ not representable in this way, leading to open sets for the weak topology which are not open for the weak* topology.)

*2A5J Angelic spaces I do not rely on the following ideas, but they may throw light on some results in §§246-247. First, a topological space $X$ is regular if whenever $G\subseteq X$ is open and $x\in G$ then there is an open set $H$ such that $x\in H\subseteq\overline H\subseteq G$. Next, a regular Hausdorff space $X$ is angelic if whenever $A\subseteq X$ is such that every sequence in $A$ has a cluster point in $X$, then $\overline A$ is compact and every point of $\overline A$ is the limit of a sequence in $A$. What this means is that compactness in $X$, and the topologies of compact subsets of $X$, can be effectively described in terms of sequences. Now the theorem (due to Eberlein and Šmulian) is that any normed space is angelic in its weak topology. (462D in Volume 4; Köthe 69, §24; Dunford & Schwartz 57, V.6.1.) In particular, this is true of $L^1$ spaces, which makes it less surprising that there should be criteria for weak compactness in $L^1$ spaces which deal only with sequences.

2A6 Factorization of matrices


I spend a couple of pages on the linear algebra of $\mathbb R^r$ required for Chapter 26. I give only one proof,
because this is material which can be found in any textbook of elementary linear algebra; but I think it may
be helpful to run through the basic ideas in the language which I use for this treatise.

2A6A Determinants We need to know the following things about determinants.


(i) Every $r\times r$ real matrix $T$ has a real determinant $\det T$.
(ii) For any $r\times r$ matrices $S$ and $T$, $\det ST=\det S\det T$.
(iii) If $T$ is a diagonal matrix, its determinant is just the product of its diagonal entries.
(iv) For any $r\times r$ matrix $T$, $\det T'=\det T$, where $T'$ is the transpose of $T$.
(v) $\det T$ is a continuous function of the coefficients of $T$.
There are so many routes through this topic that I avoid even a definition of determinant; I invite you to
check your memory, or your favourite text, to confirm that you are indeed happy with the facts above.
2A6B Orthonormal families For $x=(\xi_1,\ldots,\xi_r)$, $y=(\eta_1,\ldots,\eta_r)\in\mathbb R^r$, write $x\,.\,y=\sum_{i=1}^r\xi_i\eta_i$; of course $\|x\|$, as defined in 1A2A, is $\sqrt{x\,.\,x}$. Recall that $x_1,\ldots,x_k$ are orthonormal if $x_i\,.\,x_j=0$ for $i\ne j$, $1$ for $i=j$. The results we need here are:
(i) If $x_1,\ldots,x_k$ are orthonormal vectors in $\mathbb R^r$, where $k<r$, then there are vectors $x_{k+1},\ldots,x_r$ in $\mathbb R^r$ such that $x_1,\ldots,x_r$ are orthonormal.
(ii) An $r\times r$ matrix $P$ is orthogonal if $P'P$ is the identity matrix; equivalently, if the columns of $P$ are orthonormal.
(iii) For an orthogonal matrix $P$, $\det P$ must be $\pm1$ (put (ii)-(iv) of 2A6A together).
(iv) If $P$ is orthogonal, then $Px\,.\,Py=P'Px\,.\,y=x\,.\,y$ for all $x$, $y\in\mathbb R^r$.
(v) If $P$ is orthogonal, so is $P'=P^{-1}$.
(vi) If $P$ and $Q$ are orthogonal, so is $PQ$.

2A6C I now give a proposition which is not always included in elementary presentations. Of course
there are many approaches to this; I offer a direct one.
Proposition Let $T$ be any real $r\times r$ matrix. Then $T$ is expressible as $PDQ$ where $P$ and $Q$ are orthogonal matrices and $D$ is a diagonal matrix with non-negative coefficients.
proof I induce on $r$.
(a) If $r=1$, then $T=(\tau_{11})$. Set $D=(|\tau_{11}|)$, $P=(1)$ and $Q=(1)$ if $\tau_{11}\ge0$, $(-1)$ otherwise.
(b)(i) For the inductive step to $r+1\ge2$, consider the unit ball $B=\{x:x\in\mathbb R^{r+1},\ \|x\|\le1\}$. This is a closed bounded set in $\mathbb R^{r+1}$, so is compact (2A2F). The maps $x\mapsto Tx:\mathbb R^{r+1}\to\mathbb R^{r+1}$ and $x\mapsto\|x\|:\mathbb R^{r+1}\to\mathbb R$ are continuous, so the function $x\mapsto\|Tx\|:B\to\mathbb R$ is bounded and attains its bounds (2A2G), and there is a $u\in B$ such that $\|Tu\|\ge\|Tx\|$ for every $x\in B$. Observe that $\|Tu\|$ must be the norm $\|T\|$ of $T$ as defined in 262H. Set $\delta=\|T\|=\|Tu\|$. If $\delta=0$, then $T$ must be the zero matrix, and the result is trivial; so let us suppose that $\delta>0$. In this case $\|u\|$ must be exactly $1$, since otherwise we should have $u=\|u\|u'$ where $\|u'\|=1$ and $\|Tu'\|>\|Tu\|$.
(ii) If $x\in\mathbb R^{r+1}$ and $x\,.\,u=0$, then $Tx\,.\,Tu=0$. PP?? If not, set $\gamma=Tx\,.\,Tu\ne0$. Consider $y=u+\eta\gamma x$ for small $\eta>0$. We have
$$\|y\|^2=y\,.\,y=u\,.\,u+2\eta\gamma\,u\,.\,x+\eta^2\gamma^2\,x\,.\,x=\|u\|^2+\eta^2\gamma^2\|x\|^2=1+\eta^2\gamma^2\|x\|^2,$$
while
$$\|Ty\|^2=Ty\,.\,Ty=Tu\,.\,Tu+2\eta\gamma\,Tu\,.\,Tx+\eta^2\gamma^2\,Tx\,.\,Tx=\delta^2+2\eta\gamma^2+\eta^2\gamma^2\|Tx\|^2.$$
But also $\|Ty\|^2\le\delta^2\|y\|^2$ (2A4Fb), so
$$\delta^2+2\eta\gamma^2+\eta^2\gamma^2\|Tx\|^2\le\delta^2(1+\eta^2\gamma^2\|x\|^2)$$
and
$$2\eta\gamma^2\le\delta^2\eta^2\gamma^2\|x\|^2-\eta^2\gamma^2\|Tx\|^2,$$
that is,
$$2\le\eta(\delta^2\|x\|^2-\|Tx\|^2).$$
But this surely cannot be true for all $\eta>0$, so we have a contradiction. XXQQ
(iii) Set $v=\delta^{-1}Tu$, so that $\|v\|=1$. Let $u_1,\ldots,u_{r+1}$ be orthonormal vectors such that $u_{r+1}=u$, and let $Q_0$ be the orthogonal $(r+1)\times(r+1)$ matrix with columns $u_1,\ldots,u_{r+1}$; then, writing $e_1,\ldots,e_{r+1}$ for the standard orthonormal basis of $\mathbb R^{r+1}$, we have $Q_0e_i=u_i$ for each $i$, and $Q_0e_{r+1}=u$. Similarly, there is an orthogonal matrix $P_0$ such that $P_0e_{r+1}=v$.
Set $T_1=P_0^{-1}TQ_0$. Then
$$T_1e_{r+1}=P_0^{-1}Tu=\delta P_0^{-1}v=\delta e_{r+1},$$
while if $x\,.\,e_{r+1}=0$ then $Q_0x\,.\,u=0$ (2A6B(iv)), so that
$$T_1x\,.\,e_{r+1}=P_0T_1x\,.\,P_0e_{r+1}=TQ_0x\,.\,v=0,$$
by (ii). This means that $T_1$ must be of the form
$$\begin{pmatrix}S&0\\0&\delta\end{pmatrix},$$
where $S$ is an $r\times r$ matrix.
(iv) By the inductive hypothesis, $S$ is expressible as $\tilde P\tilde D\tilde Q$, where $\tilde P$ and $\tilde Q$ are orthogonal $r\times r$ matrices and $\tilde D$ is a diagonal $r\times r$ matrix with non-negative coefficients. Set
$$P_1=\begin{pmatrix}\tilde P&0\\0&1\end{pmatrix},\quad Q_1=\begin{pmatrix}\tilde Q&0\\0&1\end{pmatrix},\quad D=\begin{pmatrix}\tilde D&0\\0&\delta\end{pmatrix}.$$
Then $P_1$ and $Q_1$ are orthogonal and $D$ is diagonal, with non-negative coefficients, and $P_1DQ_1=T_1$. Now set
$$P=P_0P_1,\quad Q=Q_1Q_0^{-1},$$
so that $P$ and $Q$ are orthogonal and
$$PDQ=P_0P_1DQ_1Q_0^{-1}=P_0T_1Q_0^{-1}=T.$$
Thus the induction proceeds.
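For $r=2$ the construction can be carried out by hand, and the following sketch does so under some stated assumptions. The unit vector $u$ maximizing $\|Tu\|$ is found from a closed-form angle (a device of this sketch, replacing the compactness argument of (b-i)); the perpendicular unit vector $x$ then has $Tx$ orthogonal to $Tu$, as in (b-ii), which is what lets the second column of $P$ be taken perpendicular to the first. Unlike the proof above, the sketch puts the largest diagonal entry first rather than last, and all function names are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def apply(T, x):
    return [T[0][0] * x[0] + T[0][1] * x[1], T[1][0] * x[0] + T[1][1] * x[1]]

def dot(x, y):
    return x[0] * y[0] + x[1] * y[1]

def factor_2x2(T):
    """Return (P, D, Q) with T = P D Q, P and Q orthogonal, D diagonal >= 0."""
    A = matmul(transpose(T), T)          # ||Tz||^2 = z . Az
    a, b, c = A[0][0], A[0][1], A[1][1]
    # For z = (cos t, sin t):
    #   z . Az = (a + c)/2 + ((a - c)/2) cos 2t + b sin 2t,
    # which is largest when 2t = atan2(2b, a - c).
    t = 0.5 * math.atan2(2.0 * b, a - c)
    u = [math.cos(t), math.sin(t)]       # maximizes ||Tu|| on the unit circle
    x = [-math.sin(t), math.cos(t)]      # unit vector with x . u = 0
    Tu, Tx = apply(T, u), apply(T, x)
    d1 = math.hypot(Tu[0], Tu[1])        # this is delta = ||T||
    v = [Tu[0] / d1, Tu[1] / d1] if d1 > 0.0 else [1.0, 0.0]
    w = [-v[1], v[0]]                    # unit vector perpendicular to v
    d2 = dot(Tx, w)                      # Tx is a multiple of w, since Tx . Tu = 0
    if d2 < 0.0:                         # flip w so the diagonal entry is >= 0
        w, d2 = [-w[0], -w[1]], -d2
    P = [[v[0], w[0]], [v[1], w[1]]]     # columns v, w
    D = [[d1, 0.0], [0.0, d2]]
    Q = [u, x]                           # rows u, x, so Qu = e_1 and Qx = e_2
    return P, D, Q

def close(A, B, tol=1e-9):
    """True if two 2x2 matrices agree entrywise to within tol."""
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(2) for j in range(2))
```

For instance, with `P, D, Q = factor_2x2([[3.0, 1.0], [1.0, 2.0]])`, the product `matmul(matmul(P, D), Q)` should reproduce the input up to rounding, and `matmul(transpose(P), P)` should be close to the identity.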



Concordance
I list here the section and paragraph numbers which have (to my knowledge) appeared in print in references
to this volume, and which have since been changed.

215Yc Measurable envelopes This exercise, referred to in the May 2000 edition of Volume 1, has
been moved to 216Yc.

252Yf Ordinate sets This exercise, referred to in the May 2000 edition of Volume 1, has been moved
to 252Yh.

References for Volume 2


Alexits G. [78] (ed.) Fourier Analysis and Approximation Theory. North-Holland, 1978 (Colloq. Math. Soc. János Bolyai 19).
Bergelson V., March P. & Rosenblatt J. [96] (eds.) Convergence in Ergodic Theory and Probability. de
Gruyter, 1996.
du Bois-Reymond P. [1876] Untersuchungen über die Convergenz und Divergenz der Fouriersche Darstellungformeln, Abh. Akad. München 12 (1876) 1-103. [282 notes.]
Bourbaki N. [66] General Topology. Hermann/Addison-Wesley, 1968. [2A5F.]
Bourbaki N. [87] Topological Vector Spaces. Springer, 1987. [2A5E.]
Carleson L. [66] On convergence and growth of partial sums of Fourier series, Acta Math. 116 (1966)
135-157. [282 notes, 286 intro., 286 notes.]
Defant A. & Floret K. [93] Tensor Norms and Operator Ideals, North-Holland, 1993. [253 notes.]
Dudley R.M. [89] Real Analysis and Probability. Wadsworth & Brooks/Cole, 1989. [282 notes.]
Dunford N. & Schwartz J.T. [57] Linear Operators I. Wiley, 1957 (reprinted 1988). [244 notes, 2A5J.]
Enderton H.B. [77] Elements of Set Theory. Academic, 1977. [2A1.]
Engelking R. [89] General Topology. Heldermann, 1989 (Sigma Series in Pure Mathematics 6). [2A5F.]
Etemadi N. [96] On convergence of partial sums of independent random variables, pp. 137-144 in Bergel-
son March & Rosenblatt 96. [272U.]
Evans L.C. & Gariepy R.F. [92] Measure Theory and Fine Properties of Functions. CRC Press, 1992.
[263Ec, 265 notes.]
Federer H. [69] Geometric Measure Theory. Springer, 1969 (reprinted 1996). [262C, 263Ec, 264 notes,
265 notes.]
Feller W. [66] An Introduction to Probability Theory and its Applications, vol. II. Wiley, 1966. [Chap. 27
intro., 274H, 275Xc, 285N.]
Fremlin D.H. [74] Topological Riesz Spaces and Measure Theory. Cambridge U.P., 1974. [232 notes,
241F, 244 notes, 245 notes, 247 notes.]
Fremlin D.H. [93] Real-valued-measurable cardinals, pp. 151-304 in Judah 93. [232Hc.]
Haimo D.T. [67] (ed.) Orthogonal Expansions and their Continuous Analogues. Southern Illinois Univer-
sity Press, 1967.
Hall P. [82] Rates of Convergence in the Central Limit Theorem. Pitman, 1982. [274Hc.]
Halmos P.R. [50] Measure Theory. Van Nostrand, 1950. [251 notes, 252 notes, 255Yn.]
Halmos P.R. [60] Naive Set Theory. Van Nostrand, 1960. [2A1.]
Henle J.M. [86] An Outline of Set Theory. Springer, 1986. [2A1.]
Hunt R.A. [67] On the convergence of Fourier series, pp. 235-255 in Haimo 67. [286 notes.]
Jørsboe O.G. & Mejlbro L. [82] The Carleson-Hunt Theorem on Fourier Series. Springer, 1982 (Lecture
Notes in Mathematics 911). [286 notes.]
Judah H. [93] (ed.) Proceedings of the Bar-Ilan Conference on Set Theory and the Reals, 1991. Amer.
Math. Soc. (Israel Mathematical Conference Proceedings 6), 1993.
Kakutani S. [41] Concrete representation of abstract L-spaces and the mean ergodic theorem, Ann. of
Math. 42 (1941) 523-537. [242 notes.]
Kelley J.L. [55] General Topology. Van Nostrand, 1955. [2A5F.]
Kelley J.L. & Namioka I. [76] Linear Topological Spaces. Springer, 1976. [2A5C.]
Kirzbraun M.D. [34] Über die zusammenziehenden und Lipschitzian Transformationen, Fund. Math. 22
(1934) 77-108. [262C.]
Kolmogorov A.N. [26] Une série de Fourier-Lebesgue divergente partout, C. R. Acad. Sci. Paris 183
(1926) 1327-1328. [282 notes.]
Komlós J. [67] A generalization of a problem of Steinhaus, Acta Math. Acad. Sci. Hung. 18 (1967) 217-229. [276H.]
Körner T.W. [88] Fourier Analysis. Cambridge U.P., 1988. [282 notes.]
Köthe G. [69] Topological Vector Spaces I. Springer, 1969. [2A5C, 2A5J.]

Krivine J.-L. [71] Introduction to Axiomatic Set Theory. D. Reidel, 1971. [2A1.]
Lacey M. & Thiele C. [00] A proof of boundedness of the Carleson operator, Math. Research Letters 7
(2000) 1-10. [286 intro., 286H.]
Lebesgue H. [72] Oeuvres Scientifiques. L'Enseignement Mathématique, Institut de Mathématiques, Univ. de Genève, 1972. [Chap. 27 intro.]
Liapounoff A. [1901] Nouvelle forme du théorème sur la limite de probabilité, Mém. Acad. Imp. Sci. St-Pétersbourg 12(5) (1901) 1-24. [274Xg.]
Lighthill M.J. [59] Introduction to Fourier Analysis and Generalised Functions. Cambridge U.P., 1959.
[284 notes.]
Lindeberg J.W. [22] Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung,
Math. Zeitschrift 15 (1922) 211-225. [274Ha, 274 notes.]
Lipschutz S. [64] Set Theory and Related Topics. McGraw-Hill, 1964 (Schaum's Outline Series). [2A1.]
Loève M. [77] Probability Theory I. Springer, 1977. [Chap. 27 intro., 274H.]
Luxemburg W.A.J. & Zaanen A.C. [71] Riesz Spaces I. North-Holland, 1971. [241F.]
Mozzochi C.J. [71] On the Pointwise Convergence of Fourier Series. Springer, 1971 (Lecture Notes in
Mathematics 199). [286 notes.]
Rényi A. [70] Probability Theory. North-Holland, 1970. [274Hc.]
Roitman J. [90] An Introduction to Set Theory. Wiley, 1990. [2A1.]
Schipp F. [78] On Carleson's method, pp. 679-695 in Alexits 78. [286 notes.]
Semadeni Z. [71] Banach spaces of continuous functions I. Polish Scientific Publishers, 1971. [253 notes.]
Shiryayev A. [84] Probability. Springer, 1984. [285N.]
Sjölin P. [71] Convergence almost everywhere of certain singular integrals and multiple Fourier series,
Arkiv för Math. 9 (1971) 65-90. [286 notes.]
Zaanen A.C. [83] Riesz Spaces II. North-Holland, 1983. [241F.]
Zygmund A. [59] Trigonometric Series. Cambridge U.P., 1959. [244 notes, 282 notes, 284Xj.]

Index to volumes 1 and 2


Principal topics and results
The general index below is intended to be comprehensive. Inevitably the entries are voluminous to the
point that they are often unhelpful. I am therefore preparing a shorter, better-annotated, index which will,
I hope, help readers to focus on particular areas. It does not mention definitions, as the bold-type entries in
the main index are supposed to lead efficiently to these; and if you draw blank here you should always, of
course, try again in the main index. Entries in the form of mathematical assertions frequently omit essential
hypotheses and should be checked against the formal statements in the body of the work.
absolutely continuous real functions 225
as indefinite integrals 225E
absolutely continuous additive functionals 232
characterization 232B
atomless measure spaces
have elements of all possible measures 215D
Borel sets in $\mathbb{R}^r$ 111G
and Lebesgue measure 114G, 115G, 134F
bounded variation, real functions of 224
as differences of monotonic functions 224D
integrals of their derivatives 224I
Lebesgue decomposition 226C
Cantor set and function 134G, 134H
Carathéodory's construction of measures from outer measures 113C
Carleson's theorem (Fourier series of square-integrable functions converge a.e.) 286
Central Limit Theorem (sum of independent random variables approximately normal) 274
Lindeberg's condition 274F, 274G
change of variable in the integral 235
$\int J \times g\phi\,d\mu = \int g\,d\nu$ 235A, 235E, 235L
finding $J$ 235O;
$J = |\det T|$ for linear operators $T$ 263A; $J = |\det \phi'|$ for differentiable operators $\phi$ 263D
when the measures are Hausdorff measures 265B, 265E
when $\phi$ is inverse-measure-preserving 235I
$\int g\phi\,d\mu = \int J \times g\,d\nu$ 235T
characteristic function of a probability distribution 285
sequences of distributions converge in vague topology iff characteristic functions converge pointwise
285L
complete measure space 212
completion of a measure 212C et seq.
concentration of measure 264H
conditional expectation
of a function 233
as operator on $L^1(\mu)$ 242J
construction of measures
image measures 112E
from outer measures (Carathéodory's method) 113C
subspace measures 131A, 214A
product measures 251C, 251F, 251W, 254C
as pull-backs 132G
convergence theorems (B.Levi, Fatou, Lebesgue) 123
convergence in measure (linear space topology on $L^0$)
on $L^0(\mu)$ 245
when Hausdorff/complete/metrizable 245E
convex functions 233G et seq.


convolution of functions (on $\mathbb{R}^r$) 255
$\int h \times (f * g) = \int h(x + y) f(x) g(y)\,dx\,dy$ 255G
$f * (g * h) = (f * g) * h$ 255J
convolution of measures (on $\mathbb{R}^r$) 257
$\int h\,d(\nu_1 * \nu_2) = \int h(x + y)\,\nu_1(dx)\,\nu_2(dy)$ 257B
of absolutely continuous measures 257F
countable sets 111F, 1A1C et seq.
countable-cocountable measure 211R
counting measure 112Bd
differentiable functions (from $\mathbb{R}^r$ to $\mathbb{R}^s$) 262, 263
direct sum of measure spaces 214K
distribution of a finite family of random variables 271
as a Radon measure 271B
of $\phi(\boldsymbol{X})$ 271J
of an independent family 272G
determined by characteristic functions 285M
Doob's Martingale Convergence Theorem 275G
exhaustion, principle of 215A
extended real line 135
Fatou's Lemma ($\int \liminf \le \liminf \int$ for sequences of non-negative functions) 123B
Fejér sums (running averages of Fourier sums) converge to local averages of $f$ 282H
uniformly if f is continuous 282G
Fourier series 282
norm-converge in L2 282J
converge at points of differentiability 282L
converge to midpoints of jumps, if f of bounded variation 282O
and convolutions 282Q
converge a.e. for square-integrable function 286V
Fourier transforms
on $\mathbb{R}$ 283, 284
formula for $\int_c^d f$ in terms of $\hat{f}$ 283F
and convolutions 283M
in terms of action on test functions 284H et seq.
of square-integrable functions 284O, 286U
inversion formulae for differentiable functions 283I; for functions of bounded variation 283L
$\hat{f}(y) = \lim_{\epsilon\downarrow 0} \frac{1}{\sqrt{2\pi}} \int e^{-iyx} e^{-\epsilon x^2} f(x)\,dx$ a.e. 284M
Fubini's theorem 252
$\int f\,d(\mu \times \nu) = \int\int f(x, y)\,dx\,dy$ 252B
when both factors $\sigma$-finite 252C, 252H
for characteristic functions 252D, 252F
Fundamental Theorem of the Calculus ($\frac{d}{dx}\int_a^x f = f(x)$ a.e.) 222; ($\int_a^b F'(x)\,dx = F(b) - F(a)$) 225E

Hahn decomposition of a countably additive functional 231E


Hardy-Littlewood Maximal Theorem 286A
Hausdorff measures (on $\mathbb{R}^r$) 264
are topological measures 264E
$r$-dimensional Hausdorff measure on $\mathbb{R}^r$ a multiple of Lebesgue measure 264I
$(r-1)$ dimensional measure on $\mathbb{R}^r$ 265F-265H
image measures 112E


indefinite integrals
differentiate back to original function 222E, 222H
to construct measures 234
independent random variables 272
joint distributions are product measures 272G
limit theorems 273, 274
inner regularity of measures
(with respect to compact sets) Lebesgue measure 134F; Radon measures 256
integration of real-valued functions, construction 122
as a positive linear functional 122O
acting on $L^1(\mu)$ 242B
by parts 225F
characterization of integrable functions 122P, 122R
over subsets 131D, 214E
functions and integrals with values in $[-\infty, \infty]$ 133
Jensen's inequality 233I-233J
expressed in $L^1(\mu)$ 242K
Komlós's subsequence theorem 276H
Lebesgue's Density Theorem (in $\mathbb{R}$) 223
$\lim_{h\downarrow 0} \frac{1}{h}\int_x^{x+h} f = f(x)$ a.e. 223A
$\lim_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0$ a.e. 223D
(in $\mathbb{R}^r$) 261C, 261E
Lebesgue measure, construction of 114, 115
further properties 134
Lebesgue's Dominated Convergence Theorem ($\int \lim = \lim \int$ for dominated sequences of functions) 123C
B.Levi's theorem ($\int \lim = \lim \int$ for monotonic sequences of functions) 123A; ($\int \sum = \sum \int$) 226E
Lipschitz functions 262
differentiable a.e. 262Q
localizable measure space
assembling partial measurable functions 213N
which is not strictly localizable 216E
Lusin's theorem (measurable functions are almost continuous)
(on $\mathbb{R}^r$) 256F
martingales 275
$L^1$-bounded martingales converge a.e. 275G
when of form $\mathbb{E}(X|\Sigma_n)$ 275H, 275I
measurable envelopes
elementary properties 132E
measurable functions
(real-valued) 121
sums, products and other operations on finitely many functions 121E
limits, infima, suprema 121F
Monotone Class Theorem 136B
monotonic functions
are differentiable a.e. 222A
non-measurable set (for Lebesgue measure) 134B
outer measures constructed from measures 132
elementary properties 132A
outer regularity of Lebesgue measure 134F
Plancherel Theorem (on Fourier series and transforms of square-integrable functions) 282K, 284O
Poisson's theorem (a count of independent rare events has an approximately Poisson distribution) 285Q
product of two measure spaces 251
basic properties of c.l.d. product measure 251I
Lebesgue measure on $\mathbb{R}^{r+s}$ as a product measure 251M
more than two spaces 251W
Fubini's theorem 252B
Tonelli's theorem 252G
and $L^1$ spaces 253
continuous bilinear maps on $L^1(\mu) \times L^1(\nu)$ 253F
conditional expectation on a factor 253H
product of any number of probability spaces 254
basic properties of the (completed) product 254F
characterization with inverse-measure-preserving functions 254G
products of subspaces 254L
products of products 254N
determination by countable subproducts 254O
subproducts and conditional expectations 254R

Rademacher's theorem (Lipschitz functions are differentiable a.e.) 262Q


Radon measures
on $\mathbb{R}^r$ 256
as completions of Borel measures 256C
indefinite-integral measures 256E
image measures 256G
product measures 256K
Radon-Nikodým theorem (truly continuous additive set-functions have densities) 232E
in terms of $L^1(\mu)$ 242I
Stone-Weierstrass theorem 281
for Riesz subspaces of $C_b(X)$ 281A
for subalgebras of $C_b(X)$ 281E
for *-subalgebras of $C_b(X; \mathbb{C})$ 281G
strictly localizable measures
sufficient condition for strict localizability 213O
strong law of large numbers ($\lim_{n\to\infty} \frac{1}{n+1}\sum_{i=0}^{n} (X_i - \mathbb{E}(X_i)) = 0$ a.e.) 273
when $\sum_{n=0}^{\infty} \frac{1}{(n+1)^2}\mathrm{Var}(X_n) < \infty$ 273D
when $\sup_{n\in\mathbb{N}} \mathbb{E}(|X_n|^{1+\delta}) < \infty$ 273H
for identically distributed $X_n$ 273I
for martingale difference sequences 276C, 276F
convergence of averages for $\|\ \|_1$, $\|\ \|_p$ 273N
subspace measures
for measurable subspaces 131
for arbitrary subspaces 214
surface measure in $\mathbb{R}^r$ 265
tensor products of $L^1$ spaces 253
Tonelli's theorem ($f$ is integrable if $\int\int |f(x, y)|\,dx\,dy < \infty$) 252G
uniformly integrable sets in $L^1$ 246
criteria for uniform integrability 246G
and convergence in measure 246J
and weak compactness 247C
Vitali's theorem (for coverings by intervals in $\mathbb{R}$) 221A
(for coverings by balls in $\mathbb{R}^r$) 261B
weak compactness in $L^1(\mu)$ 247
Weyl's Equidistribution Theorem 281N
c.l.d. version 213D et seq.
$L^0(\mu)$ (space of equivalence classes of measurable functions) 241
as Riesz space 241E
$L^1(\mu)$ (space of equivalence classes of integrable functions) 242
norm-completeness 242F
density of simple functions 242M
(for Lebesgue measure) density of continuous functions and step functions 242O
$L^p(\mu)$ (space of equivalence classes of $p$th-power-integrable functions, where $1 < p < \infty$) 244
is a Banach lattice 244G
has dual $L^q(\mu)$, where $\frac{1}{p} + \frac{1}{q} = 1$ 244K
and conditional expectations 244M
$L^\infty(\mu)$ (space of equivalence classes of bounded measurable functions) 243
duality with $L^1$ 243F-243G
norm-completeness 243E
order-completeness 243H

$\sigma$-algebras of sets 111
generated by given families 136B, 136G
$\sigma$-finite measures 215B

General index

References in bold type refer to definitions; references in italics are passing references. Definitions marked
with > are those in which my usage is dangerously at variance with that of some other author.
Abel's theorem 224Yi
absolute summability 226Ac
absolutely continuous additive functional 232Aa, 232B, 232D, 232F-232I, 232Xa, 232Xb, 232Xd, 232Xf,
232Xh, 234Ce, 256J, 257Xf
absolutely continuous function 225B, 225C-225G, 225K-225O, 225Xa-225Xh, 225Xn, 225Xo, 225Ya,
225Yc, 232Xb, 233Xc, 244Yh, 252Yj, 256Xg, 262Bc, 263I, 264Yp, 265Ya, 282R, 283Ci
Absoluteness Theorem see Shoenfield's Absoluteness Theorem
additive functional on an algebra of sets see finitely additive (136Xg, 231A), countably additive (231C)
adjoint operator 243Fc, 243 notes
algebra see algebra of sets (231A), Banach algebra (2A4Jb), normed algebra (2A4J)
algebra of sets 113Yi, 136E, 136F, 136G, 136Xg, 136Xh, 136Xk, 136Ya, 136Yb, 231A, 231B, 231Xa; see
also $\sigma$-algebra (111A)
almost continuous function 256F
almost every, almost everywhere 112Dd
almost surely 112De
analytic (complex) function 133Xc
angelic topological space 2A5J
Archimedean Riesz space 241F, 241Yb, 242Xc
area see surface measure
asymptotic density 273G
asymptotically equidistributed see equidistributed (281Yi)
atom (in a measure space) 211I, 211Xb, 246G
atomic see purely atomic (211K)
atomless measure (space) 211J, 211Md, 211Q, 211Xb, 211Yd, 211Ye, 212G, 213He, 214Xe, 215D, 215Xe,
215Xf, 216A, 216Ya, 234Ff, 234Yd, 251T, 251Wo, 251Xq, 251Xs, 251Yc, 252Yp, 252Yr, 252Yt, 254V, 254Yf,
256Xd, 264Yg
automorphism see measure space automorphism
axiom see Banach-Ulam problem, choice (2A1J), countable choice
Coarea Theorem 265 notes
cofinite see finite-cofinite algebra (231Xa)
commutative (ring, algebra) 2A4Jb
commutative Banach algebra 224Yb, 243D, 255Xc, 257Ya
commutative group see abelian group
compact set (in a topological space) 247Xc, 2A2D, 2A2E-2A2G, > 2A3N, 2A3R; see also, relatively
compact (2A3N), relatively weakly compact (2A5Id), weakly compact (2A5Ic)
compact support, function with 242O, 242Pd, 242O, 242Xh, 244H, 244Yi, 256Be, 256D, 256Xh, 262Yd-
262Yg
compact support, measure with 284Yi
compact topological space > 2A3N
complete linear topological space 245Ec, 2A5F, 2A5H
complete locally determined measure (space) 213C, 213D, 213H, 213J, 213L, 213O, 213Xg, 213Xi, 213Xl,
213Yd, 213Ye, 214I, 214Xc, 216D, 216E, 234Fb, 251Yb, 252B, 252D, 252E, 252N, 252Xb, 252Yh, 252Yj,
252Yr-252Yt, 253Yj, 253Yk; see also c.l.d. version (213E)
complete measure (space) 112Df, 113Xa, 122Ya, 211A, 211M, 211N, 211R, 211Xc, 211Xd, 212, 214I,
214J, 216A, 216C, 216Ya, 234A, 234Dd, 254Fd, 254G, 254J, 264Dc; see also complete locally determined
measure
complete metric space 224Ye; see also Banach space (2A4D)
complete normed space see Banach space (2A4D), Banach lattice (242G)
complete Riesz space see Dedekind complete (241Fc)
completed indefinite-integral measure see indefinite-integral measure (234B)
completion (of a measure (space)) 212 (212C), 213Fa, 213Xa, 213Xb, 213Xk, 214Xb, 214Xj, 232Xe,
234Db, 235D, 235Hc, 235Xe, 241Xb, 242Xb, 243Xa, 244Xa, 245Xb, 251S, 251Wn, 251Xp, 252Ya, 254I,
256C
complex-valued function 133
component (in a topological space) 111Ye
conditional expectation 233D, 233E, 233J, 233K, 233Xg, 233Yc, 235Yc, 242J, 246Ea, 253H, 253Le,
275Ba, 275H, 275I, 275K, 275Ne, 275Xi, 275Ya, 275Yk, 275Yl
conditional expectation operator 242Jf, 242K, 242L, 242Xe, 242Yk, 243J, 244M, 244Yk, 246D, 254R,
254Xp, 275Xd, 275Xe
conegligible set 112Dc, 214Cc
connected set 222Yb
continuity, points of 224H, 224Ye, 225J
continuous function 121D, 121Yf, 262Ia, 2A2C, 2A2G, 2A3B, 2A3H, 2A3Nb, 2A3Qb
continuous linear functional 284Yj; see also dual linear space (2A4H)
continuous linear operator 2A4Fc; see also bounded linear operator (2A4F)
continuous see also semi-continuous (225H)
continuum see c (2A1L)
convergence in mean (in $\mathcal{L}^1(\mu)$ or $L^1(\mu)$) 245Ib
convergence in measure (in $L^0(\mu)$) 245 (>245A), 246J, 246Yc, 247Ya
(in $\mathcal{L}^0(\mu)$) 245 (>245A), 271Yd, 274Yd, 285Xs
(in the algebra of measurable sets) 232Ya
(of sequences) 245, 246J, 246Xh, 246Xi, 253Xe, 255Yf, 271L, 273Ba, 275Xk, 275Yp, 276Yf
convergent almost everywhere 245C, 245K, 273Ba, 276G, 276H; see also strong law
convergent filter 2A3Q, 2A3S, 2A5Ib;
sequence 135D, 245Yi, 2A3M, 2A3Sg; see also convergence in measure (245Ad)
convex function 233G, 233H-233J, 233Xb-233Xf (233Xd), 233Xh, 233Ya, 233Yc, 233Yd, 242K, 242Yi,
242Yj, 242Yk, 244Xm, 244Yg, 255Yk, 275Yg; see also mid-convex (233Ya)
convex hull 2A5E; see also closed convex hull (2A5E)
convex set 233Xd, 244Yj, 262Xh, 2A5E
convolution in L0 255Fc, 255Xc, 255Yf, 255Yk
convolution of functions 255E, 255F-255K, 255O, 255Xa-255Xc, 255Xf-255Xj, 255Ya, 255Yb, 255Yd,
255Ye, 255Yi, 255Yl, 255Ym, 255Yn, 262Xj, 262Yd, 262Ye, 262Yh, 263Ya, 282Q, 282Xt, 283M, 283Wd,
283Wf, 283Wj, 283Xl, 284J, 284K, 284O, 284Wf, 284Wi, 284Xb, 284Xd
convolution of measures 257 (257A), 272S, 285R, 285Yn
convolution of measures and functions 257Xe, 284Xo, 284Yi
convolution of sequences 255Xe, 255Yo, 282Xq
countable (set) 111F, 114G, 115G, 1A1, 226Yc
countable choice (axiom of) 134C
countable-cocountable algebra 211R, 211Ya, 232Hb
countable-cocountable measure 211R, 232Hb, 252L
countable sup property (in a Riesz space) 241Yd, 242Yd, 242Ye, 244Yb
countably additive functional (on a $\sigma$-algebra of sets) 231 (231C), 232, 246Yg, 246Yi
counting measure 112Bd, 122Xd, 122 notes, 211N, 211Xa, 213Xb, 226A, 241Xa, 242Xa, 243Xl, 244Xi,
244Xn, 245Xa, 246Xc, 251Xb, 251Xh, 252K, 255Yo, 264Db
cover see measurable envelope (132D)
covering theorem 221A, 261B, 261F, 261Xc, 261Ya, 261Yi, 261Yk
cylinder (in 'measurable cylinder') 254Aa, 254F, 254G, 254Q, 254Xa
decimal expansions 273Xf
decomposable measure (space) see strictly localizable (211E)
decomposition (of a measure space) 211E, 211Ye, 213O, 213Xh, 214Ia, 214K, 214M, 214Xi; see also Hahn
decomposition of an additive functional (231F), Jordan decomposition of an additive functional (231F),
Lebesgue decomposition of a countably additive functional (232I), Lebesgue decomposition of a function of
bounded variation (226)
decreasing rearrangement (of a function) 252Yp
Dedekind complete partially ordered set 135Ba
Dedekind complete Riesz space 241Fc, 241G, 241Xf, 242H, 242Yc, 243H, 243Xj, 244L
Dedekind $\sigma$-complete Riesz space 241Fb, 241G, 241Xe, 241Yb, 241Yh, 242Yg, 243H, 243Xb
delta function see Dirac's delta function (284R)
delta system see $\Delta$-system
dense set in a topological space 242Mb, 242Ob, 242Pd, 242Xi, 243Ib, 244H, 244Ob, 244Yi, 254Xo, 281Yc,
2A3U, 2A4I
density function (of a random variable) 271H, 271I-271K, 271Xc-271Xe, 272T, 272Xd, 272Xj; see also
Radon-Nikodým derivative (232If, 234B)
density point 223B, 223Xi, 223Yb
density topology 223Yb, 223Yc, 223Yd, 261Yf
density see also asymptotic density (273G), Lebesgue's Density Theorem (223A)
derivative of a function (of one variable) 222C, 222E, 222F, 222G, 222H, 222I, 222Yd, 225J, 225L, 225Of,
225Xc, 226Be, 282R; (of many variables) 262F, 262G, 262P; see partial derivative
determinant of a matrix 2A6A
determined see locally determined measure space (211H), locally determined negligible sets (213I)
determined by coordinates (in '$W$ is determined by coordinates in $J$') 254M, 254O, 254R-254T, 254Xp,
254Xr
Devil's Staircase see Cantor function (134H)
differentiability, points of 222H, 225J
differentiable function (of one variable) 123D, 222A, 224I, 224Kg, 224Yc, 225L, 225Of, 225Xc, 225Xn,
233Xc, 252Yj, 255Xg, 255Xh, 262Xk, 265Xd, 274E, 282L, 282Xs, 283I-283K, 283Xm, 284Xc, 284Xk; (of
many variables) 262Fa, 262Gb, 262I, 262Xg, 262Xi, 262Xj; see also derivative
differentiable relative to its domain 262Fb, 262I, 262M-262Q, 262Xd-262Xf, 262Yc, 263D, 263Xc,
263Xd, 263Yc, 265E, 282Xk
diffused measure see atomless measure (211J)


dilation 286C
dimension see Hausdorff dimension (264 notes)
Dirac's delta function 257Xa, 284R, 284Xn, 284Xo, 285H, 285Xp
direct image (of a set under a function or relation) 1A1B
direct sum of measure spaces 214K, 214L, 214Xi-214Xl, 241Xg, 242Xd, 243Xe, 244Xg, 245Yh, 251Xh,
251Xi
directed set see downwards-directed (2A1Ab), upwards-directed (2A1Ab)
Dirichlet kernel 282D; see also modified Dirichlet kernel (282Xc)
disjoint family (of sets) 112Bb
disjoint sequence theorem 246G, 246Ha, 246Yd, 246Ye, 246Yf, 246Yj
distribution see Schwartzian distribution, tempered distribution
distribution of a random variable 241Xc, 271E, 271F, 271Ga, 272G, 272S, 272Xe, 272Yc, 272Yf, 272Yg,
285H, 285Xg; see also Cauchy distribution (285Xm), empirical distribution (273 notes), Poisson distribution
(285Xo)
of a finite family of random variables 271B, 271C, 271D-271G, 272G, 272Ye, 272Yf, 285Ab, 285C,
285Mb
distribution function of a random variable > 271G, 271L, 271Xb, 271Yb, 271Yc, 271Yd, 272Xe, 272Yc,
273Xg, 273Xh, 274F-274L, 274Xd, 274Xg, 274Xh, 274Ya, 274Yc, 285P
Dominated Convergence Theorem see Lebesgue's Dominated Convergence Theorem (123C)
Doob's Martingale Convergence Theorem 275G
downwards-directed partially ordered set 2A1Ab
dual linear space (of a normed space) 243G, 244K, 2A4H
Dynkin class 136A, 136B, 136Xb
Eberlein's theorem 2A5J
Egorov's theorem 131Ya, 215Yb
empirical distribution 273Xh, 273 notes
envelope see measurable envelope (132D)
equidistributed sequence (in a topological probability space) 281N, 281Yi, 281Yj, 281Yk
Equidistribution Theorem see Weyl's Equidistribution Theorem (281N)
equiveridical 121B, 212B
essential supremum of a family of measurable sets 211G, 213K, 215B, 215C; of a real-valued function
243D, 243I
essentially bounded function 243A
Etemadi's lemma 272U
Euclidean metric (on $\mathbb{R}^r$) 2A3Fb
Euclidean topology 1A2, 2A2, 2A3Ff, 2A3Tc
even function 255Xb, 283Yb, 283Yc
exchangeable sequence of random variables 276Xe
exhaustion, principle of 215A, 215C, 215Xa, 215Xb, 232E, 246Hc
expectation of a random variable 271Ab, 271E, 271F, 271I, 271Xa, 272Q, 272Xb, 272Xi, 285Ga, 285Xo;
see also conditional expectation (233D)
extended real line 121C, 135
extension of measures 132Yd, 212Xk; see also completion (212C), c.l.d. version (213E)
fair-coin probability 254J
Fatou's Lemma 123B, 133K, 135G, 135Hb
Fatou norm on a Riesz space 244Yf
Fejér integral 283Xf, 283Xh-283Xj
Fejér kernel 282D
Fejér sums 282Ad, 282B-282D, 282G-282I, 282Yc
Feller, W. Chap. 27 intro.
field (of sets) see algebra (136E)
filter 2A1I, 2A1N, 2A1O, 2A5F; see also convergent filter (2A3Q), ultrafilter (2A1N)
finite-cofinite algebra 231Xa, 231Xc
finitely additive functional on an algebra of sets 136Xg, 136Ya, 136Yb, 231A, 231B, 231Xb-231Xe,
231Ya-231Yh, 232A, 232B, 232E, 232G, 232Ya, 232Ye, 232Yg, 243Xk; see also countably additive
Fourier coefficients 282Aa, 282B, 282Cb, 282F, 282I, 282J, 282M, 282Q, 282R, 282Xa, 282Xg, 282Xq,
282Xt, 282Ya, 283Xu, 284Ya, 284Yg
Fourier's integral formula 283Xm
Fourier series 121G, 282 (282Ac)
Fourier sums 282Ab, 282B-282D, 282J, 282L, 282O, 282P, 282Xi-282Xk, 282Xp, 282Xt, 282Yd, 286V,
286Xb
Fourier transform 133Xd, 133Yc, 283 (283A, 283Wa), 284 (284H, 284Wd), 285Ba, 285D, 285Xd,
285Ya, 286U, 286Ya
Fourier-Stieltjes transform see characteristic function (285A)
Fréchet filter 2A3Sg
Fubini's theorem 252B, 252C, 252H, 252R
full outer measure 132F, 132G, 132Xk, 132Yd, 133Yf, 134D, 134Yt, 214F, 254Yf
function 1A1B
Fundamental Theorem of Calculus 222E, 222H, 222I, 225E
Fundamental Theorem of Statistics 273Xh, 273 notes
Gamma function see $\Gamma$-function (225Xj)
Gaussian distribution see standard normal distribution (274Aa)
Gaussian random variable see normal random variable (274Ad)
generated ($\sigma$-)algebra of sets 111Gb, 111Xe, 111Xf, 121J, 121Xd, 136B, 136C, 136G, 136Xc, 136X, 136Xl,
136Yb
Glivenko-Cantelli theorem 273 notes
group 255Yn, 255Yo; see also circle group
Hahn decomposition of an additive functional 231Eb, 231F
see also Vitali-Hahn-Saks theorem (246Yg)
half-open interval (in $\mathbb{R}$ or $\mathbb{R}^r$) 114Aa, 114G, 114Xe, 114Yj, 115Ab, 115Xa, 115Xc, 115Yd
Hardy-Littlewood Maximal Theorem 286A
Hausdorff dimension 264 notes
Hausdorff measure 264 (264C, 264Db, 264K, 264Yo); see also normalized Hausdorff measure (265A)
Hausdorff metric (on a space of closed subsets) 246Yb
Hausdorff outer measure 264 (264A, 264K, 264Yo)
Hausdorff topology 2A3E, 2A3L, 2A3Mb, 2A3S
Hilbert space 244N, 244Yj
Hölder's inequality 244Eb
hull see convex hull (2A5E), closed convex hull (2A5E)
ideal in an algebra of sets 232Xc; see also $\sigma$-ideal (112Db)
identically distributed random variables 273I, 273Xh, 274 notes, 276Yg, 285Xn, 285Yc; see also exchangeable
sequence (276Xe)
image filter 2A1Ib, 2A3Qb, 2A3S
image measure 112E, 112F, 112Xd, 112Xg, 123Ya, 132G, 132Xk, 132Yb, 132Yf, 211Xd, 212Bd, 212Xg,
235L, 254Oa, 256G
image measure catastrophe 235J
indefinite integral 131Xa, 222D-222F, 222H, 222I, 222Xa-222Xc, 222Yc, 224Xg, 225E, 225Od, 225Xh,
232D, 232E, 232Yf, 232Yi; see also indefinite-integral measure
indefinite-integral measure 234 (234A), 235M, 235P, 235Xi, 253I, 256E, 256J, 256L, 256Xe, 256Yd,
257F, 257Xe, 263Ya, 275Yi, 275Yj, 285Dd, 285Xe, 285Ya; see also uncompleted indefinite-integral measure
independence 272 (272A)
independent random variables 272Ac, 272D-272I, 272L, 272M, 272P-272U, 272Xb, 272Xd, 272Xh-272Xj,
272Ya-272Yd, 272Yf, 272Yg, 273B, 273D, 273E, 273H, 273I, 273L-273N, 273Xh, 273Xi, 273Xk, 274B-274D,
274F-274K, 274Xc, 274Xd, 274Xg, 275B, 275Yh, 276Af, 285I, 285Xf, 285Xg, 285Xm-285Xo, 285Xs, 285Yc,
285Yk, 285Yl
independent sets 272Aa, 272Bb, 272F, 272N, 273F, 273K
independent $\sigma$-algebras 272Ab, 272B, 272D, 272F, 272J, 272K, 272M, 272O, 275Ym
induced topology see subspace topology (2A3C)
inductive definitions 2A1B
infinity 112B, 133A, 135
initial ordinal 2A1E, 2A1F, 2A1K
inner measure 113Yh, 212Yc, 213Xe, 213Yc
inner product space 244N, 253Xe
inner regular measure 256A, 256B
with respect to closed sets 256Ya
integrable function 122 (122M), 123Ya, 133B, 133Db, 133Dc, 133F, 133J, 133Xa, 135Fa, 212B, 212F,
213B, 213G; see also Bochner integrable function (253Yf), $L^1(\mu)$ (242A)
integral 122 (122E, 122K, 122M); see also integrable function, Lebesgue integral (122Nb), lower
integral (133I), Riemann integral (134K), upper integral (133I)
integration by parts 225F, 225Oe, 252Xi
integration by substitution see change of variable in integration
interior of a set 2A3D, 2A3Ka
interpolation see Riesz Convexity Theorem
interval see half-open interval (114Aa, 115Ab), open interval (111Xb)
inverse Fourier transform 283Ab, 283B, 283Wa, 283Xb, 284I; see also Fourier transform
inverse image (of a set under a function or relation) 1A1B
inverse-measure-preserving function 132G, 134Yl, 134Ym, 134Yn, 235G, 235H, 235I, 235Xe, 241Xh,
242Xf, 243Xn, 244Xo, 246Xf, 254G, 254H, 254Ka, 254O, 254Xc-254Xf, 254Xh, 254Yb; see also image
measure (112E)
Inversion Theorem (for Fourier series and transforms) 282G-282I, 282L, 282O, 282P, 283I, 283L, 284C,
284M; see also Carleson's theorem
isodiametric inequality 264H, 264 notes
isomorphism see measure space isomorphism
Jacobian 263Ea
Jensen's inequality 233I, 233J, 242Yi
joint distribution see distribution (271C)
Jordan decomposition of an additive functional 231F, 231Ya, 232C
kernel see Dirichlet kernel (282D), Fejér kernel (282D), modified Dirichlet kernel (282Xc)
Kirzbraun's theorem 262C
Kolmogorov's Strong Law of Large Numbers 273I, 275Yn
Komlós's theorem 276H, 276Yh
Kronecker's lemma 273Cb
Lacey-Thiele Lemma 286M
Laplace's central limit theorem 274Xe
Laplace transform 123Xc, 123Yb, 133Xc, 225Xe
lattice 2A1Ad; see also Banach lattice (242G)
norm see Riesz norm (242Xg)
law of a random variable see distribution (271C)
law of large numbers see strong law (273)
law of rare events 285Q
Lebesgue, H. Vol. 1 intro., Chap. 27 intro.
Lebesgue Covering Lemma 2A2Ed
Lebesgue decomposition of a countably additive functional 232I, 232Yb, 232Yg, 256Ib
Lebesgue decomposition of a function of bounded variation 226C, 226Dc, 226Ya, 232Yb
Lebesgue's Density Theorem 223, 261C, 275Xg
Lebesgue's Dominated Convergence Theorem 123C, 133G


Lebesgue extension see completion (212C)
Lebesgue integrable function 122Nb, 122Yb, 122Ye, 122Yf
Lebesgue integral 122Nb
Lebesgue measurable function 121C, 121D, 134Xd, 225H, 233Yd, 262K, 262P, 262Yc
Lebesgue measurable set 114E, 114F, 114G, 114Xe, 114Ye, 115E, 115F, 115G, 115Yc
Lebesgue measure (on $\mathbb{R}$) 114 (114E), 131Xb, 133Xc, 133Xd, 134G-134L, 212Xc, 216A, Chap. 22, 242Xi,
246Yd, 246Ye, 252N, 252O, 252Xf, 252Xg, 252Yj, 252Yp, 255
(on $\mathbb{R}^r$) 115 (115E), 132C, 132Ef, 133Yc, 134, 211M, 212Xd, 245Yj, 251M, 251Wi, 252Q,
252Xh, 252Yu, 254Xk, 255A, 255K, 255L, 255Xd, 255Yc, 255Yd, 256Ha, 256J-256L, 264H, 264I
(on [0, 1], [0, 1[) 211Q, 216A, 252Yq, 254K, 254Xh, 254Xj-254Xl
(on other subsets of $\mathbb{R}^r$) 242O, 244Hb, 244I, 244Yh, 246Yf, 246Yl, 251Q, 252Ym, 255M, 255N,
255O, 255Ye, 255Yh
Lebesgue negligible set 114E, 115E, 134Yk
Lebesgue outer measure 114C, 114D, 114Xc, 114Yd, 115C, 115D, 115Xb, 115Xd, 115Yb, 132C, 134A,
134D, 134Fa
Lebesgue set of a function 223D, 223Xf, 223Xg, 223Xh, 223Yg, 261E, 261Ye
Lebesgue-Stieltjes measure 114Xa, 114Xb, 114Yb, 114Yc, 114Yf, 131Xc, 132Xg, 134Xc, 211Xb, 212Xd,
212Xi, 225Xf, 232Xb, 232Yb, 235Xb, 235Xg, 235Xh, 252Xi, 256Xg, 271Xb, 224Yh
length of a curve 264Yl, 265Xd, 265Ya
length of an interval 114Ab
B.Levi's theorem 123A, 123Xa, 133K, 135G, 135Hb, 226E, 242E
Levi property of a normed Riesz space 242Yb, 244Ye
Lévy's martingale convergence theorem 275I
Lévy's metric 274Ya, 285Yd
Liapounoff's central limit theorem 274Xg
limit of a filter 2A3Q, 2A3R, 2A3S
limit of a sequence 2A3M, 2A3Sg
limit ordinal 2A1Dd
Lindeberg's central limit theorem 274F-274H, 285Ym
Lindeberg's condition 274H
linear operator 262Gc, 263A, 265B, 265C, 2A6; see also bounded linear operator (2A4F), continuous
linear operator
linear order see totally ordered set (2A1Ac)
linear space topology see linear topological space (2A5A), weak topology (2A5I), weak* topology (2A5I)
linear subspace (of a normed space) 2A4C; (of a linear topological space) 2A5Ec
linear topological space 245D, 284Ye, 2A5A, 2A5B, 2A5C, 2A5Eb, 2A5F, 2A5G, 2A5H, 2A5I
Lipschitz constant 262A, 262C, 262Yi, 264Yj
Lipschitz function 225Yc, 262A, 262B-262E, 262N, 262Q, 262Xa-262Xc, 262Xh, 262Yi, 263F, 264Yj,
282Yb
local convergence in measure see convergence in measure (245A)
localizable measure (space) 211G, 211L, 211Ya, 211Yb, 212G, 213Hb, 213L-213N, 213Xl, 213Xm, 214Id,
214J, 214Xa, 214Xd, 214Xf, 216C, 216E, 216Ya, 216Yb, 234Fc, 234G, 234Ye, 241G, 241Ya, 243G, 243H,
245Ec, 245Yf, 252Yq, 252Yr, 252Yt, 254U; see also strictly localizable (211E)
locally determined measure (space) 211H, 211L, 211Ya, 216Xb, 216Ya, 216Yb, 251Xc, 252Ya; see also
complete locally determined measure
locally determined negligible sets 213I, 213J-213L, 213Xj-213Xl, 214Ib, 214Xg, 214Xh, 216Yb, 234Yb,
252Yb
locally finite measure 256A, 256C, 256G, 256Xa, 256Ya
locally integrable function 242Xi, 255Xh, 255Xi, 256E, 261Xa, 262Yg
Loève, M. Chap. 27 intro.
lower integral 133I, 133J, 133Xe, 135H
lower Lebesgue density 223Yf
lower Riemann integral 134Ka
lower semi-continuous function 225H, 225I, 225Xl, 225Xm, 225Yd, 225Ye


Lusin's theorem 134Yc, 256F
Maharam measure (space) see localizable (211G)
Markov time see stopping time (275L)
martingale 275 (275A, 275Cc, 275Cd, 275Ce); see also reverse martingale
martingale convergence theorems 275G-275I, 275K, 275Xf
martingale difference sequence 276A, 276B, 276C, 276E, 276Xd, 276Ya, 276Yb, 276Ye, 276Yg
martingale inequalities 275D, 275F, 275Xb, 275Yc-275Ye, 276Xb
maximal element in a partially ordered set 2A1Ab
maximal theorems 275D, 275Yc, 275Yd, 276Xb, 286A, 286T
mean (of a random variable) see expectation (271Ab)
Mean Ergodic Theorem see convergence in mean (245Ib)
measurable cover see measurable envelope (132D)
measurable envelope 132D, 132E, 132F, 132Xf, 132Xg, 134Fc, 134Xc, 213K-213M, 214G, 216Yc; see also
full outer measure (132F)
measurable envelope property 213Xl, 214Xl
measurable function (taking values in R) 121 (121C), 122Ya, 212B, 212F, 213Yd, 214La, 214Ma, 235C,
235K, 252O, 252P, 256F, 256Yb, 256Yc
(taking values in $\mathbb{R}^r$) 121Yf, 256G
(taking values in other spaces) 133Da, 133E, 133Yb, 135E, 135Xd, 135Yf
(($\Sigma$, T)-measurable function) 121Yb, 235Xc, 251Ya, 251Yc
see also Borel measurable, Lebesgue measurable
measurable set 112A; $\mu$-measurable set 212Cd; see also relatively measurable (121A)
measurable space 111Bc
measurable transformation 235; see also inverse-measure-preserving function
measure 112A
(in '$\mu$ measures $E$', '$E$ is measured by $\mu$') > 112Be
measure algebra 211Yb, 211Yc
measure-preserving function see inverse-measure-preserving function (235G), measure space automorphism,
measure space isomorphism
measure space 112 (112A), 113C, 113Yi
measure space automorphism 255A, 255Ca, 255N, 255Ya, 255 notes
measure space isomorphism 254K, 254Xj-254Xl, 255Ca, 255Mb
metric 2A3F, 2A4Fb; Euclidean metric (2A3Fb), Hausdorff metric (246Yb), Lévy's metric (274Ya),
pseudometric (2A3F)
metric outer measure 264Xb, 264Yc
metric space 224Ye, 261Yi; see also complete metric space, metrizable space (2A3Ff)
metrizable (topological) space 2A3Ff, 2A3L; see also metric space, separable metrizable space
mid-convex function 233Ya, 233Yd
minimal element in a partially ordered set 2A1Ab
modified Dirichlet kernel 282Xc
modulation 286C
Monotone Class Theorem 136B
Monotone Convergence Theorem see B.Levi's theorem (133A)
monotonic function 121D, 222A, 222C, 222Yb, 224D
Monte Carlo integration 273J, 273Ya
multilinear map 253Xc
negligible set 112D, 131Ca, 214Cb; see also Lebesgue negligible (114E, 115E)
non-decreasing sequence of sets 112Ce
non-increasing sequence of sets 112Cf
non-measurable set 134B, 134D, 134Xg
norm 2A4B; (of a linear operator) 2A4F, 2A4G, 2A4I; (of a matrix) 262H, 262Ya; (norm topology)
242Xg, 2A4Bb
normal density function 274A, 283N, 283We, 283Wf


normal distribution function 274Aa, 274F-274K, 274M, 274Xe, 274Xg
normal random variable 274A, 274B, 285E, 285Xm, 285Xn
normalized Hausdorff measure 264 notes, 265 (265A)
normed algebra 2A4J
normed space 224Yf, 2A4 (2A4Ba); see also Banach space (2A4D)
null set see negligible (112Da)
odd function 255Xb, 283Yd
open interval 111Xb, 114G, 115G, 1A1A, 2A2I
open set (in R^r) 111Gc, 111Yc, 114Yd, 115G, 115Yb, 133Xb, 134Fa, 134Yj, 135Xa, 1A2A, 1A2B,
1A2D, 256Ye, 2A3A, 2A3G; (in R) 111Gc, 111Ye, 114G, 134Xc, 2A2I; see also topology (2A3A)
optional time see stopping time (275L)
order-bounded set (in a partially ordered space) 2A1Ab
order-complete see Dedekind complete (241Ec)
order-continuous norm (on a Riesz space) 242Yc, 242Ye, 244Yd
order*-convergent sequence in a partially ordered set 245Xc; (in L^0(μ)) 245C, 245K, 245L, 245Xc, 245Xd
order unit (in a Riesz space) 243C
ordered set see partially ordered set (2A1Aa), totally ordered set (2A1Ac), well-ordered set (2A1Ae)
ordinal 2A1C, 2A1D-2A1F, 2A1K
ordinate set 252N, 252Yg, 252Yh
orthogonal matrix 2A6B, 2A6C
orthogonal projection in Hilbert space 244Nb, 244Yj, 244Yk
orthonormal vectors 2A6B
outer measure 113 (113A), 114Xd, 132B, 132Xg, 136Ya, 212Ea, 212Xa, 212Xb, 212Xg, 213C, 213Xa,
213Xg, 213Xk, 213Ya, 251B, 251Wa, 251Xd, 254B, 264B, 264Xa, 264Ya, 264Yo; see also Lebesgue outer
measure (114C, 115C), metric outer measure (264Yc), regular outer measure (132Xa)
defined from a measure 113Ya, 132 (132B), 213C, 213F, 213Xa, 213Xg-213Xj, 213Xk, 213Yd,
214Cd, 215Yc, 251O, 251R, 251Wk, 251Wm, 251Xm, 251Xo, 252Yh, 254G, 254L, 254S, 254Xb, 254Xq,
254Yd, 264F, 264Yd
outer regular measure 256Xi
Parseval's identity 284Qd
partial derivative 123D, 252Yj, 262I, 262J, 262Xh, 262Yb, 262Yc
partial order see partially ordered set (2A1Aa)
partially ordered linear space 241E, 241Yg
partially ordered set 2A1Aa
Peano curve 134Yl-134Yo
periodic extension of a function on ]−π, π] 282Ae
Plancherel Theorem (on Fourier series and transforms of square-integrable functions) 282K, 284O, 284Qd
point-supported measure 112Bd, 112Xg, 211K, 211O, 211Qb, 211Rc, 211Xb, 211Xf, 213Xo, 234Xc,
215Xr, 256Hb
pointwise convergence (topology on a space of functions) 281Yf
pointwise convergent see order*-convergent (245Cb)
pointwise topology see pointwise convergence
Poisson distribution 285Q, 285Xo
Poisson's theorem 285Q
polar coordinates 263G, 263Xf
Pólya's urn scheme 275Xc
polynomial (on R^r) 252Yu
porous set 223Ye, 261Yg, 262L
positive cone 253G, 253Xi, 253Yd
positive definite function 283Xt, 285Xr
predictable sequence 276Ec
presque partout 112De
primitive product measure 251C, 251E, 251F, 251H, 251K, 251Wa, 251Xa-251Xc, 251Xe, 251Xf, 251Xj,
251Xl-251Xp, 252Yc, 252Yd, 252Yg, 253Ya-253Yc, 253Yg
principal ultrafilter 2A1N
probability density function see density (271H)
probability space 211B, 211L, 211Q, 211Xb, 211Xc, 211Xd, 212G, 213Ha, 215B, 243Xi, 253H, 253Xh,
254, Chap. 27
product measure Chap. 25; see also c.l.d. product measure (251F, 251W), primitive product measure
(251C), product probability measure (254C)
product probability measure 254 (254C), 272G, 272J, 272M, 275J, 275Yi, 275Yj, 281Yk
product topology 281Yc, 2A3T
see also inner product space
pseudometric 2A3F, 2A3G, 2A3H, 2A3I, 2A3J, 2A3K, 2A3L, 2A3Mc, 2A3S, 2A3T, 2A3Ub, 2A5B
pseudo-simple function 122Ye, 133Ye
pull-back measures 132G
purely atomic measure (space) 211K, 211N, 211R, 211Xb, 211Xc, 211Xd, 212G, 213He, 214Xe, 234Xb,
251Xr
purely infinite measurable set 213 notes
push-forward measure see image measure (112F)
quasi-Radon measure (space) 256Ya, 263Ya
quasi-simple function 122Yd, 133Yd
quotient partially ordered linear space 241Yg; see also quotient Riesz space
quotient Riesz space 241Yg, 241Yh, 242Yg
quotient topology 245Ba
Rademacher's theorem 262Q
Radon measure (on R or R^r) 256 (256A), 284R, 284Yi; (on ]−π, π]) 257Yb
Radon-Nikodym derivative 232Hf, 232Yj, 234B, 234Ca, 234Yd, 234Yf, 235Xi, 256J, 257F, 257Xe, 257Xf,
272Xe, 272Yc, 275Ya, 275Yi, 285Dd, 285Xe, 285Ya
Radon-Nikodym theorem 232E-232G, 234G, 235Xk, 242I, 244Yk
Radon probability measure (on R or R^r) 271B, 271C, 271Xb, 285Aa, 285M; (on other spaces) 256Ye,
271Ya
Radon product measure (of finitely many spaces) 256K
random variable 271Aa
rapidly decreasing test function 284 (284A, 284Wa), 285Dc, 285Xd, 285Ya
rearrangement see decreasing rearrangement (252Yp)
recursion 2A1B
regular measure see inner regular (256Ac)
regular outer measure 132C, 132Xa, 213C, 214Hb, 251Xm, 254Xb, 264Fb
regular topological space 2A5J
relation 1A1B
relatively compact set 2A3Na, 2A3Ob
relatively measurable set 121A
relatively weakly compact set (in a normed space) 247C, 2A5I
repeated integral 252 (252A); see also Fubini's theorem, Tonelli's theorem
reverse martingale 275K
Riemann integrable function 134K, 134L, 281Yh, 281Yi
Riemann integral 134K, 242 notes
Riemann-Lebesgue lemma 282E
Riesz Convexity Theorem 244 notes
Riesz norm 242Xg
Riesz space (= vector lattice) 231Yc, 241Ed, 241F, 241Yc, 241Yg
Saks see Vitali-Hahn-Saks theorem (246Yg)
saltus function 226B, 226Db, 226Xa
Schröder-Bernstein theorem 2A1G
Schwartz function see rapidly decreasing test function (284A)
Schwartzian distribution 284R, 284 notes; see also tempered distribution (284 notes)
self-supporting set (in a topological measure space) 256Xf
semi-continuous function see lower semi-continuous (225H)
semi-finite measure (space) 211F, 211L, 211Xf, 211Ya, 212G, 213A, 213B, 213Hc, 213Xc, 213Xd, 213Xj,
213Xl, 213Xm, 213Ya-213Yc, 214Xe, 214Xh, 215B, 216Xa, 216Yb, 234Fa, 235O, 235Xd, 235Xe, 241G, 241Ya,
241Yd, 243G, 245Ea, 245J, 245Xd, 245Xj, 245Xl, 246J, 246Xh, 251J, 251Xc, 252P, 252Yf, 253Xf, 253Xg
semi-finite version of a measure 213Xc, 213Xd
semi-martingale see submartingale (275Yf)
seminorm 2A5D
semi-ring of sets 115Ye
separable (topological) space 2A3Ud
separable Banach space 244I, 254Yc
separable metrizable space 245Yj, 264Yb, 284Ye
shift operators (on function spaces based on topological groups) 286C
Sierpiński Class Theorem see Monotone Class Theorem (136B)
signed measure see countably additive functional (231C)
simple function 122 (122A), 242M
singular additive functional 232Ac, 232I, 232Yg
smooth function (on R or R^r) 242Xi, 255Xi, 262Yd-262Yg, 284A, 284Wa
smoothing by convolution 261Ye
solid hull (of a subset of a Riesz space) 247Xa
space-filling curve 134Yl
sphere, surface measure on 265F-265H, 265Xa-265Xc, 265Xe
spherical polar coordinates 263Xf, 265F
square-integrable function 244Na; see also L2
standard normal distribution, standard normal random variable 274A
Steiner symmetrization 264H, 264 notes
step-function 226Xa
Stieltjes measure see Lebesgue-Stieltjes measure (114Xa)
Stirling's formula 252Yn
stochastically independent see independent (272A)
Stone-Weierstrass theorem 281A, 281E, 281G, 281Ya, 281Yg
stopping time 275L, 275M-275O, 275Xi, 275Xj
strictly localizable measure (space) 211E, 211L, 211N, 211Xf, 211Ye, 212G, 213Ha, 213J, 213L, 213O,
213Xa, 213Xh, 213Xn, 213Ye, 214Ia, 214J, 215Xf, 216E, 234Fd, 235P, 251N, 251P, 251Xn, 252B, 252D,
252E, 252Ys, 252Yt
strong law of large numbers 273D, 273H, 273I, 273Xh, 275Yn, 276C, 276F, 276Ye, 276Yg
subalgebra see σ-subalgebra (233A)
submartingale 275Yf, 275Yg
subspace measure 113Yb, 214A, 214B, 214C, 214H, 214I, 214Xb-214Xh, 216Xa, 216Xb, 241Ye, 242Yf,
243Ya, 244Yc, 245Yb, 251P, 251Q, 251Wl, 251Xn, 251Yb, 254La, 254Ye, 264Yf; (on a measurable subset)
131A, 131B, 131C, 132Xb, 214J, 214K, 214Xa, 214Xi, 241Yf, 247A; (integration with respect to a subspace
measure) 131D, 131E-131H, 131Xa-131Xc, 133Dc, 133Xa, 214D, 214E-214G, 214M
subspace of a normed space 2A4C
subspace topology 2A3C, 2A3J
subspace σ-algebra 121A, 214Ce
substitution see change of variable in integration
successor cardinal 2A1Fc
ordinal 2A1Dd
sum over arbitrary index set 112Bd, 226A
sum of measures 112Xe, 112Ya, 212Xe, 212Xh, 212Xi, 212Xj, 212Yd, 212Ye
summable family of real numbers 226A, 226Xf
support of a topological measure 256Xf, 257Xd
support see also bounded support, compact support
supported see point-supported (112Bd)
supporting see self-supporting set (256Xf), support
supremum 2A1Ab
surface measure see normalized Hausdorff measure (265A)
symmetric distribution 272Ye
symmetrization see Steiner symmetrization
tempered distribution 284 notes
tempered function 284 (284D), 286D
tempered measure 284Yi
tensor product of linear spaces 253 notes
test function 242Xi, 284 notes; see also rapidly decreasing test function (284A)
thick set see full outer measure (132F)
tight see uniformly tight (285Xj)
Tonelli's theorem 252G, 252H, 252R
topological measure (space) 256A
topological space 2A3 (2A3A)
topological vector space see linear topological space (2A5A)
topology 2A2, 2A3 (2A3A); see also convergence in measure (245A), linear space topology (2A5A)
total order see totally ordered set (2A1Ac)
total variation (of an additive functional) 231Yh; (of a function) see variation (224A)
totally finite measure (space) 211C, 211L, 211Xb, 211Xc, 211Xd, 212G, 213Ha, 214Ia, 214Ja, 215Yc,
232Bd, 232G, 243I, 243Xk, 245Fd, 245Xe, 245Ye, 246Xi, 246Ya
totally ordered set 135Ba, 2A1Ac
trace (of a σ-algebra) see subspace σ-algebra (121A)
transfinite recursion 2A1B
translation-invariant measure 114Xf, 115Xd, 134A, 134Ye, 134Yf, 255A, 255Ba, 255Yn
truly continuous additive functional 232Ab, 232B-232E, 232H, 232I, 232Xa, 232Xb, 232Xf, 232Xh, 232Ya,
232Ye, 234Ce
Ulam S. see Banach-Ulam problem
ultrafilter 254Yd, 2A1N, 2A1O, 2A3R, 2A3Se; see also principal ultrafilter (2A1N)
Ultrafilter Theorem 2A1O
uncompleted indefinite-integral measure 234Cc
uniform space 2A5F
uniformly continuous function 224Xa, 255K
uniformly distributed sequence see equidistributed (281Yi)
uniformly integrable set (in ℒ^1) 246 (246A), 252Yp, 272Yd, 273Na, 274J, 275H, 275Xi, 275Yl, 276Xd,
276Yb; (in L^1(μ)) 246 (246A), 247C, 247D, 247Xe, 253Xd
uniformly tight (set of measures) 285Xj, 285Xk, 285Ye, 285Yf
unit ball in R^r 252Q
universal mapping theorems 253F, 254G
up-crossing 275E, 275F
upper integral 133I, 133J, 133K, 133Xe, 133Yf, 135H, 252Ye, 252Yh, 252Yi, 253J, 253K
upper Riemann integral 134Ka
upwards-directed partially ordered set 2A1Ab
usual measure on {0, 1}^I 254J; see under {0, 1}^I
usual measure on PX 254J; see under PX
vague topology (on a space of signed measures) 274Ld, 274Xh, 274Ya-274Yd, 275Yp, 285K, 285L, 285S,
285U, 285Xk, 285Xq, 285Xs, 285Yd, 285Yg-285Yi, 285Yn
variance of a random variable 271Ac, 271Xa, 272R, 272Xf, 285Gb, 285Xo
variation of a function 224 (224A, 224K, 224Yd, 224Ye), 226B, 226Db, 226Xc, 226Xd, 226Yb; see
also bounded variation (224A)
of a measure see total variation (231Yh)
vector integration see Bochner integral (253Yf)
vector lattice see Riesz space (241E)
virtually measurable function 122Q, 122Xe, 122Xf, 212Bb, 212F, 241A, 252E
Vitali cover 261Ya
Vitali's theorem 221A, 221Ya, 221Yc, 221Yd, 261B, 261Yk
Vitali-Hahn-Saks theorem 246Yg
volume 115Ac
of a ball in R^r 252Q, 252Xh
Wald's equation 272Xh
weak topology (of a normed space) 247Ya, 2A5I
see also (relatively) weakly compact, weakly convergent
weak* topology on a dual space 253Yd, 285Yg, 2A5Ig; see also vague topology (274Ld)
weakly compact set (in a linear topological space) 247C, 247Xa, 247Xc, 247Xd, 2A5I; see also relatively
weakly compact (2A5Id)
weakly convergent sequence in a normed space 247Yb
Weierstrass approximation theorem 281F; see also Stone-Weierstrass theorem
well-distributed sequence 281Xh
well-ordered set 2A1Ae, 2A1B, 2A1Dg, 2A1Ka; see also ordinal (2A1C)
Well-ordering Theorem 2A1Ka
Weyl's Equidistribution Theorem 281M, 281N, 281Xh
Young's inequality 255Ym
Zermelo's Well-ordering Theorem 2A1Ka
zero-one law 254S, 272O, 272Xf, 272Xg
Zorn's lemma 2A1M
a.e. (almost everywhere) 112Dd
a.s. (almost surely) 112De
B (in B(x, δ), closed ball) 261A, 2A2B
B (in B(U ; V ), space of bounded linear operators) 253Xb, 253Yj, 253Yk, 2A4F, 2A4G, 2A4H
c (the cardinal of R or PN) 2A1H, 2A1L
C (in C(X), where X is a topological space) 243Xo, 281Yc, 281Ye, 281Yf
C([0, 1]) 242 notes
Cb (in Cb (X), where X is a topological space) 281A, 281E, 281G, 281Ya, 281Yd, 281Yg, 285Yg
c.l.d. product measure 251-253 (251F, 251W), 254Db, 254U, 254Ye, 256K, 256L
c.l.d. version of a measure (space) 213E, 213F-213H, 213M, 213Xb-213Xe, 213Xg, 213Xj, 213Xk, 213Xn,
213Xo, 213Yb, 214Xf, 214Xj, 232Ye, 234Yf, 241Ya, 242Yh, 244Ya, 245Yc, 251Ic, 251S, 251Wn, 251Xd,
251Xj, 251Xk, 252Ya
diam (in diam A) = diameter
dom (in dom f ): the domain of a function f
ess sup see essential supremum (243Da)
E (in E(X), expectation of a random variable) 271Ab
f -algebra 241H, 241 notes
G_δ set 264Xe
ℓ^1 (in ℓ^1(X)) 242Xa, 243Xl, 246Xd, 247Xc, 247Xd
ℓ^1 (= ℓ^1(N)) 246Xc
ℓ^2 244Xn, 282K, 282Xg
ℓ^p (in ℓ^p(X)) 244Xn
ℓ^∞ (in ℓ^∞(X)) 243Xl, 281B, 281D
ℓ^∞ (= ℓ^∞(N)) 243Xl
ℒ^0 (in ℒ^0(μ)) 121Xb, 121Ye, 241 (241A), 245, 253C, 253Ya; see also L^0 (241C), ℒ^0_strict (241Yh), ℒ^0_C
(241J)
ℒ^0_strict 241Yh
ℒ^0_C (in ℒ^0_C(μ)) 241J, 253L
L^0 (in L^0(μ)) 241 (241A), 242B, 242J, 243A, 243B, 243D, 243Xe, 243Xj, 245, 253Xe, 253Xf, 253Xg,
271De, 272H; (in L^0_C(μ)) 241J; see also ℒ^0 (241A)
ℒ^1 (in ℒ^1(μ)) 122Xc, 242A, 242Da, 242Pa, 242Xb; (in ℒ^1_strict(μ)) 242Yg; (in ℒ^1_C(μ)) 242P, 255Yn; (in
ℒ^1_V(μ)) 253Yf; see also L^1, ‖ ‖_1
L^1 (in L^1(μ)) 242 (242A), 243De, 243F, 243G, 243J, 243Xf, 243Xg, 243Xh, 245H, 245J, 245Xh, 245Xi,
246, 247, 253, 254R, 254Xp, 254Ya, 254Yc, 255Xc, 257Ya, 282Bd; (in L^1_V(μ)) 253Yf, 253Yi; see also ℒ^1,
L^1_C, ‖ ‖_1
L^1_C(μ) 242P, 243K, 246K, 246Yl, 247E, 255Xc; see also convolution of functions
ℒ^2 (in ℒ^2(μ)) 244Ob, 253Yj, 286; (in ℒ^2_C(μ)) 284N, 284O, 284Wh, 284Wi, 284Xi, 284Xk-284Xm, 284Yg;
see also L^2, ℒ^p, ‖ ‖_2
L^2 (in L^2(μ)) 244N, 244Yk, 247Xe, 253Xe; (in L^2_C(μ)) 282K, 282Xg, 284P; see also ℒ^2, L^p, ‖ ‖_2
ℒ^p (in ℒ^p(μ)) 244 (244A), 246Xg, 252Ym, 253Xh, 255K, 255Og, 255Yc, 255Yd, 255Yl, 255Ym, 261Xa,
263Xa, 273M, 273Nb, 281Xd, 282Yc, 284Xj, 286A; see also L^p, ℒ^2, ‖ ‖_p
L^p (in L^p(μ), 1 < p < ∞) 244 (244A), 245G, 245Xj, 245Xk, 245Yg, 246Xh, 247Ya, 253Xe, 253Xi,
253Yk, 255Yf; see also ℒ^p, ‖ ‖_p
ℒ^∞ (in ℒ^∞(μ)) 243A, 243D, 243I, 243Xa, 243Xl, 243Xn; see also L^∞
ℒ^∞_C 243K
ℒ^∞_strict 243Xb
L^∞ (in L^∞(μ)) 243 (243A), 253Yd; see also ℒ^∞, L^∞_C, ‖ ‖_∞
L^∞_C 243K, 243Xm
L (in L(U ; V ), space of linear operators) 253A, 253Xa
lim (in lim F) 2A3S; (in limxF ) 2A3S
lim inf (in lim inf n ) 1A3 (1A3Aa), 2A3Sg; (in lim inf 0 ) 2A2H; (in lim inf xF ) 2A3S
lim sup (in lim supn ) 1A3 (1A3Aa), 2A3Sg; (in lim sup0 ) 2A2H, 2A3Sg; (in lim supxF f (x))
2A3S
ln+ 275Yd
M^{0,∞} 252Yp
M^{1,∞} (in M^{1,∞}(μ)) 234Yd, 244Xl, 244Xm, 244Xo, 244Yc
N see PN
N × N 111Fb
P (usual measure on PX) 254J, 254Xf, 254Xq, 254Yd
PN 1A1Hb, 2A1Ha, 2A1Lb; (usual measure on) 273G, 273Xd, 273Xe
p.p. (presque partout) 112De
Pr(X > a), Pr(X ∈ E) etc. 271Ad
Q (the set of rational numbers) 111Eb, 1A1Ef
R (the set of real numbers) 111Fe, 1A1Ha, 2A1Ha, 2A1Lb
RX 245Xa, 256Ye; see also Euclidean metric, Euclidean topology
R
C 2A4A
R see extended real line (135)
S (in S(A)) 243I; (in S^f = S(A^f)) 242M, 244H
S see rapidly decreasing test function (284A)
S^1 (the unit circle, as topological group) see circle group
S^{r-1} (the unit sphere in R^r) see sphere
sf (in μ_sf) see semi-finite version of a measure (213Xc); (in μ*_sf) 213Xf, 213Xg, 213Xk
T2 topology see Hausdorff (2A3E)
T (in T, ) 244Xm, 244Xo, 244Yc, 246Yc
Tm see convergence in measure (245A)
Ts (in Ts (U, V )) see weak topology (2A5I), weak* topology (2A5Ig)
U (in U(x, δ)) 1A2A
Var (in Var(X)) see variance (271Ac); (in VarD f , Var f ) see variation (224A)
w*-topology see weak* topology (2A5Ig)
Z (the set of integers) 111Eb, 1A1Ee; (as topological group) 255Xe
ZFC see Zermelo-Fraenkel set theory

β_r (volume of unit ball in R^r) 252Q, 252Xh, 265F, 265H, 265Xa, 265Xb, 265Xe
Γ-function 225Xj, 225Xk, 252Xh, 252Yk, 252Yn, 255Xj
Δ-system 2A1Pa
θ-refinable see hereditarily weakly θ-refinable
μ_G (standard normal distribution) 274Aa
ν_X see distribution of a random variable (271C)
π-λ Theorem see Monotone Class Theorem (136B)
σ-additive see countably additive (231C)
σ-algebra of sets 111 (111A), 136Xb, 136Xi, 212Xk; see also Borel σ-algebra (111G)
σ-algebra defined by a random variable 272C, 272D
σ-complete see Dedekind σ-complete (241Fb)
σ-field see σ-algebra (111A)
σ-finite measure (space) 211D, 211L, 211M, 211Xe, 212G, 213Ha, 213Ma, 214Ia, 214Ja, 215B, 215C,
215Xe, 215Ya, 215Yb, 216A, 232B, 232F, 234Fe, 235O, 235R, 235Xe, 235Xk, 241Yd, 243Xi, 245Eb, 245K,
245L, 245Xe, 251K, 251Wg, 252B-252E, 252H, 252P, 252R, 252Xc, 252Yb, 252Yl
σ-ideal (of sets) 112Db, 211Xc, 212Xf, 212Xk
σ-subalgebra of sets 233 (233A)
Σ_{i∈I} a_i 112Bd, 222Ba, 226A
τ-additive measure 256M, 256Xb, 256Xc
Φ see normal distribution function (274Aa)
χ (in χA, where A is a set) 122Aa
ω (the first infinite ordinal) 2A1Fa
ω_1 (the first uncountable ordinal) 2A1Fc
see also Pω_1
ω_2 2A1Fc

see closure (2A2A, 2A3Db)


(in h(u), where h is a Borel function and u L0 ) 241I, 241Xd, 241Xi, 245Dd
=a.e. 112Dg, 112Xh, 222E, 241C
a.e. 112Dg, 112Xh, 212B
a.e. 112Dg, 112Xh, 233I
′ (in T′) see adjoint operator
∗ (in f ∗ g, u ∗ v, λ ∗ ν, ν ∗ f, f ∗ ν) see convolution (255E, 255O, 255Xe, 255Yn)
* (in weak*) see weak* topology (2A5Ig); (in U* = B(U; R), linear topological space dual) see dual
(2A4H); (in μ*) see outer measure defined by a measure (132B)
_* (in μ_*) see inner measure defined by a measure (113Yh)
\ (in E \ F , set difference) 111C
△ (in E△F, symmetric difference) 111C
⋃ (in ⋃_{n∈N} E_n) 111C; (in ⋃A) 1A1F
⋂ (in ⋂_{n∈N} E_n) 111C; (in ⋂E) 1A2F
∫ (in ∫f, ∫f dμ, ∫f(x)μ(dx)) 122E, 122K, 122M, 122Nb; (in ∫_A f) 131D, 214D, 235Xf; see also
subspace measure; (in ∫u) 242Ab, 242B, 242D; (in ∫_A u) 242Ac; see also upper integral, lower integral
(133I)
∫ (with overbar) see upper integral (133I)
∫ (with underbar) see lower integral (133I)
R∫ see Riemann integral (134K)
↾ (in f↾A, the restriction of a function to a set) 121Eh
| | (in a Riesz space) 241Ee, 242G
‖ ‖_1 (on L^1(μ)) 242 (242D), 246F, 253E, 275Xd, 282Ye; (on ℒ^1(μ)) 242D, 242Yg, 273Na, 273Xi
‖ ‖_2 244Da, 273Xj; see also L^2, ‖ ‖_p
‖ ‖_p (for 1 < p < ∞) 244 (244Da), 245Xm, 246Xb, 246Xh, 246Xi, 252Ym, 252Yp, 253Xe, 253Xh, 273M,
273Nb, 275Xe, 275Xf, 275Xh, 276Ya; see also ℒ^p, L^p
‖ ‖_∞ 243D, 243Xb, 243Xo, 244Xh, 273Xk, 281B; see also essential supremum (243D), L^∞, ℒ^∞, ℓ^∞
⊗ (in f ⊗ g) 253B, 253C, 253J, 253L, 253Ya, 253Yb; (in u ⊗ v) 253 (253E)
⊗̂ (in Σ ⊗̂ T) 251D, 251K, 251L, 251Xk, 251Ya, 252P, 252Xd, 252Xe, 253C
⨂̂ (in ⨂̂_{i∈I} Σ_i) 251Wb, 251Wf, 254E, 254F, 254Mc, 254Xc, 254Xi
∏ (in ∏_{i∈I} α_i) 254F; (in ∏_{i∈I} X_i) 254Aa
# (in #(X), the cardinal of X) 2A1Kb
+ (in κ^+, successor cardinal) 2A1Fc; (in f^+, where f is a function) 121Xa, 241Ef; (in u^+, where u
belongs to a Riesz space) 241Ef; (in F(x^+), where F is a real function) 226Bb
− (in f^−, where f is a function) 121Xa, 241Ef; (in u^−, in a Riesz space) 241Ef; (in F(x^−), where F is
a real function) 226Bb
∨, ∧ (in a lattice) 121Xa, 2A1Ad
∧, ∨ (in f^∧, f^∨) see Fourier transform, inverse Fourier transform (283A)
{0, 1}^I (usual measure on) 254J, 254Xd, 254Xe, 254Yc, 272N, 273Xb; (when I = N) 254K, 254Xj, 254Xq,
256Xk, 261Yd; see also PX
≪ (in ν ≪ μ) see absolutely continuous (232Aa)
∞ see infinity
[ ] (in [a, b]) see closed interval (115G, 1A1A, 2A1Ab); (in f[A], f^{-1}[B], R[A], R^{-1}[B]) 1A1B
[[ ]] (in f [[F]]) see image filter (2A1Ib)
[ [ (in [a, b[) see half-open interval (115Ab, 1A1A)
] ] (in ]a, b]) see half-open interval (1A1A)
] [ (in ]a, b[) see open interval (115G, 1A1A)
(in E) 234E, 235Xf
