
NOTES ON SET THEORY (18.510, FALL 2011)


HENRY COHN
1. First axioms of set theory
In set theory, everything is a set. In other words, there are no atomic elements,
contained in sets but not sets themselves. Instead, all mathematical objects are
actually built entirely out of sets. This is a little counterintuitive, because it means
questions like "Is $2 \in \pi$?" are actually meaningful. (The answer may depend on
exactly how one constructs the real numbers, for example by Cauchy sequences
or Dedekind cuts.) In everyday mathematics, exotic questions like this are of no
interest and it is best to ignore them; in fact, merely asking whether $2 \in \pi$ suggests
a fundamental misunderstanding of what real numbers are. In set theory, we put
up with these sorts of issues so that we can build mathematics on the simplest
foundation. Ultimately, we will see in these notes that all mathematical objects will
be built recursively as sets, starting just with the empty set.
We will use two basic, undefined binary relations, namely $=$ and $\in$. Everything
else must be defined in terms of them, together with logical connectives (and, or,
not, implies, is equivalent), quantifiers (there exists, for all), and variables. For
example, we define $A \subseteq B$ to mean that every element of $A$ is an element of $B$.
In symbols, $\forall x\,((x \in A) \Rightarrow (x \in B))$. Note that quantifiers quantify over all sets.
Sometimes we write $\forall x \in S$ as shorthand, but there is no need to restrict the values
we are quantifying over to a given set.
Later in the course we will develop the logical formalism more explicitly. For
now, we'll treat it somewhat informally, but still carefully. In particular, we cannot
make use of any intuitive ideas that haven't been precisely defined. For example,
we all know what it means for a set to be finite, and that for example the union
of two finite sets is always finite, but we cannot use these intuitions until we have
rigorously justified them. One rule of thumb is that any time you need to write an
ellipsis, something unrigorous is happening.
Because everything in set theory is a set, all variables will stand for sets. The
sort of letter used is sometimes meant to be suggestive. For example, when one set
is an element of another, it is sometimes convenient to use a lowercase letter for the
element and an uppercase letter for the set it is in. However, there is no way to do
this consistently, and these conventions play no role in the logic.
The theory we'll develop is called ZFC: Zermelo-Fraenkel set theory with the
Axiom of Choice. (ZF is the same theory without the Axiom of Choice.) ZFC is
the consensus foundation for modern mathematics, and almost everything mathematicians
do can be done in this framework. We'll see later in the course that ZFC
is far from complete, and that additional axioms are sometimes useful, but ZFC is
surprisingly comprehensive in practice.
The most basic question is when two sets are the same.
Axiom (Extensionality). Two sets are equal if and only if they have the same
elements. In symbols, $\forall S\,\forall T\,((S = T) \Leftrightarrow \forall x\,((x \in S) \Leftrightarrow (x \in T)))$.
We'll frequently use words instead of symbols, because symbolic expressions
are often more cumbersome yet aren't actually more precise, once you are used to
translating between words and symbols.
Intuitively, the Axiom of Extensionality means that elements of a set do not have
associated multiplicities or degrees of membership, do not come in any sort of order,
etc. The only thing that matters is whether each potential element is in or out of
the set. Note also that extensionality justifies the common technique of proving
that $S = T$ by showing that $S \subseteq T$ and $T \subseteq S$.
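As a small illustration of extensionality, here is a sketch in Python. It assumes that `frozenset` is an acceptable stand-in for a set and that small integers may stand in for elements (in the theory itself, the elements would be sets too). The helper name `same_by_double_inclusion` is made up for this example.

```python
# Sketch only: frozenset stands in for a set, integers stand in for elements.
a = frozenset([1, 2, 3])
b = frozenset([3, 2, 1, 1, 2])  # same elements, different order, with repeats


def same_by_double_inclusion(s, t):
    """Return True when s is a subset of t and t is a subset of s."""
    return s <= t and t <= s


equal_by_inclusion = same_by_double_inclusion(a, b)
equal_directly = a == b
print(equal_by_inclusion, equal_directly)
```

Order and repetition in the listing play no role; only membership matters, which is what the axiom says.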
It is common in mathematics to equate a property with the set of all things
having that property. For example, consider the following informal statement of
mathematical induction:
Given any property P of natural numbers, if 0 has property P, and
if whenever n has property P, so does n + 1, then every natural
number has property P.
This statement can be made precise as follows:
Given any set $S$ of natural numbers, if $0 \in S$, and if whenever $n \in S$,
it follows that $n + 1 \in S$, then every natural number is in $S$.
However, one must be slightly careful about language when one equates properties
with sets. In ordinary language, properties do not satisfy extensionality; it is
possible to have two different properties satisfied by exactly the same things. For
example, "being the day after Monday" is a different property than "being the day
before Wednesday," even though both properties uniquely define Tuesday.
This explains the name "extensionality." The extent or extension of a property is
the set of things that satisfy it, while the intent or intension is the idea it expresses.
(This is correctly spelled, and it's subtly different from "intention," but that is close
enough for our purposes.) The Axiom of Extensionality says that when we define a
set, only the extent matters, not the intent.
The next question is how we can make sets. There's an obvious guess, namely
that we can make them however we want:
False Axiom (Comprehension). For every mathematical property $\varphi(x)$, there exists
a set $S$ such that for all $x$, we have $x \in S$ iff $\varphi(x)$ holds.
The set $S$ is unique, by the Axiom of Extensionality. We write
\[ S = \{x : \varphi(x)\}. \]
Note that by a property $\varphi(x)$, we mean any syntactically correct statement about $x$
in our formal language. We'll be more explicit about exactly what this means later
in the course.
Comprehension is labeled "false axiom" above because it is not part of ZFC. It
cannot be, because it leads immediately to Russell's paradox: let
\[ R = \{x : x \notin x\}. \]
Then $R \in R$ if and only if $R \notin R$, which is a contradiction.
One of the difficulties of set theory is that it is sometimes not obvious whether an
attempted definition really does define a set. For example, the definition of Russell's
set $R$ looks innocent enough: sure, $x \notin x$ looks a little odd, but it's a perfectly
reasonable property for a set to have. Indeed, we don't expect any set to be an
element of itself. It is therefore a little unnerving that the set $R$ cannot exist.
However, there is no reason to be worried about this. Merely proposing a definition
of something does not ensure its existence. From this perspective, Russell's paradox
is no worse than saying "Let $n$ be the greatest integer. Then $n + 1$ is an even greater
integer, which is a paradox." This paradox is resolved by saying there is no greatest
integer, and Russell's paradox is resolved by saying $\{x : x \notin x\}$ is not actually a set.
Instead of comprehension, set theory uses the weaker Axiom of Separation. The
Axiom of Separation says that once we have a set, we can dene a subset by any
property we want:
Axiom (Separation). For every mathematical property $\varphi(x)$ and every set $S$,
\[ T = \{x \in S : \varphi(x)\} \]
is a set. In other words, there exists a set $T$ such that for all $x$, we have $x \in T$ if
and only if $x \in S$ and $\varphi(x)$ holds.
As above, there is a unique such set $T$ by extensionality. The name "separation"
refers to separating the elements $x \in S$ that satisfy $\varphi(x)$ from those that do not. By
contrast, "comprehension" refers to creating a comprehensive set of all $x$ satisfying
$\varphi(x)$. The former is always possible, but the latter is sometimes impossible.
Definition 1.1. A universe, or universal set, is a set that contains every set as an
element.
By extensionality, there can be only one universe, if any, but it is not clear that
there must be one. In fact, there cannot be:
Proposition 1.2. There is no universe.
Proof. Suppose $U$ were a universal set. By the Axiom of Separation, we can define
\[ R = \{x \in U : x \notin x\}. \]
Because $U$ is universal, $R \in U$, and we conclude that $R \in R$ if and only if $R \notin R$.
This is a contradiction, so $U$ cannot exist.
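The Axiom of Separation and the argument above can be mimicked on a tiny example. The following Python sketch assumes `frozenset` as a model of hereditarily finite sets; the function name `separate` is invented for illustration. It forms the Russell-style subset of a particular set $S$ and shows that nothing paradoxical happens: the subset exists, it just fails to be an element of $S$.

```python
def separate(S, phi):
    """Separation: the subset {x in S : phi(x)} of an existing set S."""
    return frozenset(x for x in S if phi(x))


empty = frozenset()
S = frozenset([empty, frozenset([empty])])  # S = {∅, {∅}}

# Russell-style subset of S: those x in S that are not elements of themselves.
R = separate(S, lambda x: x not in x)

r_equals_s = (R == S)   # every element of S qualifies, so R is all of S
r_in_s = (R in S)       # R is not among the elements of S
print(r_equals_s, r_in_s)
```

The same reasoning applied to a hypothetical universal set would force `R in U` to hold, which is exactly where the contradiction in the proof comes from.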
Note that the non-existence of the universe is simply Russell's paradox, with
the paradoxical aspect eliminated. Every apparent paradox is really a theorem in
disguise, if one can identify the problematic assumption being made.
If there were a universe $U$, comprehension would be a special case of separation,
because $\{x : \varphi(x)\} = \{x \in U : \varphi(x)\}$. Thus, although Proposition 1.2 may seem
surprising, it is fundamental to set theory.
The basic difficulty is that $U$ is too big to be a set. This phenomenon, called
limitation of size, is the fundamental reason why attempted definitions can fail to
define actual sets.
It's possible to work in a bigger theory, which includes not just sets but also
objects called classes. These are set-like objects, but possibly much bigger, and
a set is simply a class that can be an element of another class. Then there is a
universe, i.e., a class consisting of all sets, but it is a proper class, not a set itself.
This language is sometimes convenient, but never necessary, and we will not use it
in our formal development of set theory.
So far, there is a worrisome gap in our axioms: are there any sets at all?
Extensionality tells us when two sets are equal, but it doesn't guarantee that any sets
exist, and separation only lets us make further sets from a starting set.
Axiom (Empty Set). There is a set with no elements.
The empty set is unique, by extensionality, and we denote it $\emptyset$ (or sometimes $\{\}$).
Note that we could replace the Empty Set Axiom by an axiom simply asserting the
existence of some unspecified set, since given any set $S$, we can define
$\emptyset = \{x \in S : x \neq x\}$.
Now we know there is at least one set, but our axioms so far allow the possibility
that the empty set is the only set.
Axiom (Pairing). For all sets $a$ and $b$, there exists a set $S$ such that $a \in S$ and
$b \in S$.
Given such a set, by separation there is a set $\{x \in S : x = a \text{ or } x = b\}$, which is
unique by extensionality. We write $\{a, b\}$ for that set. Note that we allow $a = b$, in
which case we can also write $\{a\}$.
From $\emptyset$, we can now form a new set $\{\emptyset\}$. It is a different set: $\emptyset$ has no elements,
while $\{\emptyset\}$ has one element, namely $\emptyset$. We can go on to form further sets, such as
$\{\{\emptyset\}\}$ or $\{\emptyset, \{\emptyset\}\}$. However, using the axioms so far we cannot form a set with more
than two elements.
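The sets obtainable from the empty set by pairing can be written down concretely. This is a minimal sketch, again assuming Python's `frozenset` as a model of sets; the function name `pair` is chosen for this example.

```python
empty = frozenset()


def pair(a, b):
    """Pairing: the set {a, b}; when a == b this is the singleton {a}."""
    return frozenset([a, b])


single = pair(empty, empty)    # {∅}
nested = pair(single, single)  # {{∅}}
both = pair(empty, single)     # {∅, {∅}}

sizes = [len(empty), len(single), len(nested), len(both)]
print(sizes)
```

No matter how the inputs are chosen, `pair` never returns a set with more than two elements, matching the remark above.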
Axiom (Union). For every set S, there is a set whose elements are exactly the
elements of the elements of S.
This set is unique by extensionality, and we denote it by
\[ \bigcup_{x \in S} x. \]
It consists of the union of all the elements of $S$: for all sets $y$, we have $y \in \bigcup_{x \in S} x$
if and only if $y \in x$ for some $x \in S$.
By combining the Union Axiom with the Pairing Axiom, we see that every pair of
sets has a union, but of course the Union Axiom is much broader than this restricted
statement. We write $a \cup b$ for the union of $\{a, b\}$. It also follows from combining
pairing and union that we can form three-element sets ($\{a, b, c\} = \{a, b\} \cup \{c\}$),
four-element sets, etc. At this point, we cannot state a general theorem (we haven't
even defined what a finite set or a natural number is), but we can state specific
theorems, one for three elements, one for four, etc., and it's clear that we can
continue this process indefinitely.
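The way pairing and union combine to give a three-element set can be sketched as follows. As before, `frozenset` is an assumed model of sets, and the names `big_union`, `union`, and `triple` are illustrative only.

```python
def big_union(S):
    """Union Axiom: the set of all elements of elements of S."""
    return frozenset(y for x in S for y in x)


def union(a, b):
    """a ∪ b, obtained as the union of the pair {a, b}."""
    return big_union(frozenset([a, b]))


def triple(a, b, c):
    """{a, b, c} = {a, b} ∪ {c}."""
    return union(frozenset([a, b]), frozenset([c]))


empty = frozenset()
one = frozenset([empty])
two = frozenset([empty, one])

t = triple(empty, one, two)
print(len(t), big_union(t) == two)
```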
Note that we do not need an intersection axiom. If $S$ is a non-empty set, let $x_0$
be an element of $S$. Then
\[ \bigcap_{x \in S} x = \{y \in x_0 : y \in x \text{ for all } x \in S\}, \]
which defines a set by separation. The restriction that $S$ must be non-empty is
necessary, because the intersection of the elements of the empty set should be a
universal set, which doesn't exist.
Definition 1.3. The power set $\mathcal{P}(S)$ of a set $S$ is the set of all subsets of $S$.
The power set is unique if it exists, by extensionality, but it is not clear that
every set has a power set. That doesn't follow from the axioms so far, because they
are all satisfied by the class of countable sets, but the power set of a countably
infinite set is not countable.
Axiom (Power Set). Every set has a power set.
Even with the Power Set Axiom, our list of axioms is far from complete. So far,
they all hold for the class of finite sets, so we have no guarantee that there are
any infinite sets. However, we have enough axioms to start building up some of
mathematical practice.
The key tool we need is the ordered pair. We will use Kuratowski's definition:
Definition 1.4. The ordered pair $(x, y)$ is defined to be $\{\{x\}, \{x, y\}\}$.
This definition is completely ad hoc, and nobody actually thinks the ordered
pair $(x, y)$ means anything like $\{\{x\}, \{x, y\}\}$ intuitively. The only purpose of this
definition is to offer a purely set-theoretic construction that makes the following
lemma true.
Lemma 1.5. For all u, v, x, y, we have (u, v) = (x, y) if and only if u = x and
v = y.
Proof. We just need to show how to recover $x$ and $y$ from $(x, y)$. We can characterize
$x$ as the unique element in the intersection of the elements of $(x, y)$ (namely, $\{x\}$
and $\{x, y\}$). Then there are two possibilities: if $(x, y) = \{\{x\}\}$, then $y = x$, and
otherwise $y$ is the one set that is an element of an element of $(x, y)$ but isn't equal
to $x$.
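The recovery procedure in this proof can be written out directly. The sketch below assumes `frozenset` as a model of sets and uses integers as stand-in elements; the names `kpair`, `first`, and `second` are made up for this illustration.

```python
def kpair(x, y):
    """Kuratowski ordered pair (x, y) = {{x}, {x, y}}."""
    return frozenset([frozenset([x]), frozenset([x, y])])


def first(p):
    """x is the unique element of the intersection of the elements of p."""
    (x,) = frozenset.intersection(*p)
    return x


def second(p):
    """Recover y, treating the degenerate case (x, x) = {{x}} separately."""
    x = first(p)
    if p == frozenset([frozenset([x])]):
        return x
    # Otherwise y is the one element of an element of p that differs from x.
    (y,) = {z for elem in p for z in elem if z != x}
    return y


p = kpair(1, 2)
q = kpair(3, 3)
print(first(p), second(p), first(q), second(q))
```

Because both coordinates can be recovered from the pair, two pairs are equal exactly when their coordinates agree, which is the content of Lemma 1.5.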
There are certainly many other ways to define ordered pairs for which this lemma
would still be true. However, there are some subtleties. For example, one could try
to use the definition $\{x, \{x, y\}\}$. In fact, given an additional axiom we will introduce
later (the Axiom of Foundation), that definition can be proved to work, but one
must deal with the following issue. If $u = \{x, y\}$ and $x = \{u, v\}$, then
\[ \{x, \{x, y\}\} = \{u, \{u, v\}\}. \]
This circular relationship between $u$ and $x$ may seem odd, and in fact it will be
ruled out by the Axiom of Foundation, but nothing we have seen so far prohibits it.
Kuratowski's definition avoids this issue.
Once we have defined ordered pairs, we can define the Cartesian product $S \times T$.
Specifically,
\[ S \times T = \{(s, t) \in \mathcal{P}(\mathcal{P}(S \cup T)) : s \in S, t \in T\}. \]
Note that in order to apply separation, we must have a set that is guaranteed to
contain every ordered pair $(s, t)$ with $s \in S$ and $t \in T$, but fortunately $\mathcal{P}(\mathcal{P}(S \cup T))$
can play this role.
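For small finite sets one can check by brute force that the ambient set really contains every Kuratowski pair, and then carve out the product by separation. This sketch assumes `frozenset` as a model of sets with integers as stand-in elements; `powerset`, `kpair`, and `cartesian` are names invented here.

```python
from itertools import combinations


def powerset(s):
    """All subsets of s, as a frozenset of frozensets."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )


def kpair(x, y):
    """Kuratowski ordered pair {{x}, {x, y}}."""
    return frozenset([frozenset([x]), frozenset([x, y])])


def cartesian(S, T):
    """Separate the ordered pairs out of P(P(S ∪ T))."""
    ambient = powerset(powerset(S | T))
    wanted = {kpair(s, t) for s in S for t in T}
    return frozenset(p for p in ambient if p in wanted)


S = frozenset([0, 1])
T = frozenset([1, 2])
product = cartesian(S, T)
print(len(product))
```

Every pair $\{\{s\}, \{s, t\}\}$ is a set of subsets of $S \cup T$, which is why the double power set suffices as the ambient set.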
Definition 1.6. A relation between $S$ and $T$ is a subset of $S \times T$. A function
$f : S \to T$ is a subset $f \subseteq S \times T$ such that for all $s \in S$, there exists a unique $t \in T$
with $(s, t) \in f$.
Of course, if $f$ is a function, we write $f(s) = t$ to mean $(s, t) \in f$.
We will write $f[U]$ to denote the image of the set $U \subseteq S$ under a function
$f : S \to T$, i.e., the set $\{t \in T : t = f(s) \text{ for some } s \in U\}$. Often the notation
$f(U)$ is used instead, but that can be ambiguous. For example, if $f$ has domain
$\{x, y, \{x, y\}\}$, then $f(\{x, y\})$ is very different from $f[\{x, y\}]$ (which is $\{f(x), f(y)\}$).
This may seem like an arcane scenario, but nested sets like this come up frequently
in the study of ordinals.
We call $S$ the domain of a function from $S$ to $T$. Given such a function $f$, its
restriction $f|_{S'}$ to a subset $S'$ of $S$ is simply
\[ \{(s, t) \in f : s \in S'\}. \]
Definition 1.7. A function $f : S \to T$ is injective (or one-to-one, or an injection)
if $f(s_1) = f(s_2)$ implies $s_1 = s_2$ for all $s_1, s_2 \in S$. It is surjective (or onto, or a
surjection) if for every $t \in T$, there exists an $s \in S$ such that $f(s) = t$; equivalently,
it is surjective if $f[S] = T$. It is bijective (or a one-to-one correspondence, or a
bijection) if it is both injective and surjective.
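Treating a function as a set of pairs makes these definitions easy to test on a small example, including the difference between $f(\{x, y\})$ and $f[\{x, y\}]$ noted earlier. This sketch is a simplification: it uses Python tuples in place of Kuratowski pairs, and all helper names (`is_function`, `value_at`, `image`, and so on) are assumptions of this example.

```python
def is_function(f, S, T):
    """f is a set of (s, t) tuples; each s in S must occur with exactly one t in T."""
    if not all(s in S and t in T for (s, t) in f):
        return False
    return all(sum(1 for (a, _) in f if a == s) == 1 for s in S)


def value_at(f, s):
    """f(s): the unique t with (s, t) in f."""
    (t,) = [t for (a, t) in f if a == s]
    return t


def image(f, U):
    """f[U]: the set of values f(s) for s in U."""
    return frozenset(t for (s, t) in f if s in U)


def is_injective(f):
    values = [t for (_, t) in f]
    return len(values) == len(set(values))


def is_surjective(f, S, T):
    return image(f, S) == frozenset(T)


x, y = 0, 1
xy = frozenset([x, y])
S = frozenset([x, y, xy])            # domain {x, y, {x, y}}
T = frozenset(["a", "b", "c"])
f = frozenset([(x, "a"), (y, "b"), (xy, "c")])

print(value_at(f, xy), sorted(image(f, xy)))
```

Here `value_at(f, xy)` applies $f$ to the element $\{x, y\}$ of the domain, while `image(f, xy)` collects the values of $f$ on the elements $x$ and $y$.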
2. The natural numbers
One crucial step in building mathematics within set theory is constructing the
natural numbers. After that, the rest of mathematics ($\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$, and beyond) can
be built by the usual approach from algebra and analysis classes. However, $\mathbb{N}$ has
to come from somewhere.
As with defining ordered pairs, there is no canonical way to do this, but we will
follow a particularly simple and beautiful approach due to von Neumann. Each
natural number n will be a specic set with n elements, namely the set of all the
previous natural numbers.
Thus, we define $0 = \emptyset$, $1 = \{0\}$, $2 = \{0, 1\}$, $3 = \{0, 1, 2\}$, etc. Of course, the
"etc." is hiding something, and we will need to carry this out more formally, but it
captures the intuition.
If we follow this approach, then
\[ n + 1 = \{0, 1, \dots, n\} = \{0, 1, \dots, n - 1\} \cup \{n\} = n \cup \{n\}. \]
(Of course, this discussion is still informal, since the ellipses are appealing to
intuition.) We call this construction the successor:
Definition 2.1. The successor of a set $x$ is the set $x^+ = x \cup \{x\}$. A set is inductive
if it contains $\emptyset$ and it contains $x^+$ whenever it contains $x$.
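The first few von Neumann natural numbers can be built by iterating the successor. This is an informal sketch, assuming `frozenset` as a model of sets; it relies on Python's own integers to count iterations, which is exactly the sort of intuition the formal development has to avoid. The names `successor` and `von_neumann` are illustrative.

```python
def successor(x):
    """The successor x+ = x ∪ {x}."""
    return x | frozenset([x])


def von_neumann(n):
    """Apply the successor n times, starting from the empty set."""
    k = frozenset()
    for _ in range(n):
        k = successor(k)
    return k


nums = [von_neumann(n) for n in range(6)]

# The set for n has exactly n elements, namely the sets for 0, ..., n - 1.
sizes_ok = all(len(nums[n]) == n for n in range(6))
elements_ok = all(nums[n] == frozenset(nums[:n]) for n in range(6))
print(sizes_ok, elements_ok)
```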
The natural numbers should form an inductive set, and the Axiom of Infinity
guarantees that such a set exists. The reason for the name "Axiom of Infinity" is
that the class of finite sets does not satisfy this axiom.
Axiom (Infinity). There exists an inductive set.
There are many inductive sets, because one can add an arbitrary set $x$ to any
inductive set, provided one also adds its iterated successors $x^+$, $x^{++}$, etc. We would
like to single out the natural numbers as the smallest inductive set.
Definition 2.2. An inductive set is minimal if it is a subset of all inductive sets.
Theorem 2.3. There exists a minimal inductive set.
One natural approach to proving this theorem would be to take the intersection
of all the inductive sets and show that this intersection is inductive. The tricky
issue is that there are too many inductive sets for there to be a set of all inductive
sets, so one must tread carefully to avoid paradox. Fortunately, it is not hard to
repair the proof by fixing a single inductive set and working within it, so that we
avoid the whole issue of whether there is a set of all inductive sets.
Proof. Let $x$ be any inductive set, and define
\[ \omega = \{y \in x : y \text{ is in every inductive set}\}. \]
Then $\omega$ is certainly a subset of every inductive set, by construction. To see that
it is a minimal inductive set, we must show that $\omega$ is inductive. Certainly $\emptyset \in \omega$,
because the empty set is in every inductive set. Furthermore, if $y \in \omega$, then $y^+ \in x$
and $y^+$ is in every inductive set (by the definition of inductive), and thus $y^+ \in \omega$.
It follows that $\omega$ is a minimal inductive set, as desired.
Extensionality implies that there can be only one minimal inductive set, because
given any two minimal inductive sets, each is a subset of the other.
Definition 2.4. The set of natural numbers is the minimal inductive set.
In set theory it's traditional to use the symbol $\omega$ for the set of natural numbers.
We'll follow this tradition, partly because it fits nicely with the theory of ordinals,
and partly to emphasize that $\omega$ is a formal construction within set theory, and we
must not make illicit use of our intuitions about natural numbers when analyzing $\omega$.
Definition 2.5. A set is finite if it is in one-to-one correspondence with an element
of $\omega$, and it has size (or cardinality) $n$ if it is in one-to-one correspondence with
$n \in \omega$.
It is not immediately obvious from our definitions and axioms that a finite set
cannot have two different sizes. See Proposition 4.11 for a proof.
Proposition 2.6. The union of two finite sets is always finite.
The proof is an illustration of how the "minimal inductive set" characterization
of $\omega$ enables us to carry out proofs by mathematical induction.
Proof. Let
\[ S = \{n \in \omega : \text{the union of a set of size } n \text{ and a finite set is always finite}\}. \]
We will show that $S$ is inductive, and thus $S = \omega$. Clearly, $\emptyset \in S$, because taking a
union with the empty set is the identity function. Thus, we just need to show that
$S$ is closed under taking the successor $n^+$ of $n$.
Suppose $n \in S$, $x$ has size $n^+$, and $y$ is finite; we wish to show that $x \cup y$ is finite.
Saying $x$ has size $n^+$ means there is a bijection $f : x \to n^+ = n \cup \{n\}$. Suppose
$z \in x$ is sent to $n \in n^+$ under $f$; i.e., $f(z) = n$. Then $f$ restricts to a bijection from
$x \setminus \{z\}$ to $n$. Because $n \in S$, we see that $(x \setminus \{z\}) \cup y$ is finite. In other words, there
is a bijection $g : (x \setminus \{z\}) \cup y \to k$ for some $k \in \omega$. The set we want to understand
is $x \cup y$, and we have $x \cup y = \bigl((x \setminus \{z\}) \cup y\bigr) \cup \{z\}$. If $z \in (x \setminus \{z\}) \cup y$ (i.e., $z \in y$),
then $x \cup y = (x \setminus \{z\}) \cup y$ and we are done. Otherwise, we extend $g$ to a function
$h : x \cup y \to k \cup \{k\}$ by $h(w) = g(w)$ for $w \in (x \setminus \{z\}) \cup y$ and $h(z) = k$. Then $h$ is a
bijection to $k \cup \{k\} \in \omega$, so $x \cup y$ is finite, as desired.
3. Partially, totally, and well-ordered sets
The set $\omega$ of natural numbers has given us a rigorous understanding of what
cardinality means for finite sets, but it is much less clear what it means in the
infinite case. Perhaps surprisingly, the right approach is to study orderings, and not
just counting. In the finite case, cardinal numbers (one, two, three, etc., which are
used for counting) and ordinal numbers (first, second, third, etc., which are used for
ordering) are structurally identical, but in the infinite case we will see that there are
major differences. The theory of ordinals is more comprehensive than the theory of
cardinals, and we will derive cardinals as a special case, but this will actually be
easier than trying to define the cardinals directly. Even though counting seems like
a more basic notion than ordering, ordering is easier to pin down mathematically.
Definition 3.1. A partially ordered set (or poset) is a set $S$ with a relation $R \subseteq S \times S$
satisfying the following three properties for all $x, y, z \in S$, where we write $x \le y$ to
mean $(x, y) \in R$:
(1) Reflexivity: $x \le x$.
(2) Antisymmetry: if $x \le y$ and $y \le x$, then $x = y$.
(3) Transitivity: if $x \le y$ and $y \le z$, then $x \le z$.
Strictly speaking, it is an abuse of notation to refer to a set as a poset, because we
must specify the ordering as well, and not just the set of elements. To be formally
correct, we should define a poset to be an ordered pair $(S, R)$ with $R \subseteq S \times S$, where
$R$ represents the ordering relation on $S$. However, this can be a little cumbersome,
so we will typically not be so careful, except in cases where we are considering
several different orderings on the same set.
The name "partial ordering" refers to the fact that incomparable elements are
allowed, i.e., pairs $x, y$ for which neither $x \le y$ nor $y \le x$.
Definition 3.2. Two elements in a poset are comparable if one is greater than
or equal to the other, and the poset is totally ordered if every pair of elements is
comparable. A totally ordered subset of a poset is sometimes called a chain.
One example of a total ordering is the usual ordering on $\mathbb{R}$, or on any subset
of it. Indeed, every subset of a poset becomes a poset by restricting the ordering.
However, most posets are not totally ordered. For example, the power set of any
set is a poset under the ordering $\subseteq$, but most pairs of elements are not comparable
(neither contains the other).
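For a concrete check, the three poset axioms can be verified by brute force on the power set of a small set ordered by inclusion, and incomparable pairs can be listed. This is a sketch assuming `frozenset` as a model of sets; the helper `powerset` is written for this example, and Python's `<=` on frozensets is the subset relation.

```python
from itertools import combinations


def powerset(s):
    """All subsets of s, as a list of frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


P = powerset({0, 1, 2})

reflexive = all(a <= a for a in P)
antisymmetric = all(a == b for a in P for b in P if a <= b and b <= a)
transitive = all(a <= c for a in P for b in P for c in P if a <= b and b <= c)

# Pairs for which neither set contains the other.
incomparable = [(a, b) for a in P for b in P if not (a <= b or b <= a)]
print(reflexive, antisymmetric, transitive, len(incomparable) > 0)
```

For instance, $\{0\}$ and $\{1\}$ are incomparable, so this poset is not totally ordered.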
It is traditional to use weak orderings $\le$, allowing the possibility of equality, but
for the theory of ordinals it will be convenient to use strict orderings $<$.
Definition 3.3. A strict partial ordering replaces the three properties above with
the following two:
(1) Antisymmetry: if $x < y$, then $y \not< x$.
(2) Transitivity: if $x < y$ and $y < z$, then $x < z$.
Note that $y \not< x$ does not mean $y \ge x$, because $x$ and $y$ may be incomparable.
One can pass back and forth between weak and strict partial orderings by changing
whether equality is allowed.
Definition 3.4. An upper bound for a subset $S$ of a poset is an element $x$ in the
poset such that $x \ge y$ for all $y \in S$. It is the least upper bound for $S$ if $x \le x'$ for
every upper bound $x'$.
There can be at most one least upper bound for a given set, because if $x$ and $x'$
are both least upper bounds, then $x \le x'$ and $x' \le x$, which implies $x = x'$. Note
also that every element of a poset is an upper bound for the empty set, so $x$ is the
least upper bound for $\emptyset$ if and only if $x \le y$ for all $y$ in the poset.
The least upper bound of a set may not be in the set. For example, $\sqrt{2}$ is the
least upper bound of the interval $[0, \sqrt{2})$ in $\mathbb{R}$. Even if a set has an upper bound,
there may be no least upper bound. For example, $[0, \sqrt{2}) \cap \mathbb{Q}$ has no least upper
bound in $\mathbb{Q}$.
As pointed out above, for every set $S$, the power set $\mathcal{P}(S)$ is a poset under $\subseteq$.
Every subset of $\mathcal{P}(S)$ has a least upper bound, namely the union of its elements.
Theorem 3.5 (Knaster-Tarski fixed point theorem). Let $P$ be a poset in which
every subset has a least upper bound, and let $f : P \to P$ be an order-preserving
map (i.e., if $x \le y$ then $f(x) \le f(y)$). Then $f$ has a fixed point. In other words,
there exists $x \in P$ such that $f(x) = x$.
Proof. Let
\[ S = \{x \in P : x \le f(x)\}. \]
Then $f$ maps $S$ to itself, because $x \le f(x)$ implies $f(x) \le f(f(x))$. Let $y$ be the
least upper bound of $S$. For every $x \in S$, we have $x \le y$ and hence $f(x) \le f(y)$;
thus, because $x \le f(x)$ for $x \in S$, we see that $f(y)$ is also an upper bound for $S$.
Because $y$ is the least upper bound, $y \le f(y)$, so $y \in S$. However, that means
$f(y) \in S$, because $S$ is closed under $f$, and thus $f(y) \le y$. It follows that $f(y) = y$,
as desired.
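The construction in this proof can be run on a finite example, taking $P$ to be the power set of a small set under inclusion, where least upper bounds are unions. This sketch assumes `frozenset` as a model of sets; the map `f` and the helper names are chosen only for illustration.

```python
from itertools import combinations


def powerset(s):
    """All subsets of s, as a list of frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_order_preserving(f, P):
    return all(f(a) <= f(b) for a in P for b in P if a <= b)


def fixed_point(f, P):
    """Least upper bound (here: union) of S = {x in P : x <= f(x)}."""
    S = [a for a in P if a <= f(a)]
    return frozenset().union(*S)


ground = frozenset(range(5))
P = powerset(ground)


def f(a):
    """An example order-preserving map on subsets of {0, ..., 4}."""
    return frozenset([0]) | frozenset(x + 1 for x in a if x + 1 <= 2)


y = fixed_point(f, P)
print(is_order_preserving(f, P), sorted(y), f(y) == y)
```

Here `S` consists of the sets $\emptyset$, $\{0\}$, $\{0, 1\}$, and $\{0, 1, 2\}$, and their union $\{0, 1, 2\}$ is fixed by `f`.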
We can now prove a famous and fundamental result about comparing the sizes of
sets. We will use the Knaster-Tarski fixed point theorem, but it is possible to give
more straightforward proofs. See, for example, Naive Set Theory by Halmos.
Theorem 3.6 (Cantor-Schröder-Bernstein). If $S$ and $T$ are sets for which there
are injective maps from S to T and from T to S, then there is a bijection between
S and T.
Proof. Let $f : S \to T$ and $g : T \to S$ be injective. Consider the complete lattice
$\mathcal{P}(S)$, and define $F : \mathcal{P}(S) \to \mathcal{P}(S)$ by
\[ F(A) = S \setminus g[T \setminus f[A]]. \]
Recall that $f[A]$ denotes the image of the set $A$ under $f$, i.e., $\{f(x) : x \in A\}$.
This function is order-preserving: taking the image under $f$ or $g$ preserves
inequalities (if $A \subseteq B$ then $f[A] \subseteq f[B]$) while taking complements reverses them
(if $A \subseteq B \subseteq S$ then $S \setminus B \subseteq S \setminus A$), so $F$ reverses the order twice and thus preserves
it.
Thus, the Knaster-Tarski fixed point theorem provides a subset $A$ of $S$ such that
$F(A) = A$. This means $A$ and $g[T \setminus f[A]]$ are complements of each other within $S$.
The function $f$ restricts to a bijection from $A$ to $f[A]$, because $f$ is injective,
and $g$ restricts to a bijection from $T \setminus f[A]$ to $g[T \setminus f[A]]$. We can invert $g$ to get
a bijection $g^{-1}$ from $g[T \setminus f[A]]$ to $T \setminus f[A]$, and now we have bijections from the
complementary sets $A$ and $g[T \setminus f[A]]$ to the complementary sets $f[A]$ and $T \setminus f[A]$.
Thus, if we define
\[ h(x) = \begin{cases} f(x) & \text{if } x \in A, \text{ and} \\ g^{-1}(x) & \text{if } x \in g[T \setminus f[A]], \end{cases} \]
then $h$ is a bijection from $S$ to $T$.
Definition 3.7. If $S$ is a subset of a poset, then $x \in S$ is a minimal element of $S$
if there is no $y \in S$ with $y < x$ (and a maximal element is defined the same way but
with the opposite inequality).
Note that we do not require that a minimal element be less than or equal to
everything else in the set, just that nothing be strictly less than it. A poset can have
several different minimal elements. For example, consider the following drawing of
a four-element poset, in which the diagonal lines indicate comparability (higher is
greater) and elements at the same horizontal level are incomparable:

[Figure: drawing of a four-element poset]
In a totally ordered set, a minimal element $x$ of a subset $S$ is an element of $S$
such that $x \le y$ for all $y \in S$. It follows that $S$ can have at most one minimal
element.
Definition 3.8. A well-ordered set is a totally ordered set in which every non-empty
subset has a minimal element.
The standard example of a well-ordering is the usual ordering on the natural
numbers. Most totally ordered sets are not well-ordered; for example, $\mathbb{R}$ is not
well-ordered under its usual ordering. Note that every subset of a well-ordered set
is itself well-ordered by the restriction of the ordering relation.
Well-ordered sets play a key role in set theory, because they generalize the
fundamental property of ordering a finite set: there is always a next element. In a
well-ordered set, given any subset $S$ that is not the entire set, there is always a least
element outside $S$. This means if we are building up a subset, until we fill up the
entire set there is always a single element to consider next.
We will show that $\omega$ is well-ordered, and this is essentially equivalent to the
validity of mathematical induction. Sometimes proofs can be expressed particularly
elegantly using well-ordering directly; this technique is often called infinite descent.
For example, stepping slightly outside our formal development of set theory, we can
use well-ordering to prove the irrationality of $\sqrt{2}$. This is certainly not the simplest
or best-motivated proof, but it is an entertaining illustration of infinite descent.
Suppose $\sqrt{2}$ were rational, and let $\sqrt{2} = p/q$, where $p$ and $q$ are positive integers
with $q$ as small as possible. Then
\[ \sqrt{2} = \frac{2 - \sqrt{2}}{\sqrt{2} - 1} = \frac{2 - p/q}{p/q - 1}, \]
so
\[ \sqrt{2} = \frac{2q - p}{p - q}. \]
However, $1 < \sqrt{2} < 2$ implies that $q < p < 2q$, and thus $0 < p - q < q$, so we have
decreased the denominator of $\sqrt{2}$, which contradicts the minimality of $q$.
The name "infinite descent" describes how the rationality of $\sqrt{2}$ would lead to
an infinite decreasing sequence of positive integers, namely the denominators of the
expressions for $\sqrt{2}$.
Well-ordered sets are just on the threshold of what we can analyze: they can be
subtle, but their structure is far more predictable than in an arbitrary poset. Let's
think informally about what a well-ordered set looks like:
The simplest well-ordered set is $\emptyset$. If a well-ordered set is non-empty, then it
must have a least element, which we will label 0. The set could just be $\{0\}$, but
if there are any further elements there must be a least one, which we will label 1,
and a least one beyond that (if the set doesn't end at $\{0, 1\}$), which we will label 2.
Thus, in this labeling the well-ordered set must begin $0 < 1 < 2 < \dots$, and it either
stops after finitely many steps or continues forever.
However, even continuing forever may not exhaust the set. If there are any
further elements (greater than each of $0, 1, 2, \dots$), then there must be a least one,
which we will label $\omega$. Then we can continue with $\omega < \omega + 1 < \omega + 2 < \dots$, unless
we run out of elements along the way. However, even that may not cover everything:
there may be an element $\omega \cdot 2$ greater than all of them. We can continue with
$\omega \cdot 2 < \omega \cdot 2 + 1 < \omega \cdot 2 + 2 < \dots$, and then $\omega \cdot 3$ greater than those, etc. If we don't
stop along the way, eventually we reach $\omega \cdot \omega$, also known as $\omega^2$, and higher powers
of $\omega$. After still longer (infinite iterations of infinite iterations of infinite iterations,
etc.), we may reach $\omega^\omega$. Beyond that is $\omega^{\omega^\omega}$, and eventually even
$\omega^{\omega^{\omega^{\cdot^{\cdot^{\cdot}}}}}$.
Even this is just the beginning, and well-ordered sets can continue far beyond there.
However, the crucial point is that we have no choice along the way: at each step of
this process, we simply pass to the next element in the ordering. In Section 5, we'll
see more precise versions of these intuitive phenomena.
4. Ordinals and their basic properties
Recall the von Neumann definition of the natural numbers, with $0 = \emptyset$, $1 = \{0\}$,
$2 = \{0, 1\}$, $3 = \{0, 1, 2\}$, etc. In other words, the successor $n + 1$ of $n$ is given by
$n \cup \{n\}$. In this setting, we can easily define the ordering by $n < m$ iff $n \in m$. Then
each natural number equals the set of all natural numbers that come before it, and
the natural numbers are strictly well-ordered by $\in$.
Ordinals generalize these two properties: they will be well-ordered by $\in$, and
each ordinal will equal the set of all the ordinals that come before it. However,
it's not immediately obvious how to use these properties to define the ordinals
without circularity. Instead, we will characterize them using some consequences of
the properties, and we will show that this definition then leads to the properties we
wanted.
To start off, if the elements of an ordinal are supposed to be ordinals, and if all
ordinals are supposed to be well-ordered by $\in$, then the elements of each ordinal
should be well-ordered by $\in$. That will be the first defining property of an ordinal.
Furthermore, if $\alpha$, $\beta$, and $\gamma$ are ordinals satisfying $\alpha \in \beta$ and $\beta \in \gamma$, then the
transitivity of the ordering should lead to $\alpha \in \gamma$. What that means is that each
element $\beta$ of an ordinal $\gamma$ should be a subset of $\gamma$ as well (since that is equivalent to
saying every element of $\beta$ is an element of $\gamma$).
Our formal definition of an ordinal simply requires these two properties, but we
will see that they are enough to build up the whole theory.
Definition 4.1. An ordinal is a set $\alpha$ such that
(1) $\alpha$ is strictly well-ordered under $\in$, and
(2) $\alpha$ is transitive; i.e., for all $x \in \alpha$, we have $x \subseteq \alpha$.
Note that we do not directly assume, for example, that the elements of an ordinal
must be ordinals themselves. It's difficult to formulate such an assumption as part
of the definition without circularity.
It's traditional, although of course not mandatory, to use lowercase Greek letters
for ordinals. We'll generally use other letters for sets that either aren't ordinals or
aren't yet known to be ordinals.
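For hereditarily finite sets, the two conditions in the definition of an ordinal can be tested mechanically. The sketch below assumes `frozenset` as a model of sets and uses the fact that on a finite set a strict total order is automatically a well-ordering; all function names are invented for this example.

```python
def is_transitive_set(a):
    """Every element of a is also a subset of a."""
    return all(x <= a for x in a)


def membership_is_strict_total_order(a):
    """Check that ∈ restricted to a is antisymmetric, transitive, and total."""
    elems = list(a)
    for x in elems:
        for y in elems:
            if x in y and y in x:
                return False  # violates antisymmetry
            if x != y and not (x in y or y in x):
                return False  # x and y are incomparable
            for z in elems:
                if x in y and y in z and x not in z:
                    return False  # violates transitivity
    return True


def is_ordinal(a):
    return is_transitive_set(a) and membership_is_strict_total_order(a)


empty = frozenset()
one = frozenset([empty])
two = frozenset([empty, one])
three = frozenset([empty, one, two])                  # von Neumann 3
zermelo = frozenset([empty, one, frozenset([one])])   # {∅, {∅}, {{∅}}}
lonely = frozenset([one])                             # {{∅}}

print(is_ordinal(three), is_ordinal(zermelo), is_ordinal(lonely))
```

The set `zermelo` is transitive but $\emptyset$ and $\{\{\emptyset\}\}$ are incomparable under $\in$, while `lonely` fails transitivity, so neither is an ordinal.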
Definition 4.2. The successor of an ordinal $\alpha$ is $\alpha^+ = \alpha \cup \{\alpha\}$.
Lemma 4.3. The empty set is an ordinal, and for each ordinal $\alpha$, its successor
$\alpha^+$ is an ordinal.
Proof. The empty set satisfies the definition of an ordinal vacuously, since it has no
elements.
For the second part, let $\alpha$ be an ordinal. Then $\alpha$ is well-ordered by $\in$, and the
only additional element of $\alpha^+$ is $\alpha$ itself. Every element of $\alpha$ is less than $\alpha$ under
the ordering $\in$, so we are simply adding another element above all the elements of $\alpha$.
Now we can see that every non-empty subset of $\alpha^+$ has a minimal element: if the
subset has non-empty intersection with $\alpha$, then the least element of that intersection
is the least element of the whole subset. The only non-empty subset of $\alpha^+$ that
doesn't intersect $\alpha$ is $\{\alpha\}$, and it also has a minimal element (namely, $\alpha$). Thus, $\alpha^+$
is well-ordered by $\in$.
Transitivity is also not hard to check. Let $x$ be any element of $\alpha^+$. If $x \in \alpha$,
then $x \subseteq \alpha$ by the transitivity of $\alpha$, so $x \subseteq \alpha^+$. Aside from elements of $\alpha$, the only
other element of $\alpha^+$ is $x = \alpha$ itself, in which case $x \subseteq \alpha^+$ is still true. It follows
that $\alpha^+$ is both well-ordered by $\in$ and transitive, so it is an ordinal.
Recall that $\omega$ denotes the set of natural numbers.
Corollary 4.4. Every element of $\omega$ is an ordinal.
Proof. Lemma 4.3 is all we need for a proof by induction, and the characterization
of $\omega$ as the minimal inductive set enables us to carry out the proof. Let
\[ S = \{\alpha \in \omega : \alpha \text{ is an ordinal}\}. \]
Then $S \subseteq \omega$, and Lemma 4.3 implies that $S$ is inductive. It follows that $\omega \subseteq S$, so
$S = \omega$ and thus every element of $\omega$ is an ordinal.
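As an informal check of Definition 4.1 and Lemma 4.3, here is a minimal Python sketch that models hereditarily finite sets as nested `frozenset` values. The helper names (`successor`, `natural`, `is_ordinal`) are illustrative assumptions, not part of the notes, and the test for property (1) relies on the fact that a strict total order on a finite set is automatically a well-ordering.

```python
from itertools import product


def successor(a: frozenset) -> frozenset:
    """Return a ∪ {a}, as in Definition 4.2."""
    return a | frozenset([a])


def natural(n: int) -> frozenset:
    """Build the n-th natural number by applying successor n times to the empty set."""
    a = frozenset()
    for _ in range(n):
        a = successor(a)
    return a


def is_transitive(a: frozenset) -> bool:
    """Property (2): every element of a is also a subset of a."""
    return all(x <= a for x in a)


def is_strictly_ordered_by_membership(a: frozenset) -> bool:
    """Property (1) for a finite set: membership is a strict total order on a."""
    elems = list(a)
    for x, y in product(elems, repeat=2):
        if x == y:
            if x in x:  # irreflexivity
                return False
        elif (x in y) == (y in x):  # exactly one of x ∈ y, y ∈ x must hold
            return False
    # transitivity of the relation: x ∈ y and y ∈ z imply x ∈ z
    return all(x in z for x, y, z in product(elems, repeat=3) if x in y and y in z)


def is_ordinal(a: frozenset) -> bool:
    """Check both defining properties of Definition 4.1 (finite sets only)."""
    return is_strictly_ordered_by_membership(a) and is_transitive(a)
```

For instance, `is_ordinal(natural(n))` holds for small `n`, while the set `frozenset([natural(1)])`, that is $\{1\}$, fails transitivity and so is not an ordinal.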
So far, we have recovered the motivating examples, namely the natural numbers.
It will turn out that $\omega$ is itself an ordinal. We will also see that $\omega$ is the first limit
ordinal, other than the empty set.
Definition 4.5. A limit ordinal is an ordinal that is not the successor of any
ordinal.
Definition 4.6. For ordinals $\alpha$ and $\beta$, we define $\alpha < \beta$ to mean $\alpha \in \beta$. Of course,
$\alpha \le \beta$ means $\alpha < \beta$ or $\alpha = \beta$, and $\alpha > \beta$ means $\beta < \alpha$, etc.
As an immediate consequence of the definitions so far, we have $\alpha < \alpha^+$ for all
ordinals $\alpha$.
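Lemma 4.7. For each ordinal $\alpha$, either $\alpha = 0$ or $\alpha > 0$.
Proof. Suppose $\alpha \ne 0$. In other words, $\alpha \ne \emptyset$, so because $\alpha$ is well-ordered under $\in$,
there exists an $\in$-minimal element $\beta \in \alpha$. Minimality means no element of $\alpha$ is an
element of $\beta$. However, by transitivity, $\beta \subseteq \alpha$. Thus, the fact that no element of $\alpha$
is in $\beta$ means $\beta = \emptyset$. It follows that $\emptyset = \beta \in \alpha$, so $0 < \alpha$, as desired.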
Lemma 4.8. No ordinal $\alpha$ satisfies $\alpha \in \alpha$.
Proof. By assumption, $\alpha$ is strictly well-ordered under $\in$. The antisymmetry of
this well-ordering implies that no element $\beta$ of $\alpha$ satisfies $\beta \in \beta$. (Recall that
antisymmetry for a strict well-ordering means $x < y$ implies $y \not< x$. Taking $y = x$
implies that $x < x$ is impossible, and for ordinals we are using $<$ to mean $\in$.) If
$\alpha \in \alpha$, then taking $\beta = \alpha$ contradicts antisymmetry.
In fact, in ZFC no set can be a member of itself, but we cannot prove that given
our axioms so far.
Axiom (Foundation). Every non-empty set contains an element disjoint from it.
In other words, every non-empty set $S$ contains an element $x$ such that no $y \in S$ satisfies
$y \in x$. This means $x$ is an $\in$-minimal element of $S$. Thus, the Axiom of Foundation
implies that every set that is totally ordered under $\in$ is actually well-ordered under
$\in$.
Corollary 4.9. No set is an element of itself.
Proof. Suppose $S \in S$. If we let $T = \{S\}$, then $T \cap S = \{S\}$, so the only element of
$T$ is not disjoint from $T$, which contradicts the Axiom of Foundation.
Corollary 4.10. There do not exist sets $S_0, S_1, S_2, \ldots$ such that $S_0 \ni S_1 \ni S_2 \ni \cdots$.
More formally, there do not exist a set $R$ and a function $f : \omega \to R$ such that
$f(i^+) \in f(i)$ for all $i \in \omega$.
Note that the formal version is needed because "$\ldots$" is imprecise. In general, an
infinite sequence simply means a function defined on $\omega$. This sort of translation is
routine once you are used to it.
Proof. The set $T = \{S_0, S_1, \ldots\}$ contradicts the Axiom of Foundation. (In formal
terms, let $T$ be the image $f[\omega]$.)
The Axiom of Foundation is the least useful axiom of set theory for everyday
mathematics: its primary purpose is to rule out pathological examples, but these
examples can actually be useful (see the book Vicious Circles by Barwise and Moss),
and in any case they won't hurt us if we ignore them. One can define well-founded
sets, which are the sets such that they, their elements, their elements' elements,
etc. all satisfy the Axiom of Foundation. Then if one ever needs the Axiom of
Foundation, instead of assuming it one can insert the words "well-founded" into the
theorem statement.
We will not use the Axiom of Foundation again.
Proposition 4.11. There is no bijection between two distinct elements of $\omega$.
Proof. Let
\[ S = \{n \in \omega : \text{there exists no bijection from } n \text{ to } m \text{ with } m \in \omega \text{ and } m \ne n\}. \]
We will show that $S$ is inductive, from which it follows that $S = \omega$.
Clearly, $\emptyset \in S$, since there is no bijection from it to any non-empty set. Now
suppose $n \in S$. We must show that $n^+$ is not in bijection with any other element
of $\omega$. If it is, then that element must be non-empty and is therefore the successor
of another natural number. (This is easy: the set of all natural numbers that are
either 0 or a successor is inductive, so it contains all the natural numbers.)
Thus, suppose $f$ is a bijection from $n^+$ to $m^+$, with $n^+ \ne m^+$ and hence $n \ne m$.
We will construct a bijection from $n$ to $m$, which contradicts the fact that $n \in S$.
The function $f$ maps $n \cup \{n\}$ to $m \cup \{m\}$. If $f(n) = m$, then $f$ restricts to a
bijection from $n$ to $m$. Thus, suppose $f(n) \ne m$, and let $\ell$ be the unique element of
$n^+$ (and, in particular, element of $n$) such that $f(\ell) = m$. Then we define a new
function $g : n \to m$ by $g(x) = f(x)$ for $x \ne \ell$ and $g(\ell) = f(n)$. We will show that $g$
is a bijection.
To check that $g$ is surjective, note that because $g(x) = f(x)$ for $x \in n$ with $x \ne \ell$,
the image of $g$ includes every value in the image of $f$ except for $f(\ell)$ (which we don't
want anyway, since it equals $m$) and $f(n)$, which is covered by $g(\ell) = f(n)$. Thus,
because $f$ is surjective onto $m^+$, $g$ is surjective onto $m$. To check that it is injective,
suppose $g(x) = g(y)$ with $x, y \in n$. If neither $x$ nor $y$ equals $\ell$, then $f(x) = f(y)$ and
thus $x = y$ since $f$ is injective. If $x = \ell \ne y$, then $f(n) = g(x) = g(y) = f(y)$, from
which $y = n$ follows, and this contradicts $y \in n$ (by Lemma 4.8); the case $y = \ell \ne x$
is symmetric. Thus, $g(x) = g(y)$ implies $x = y$ as desired.
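The construction of $g$ from $f$ can be tried out concretely. The following is a minimal sketch, assuming natural numbers are modelled by Python integers (so $n^+$ is `range(n + 1)`) and a function is a `dict`; the name `shrink_bijection` is hypothetical. Since the proposition shows that no bijection exists when $n \ne m$, the example can only be run with $n = m$, where it illustrates the swap but not the contradiction.

```python
def shrink_bijection(f: dict, n: int, m: int) -> dict:
    """Given a bijection f from {0, ..., n} onto {0, ..., m}, return the map g
    from {0, ..., n-1} onto {0, ..., m-1} built as in the proof of Proposition 4.11."""
    if f[n] == m:
        # f already sends n to m, so it restricts to a bijection from n to m.
        return {x: f[x] for x in range(n)}
    # Otherwise some ell < n is sent to m; redirect ell to f(n) instead.
    ell = next(x for x in range(n) if f[x] == m)
    g = {x: f[x] for x in range(n) if x != ell}
    g[ell] = f[n]
    return g
```

For example, with `f = {0: 2, 1: 3, 2: 0, 3: 1}` and `n = m = 3`, the element sent to 3 is `ell = 1`, and the result is `{0: 2, 1: 1, 2: 0}`, a bijection of `{0, 1, 2}` with itself.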
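Lemma 4.12. For all ordinals $\alpha$ and $\beta$, their intersection $\alpha \cap \beta$ is an ordinal.
Proof. Certainly $\alpha \cap \beta$ is well-ordered under $\in$, as a subset of $\alpha$ (or $\beta$). To prove
transitivity, we begin by noting that if $x \in \alpha \cap \beta$, then $x \in \alpha$ and $x \in \beta$. Because
$\alpha$ and $\beta$ are transitive, it follows that $x \subseteq \alpha$ and $x \subseteq \beta$, so $x \subseteq \alpha \cap \beta$. Thus, $\alpha \cap \beta$ is
transitive, so it is an ordinal.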
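Lemma 4.13. If $\alpha$ and $\beta$ are ordinals satisfying $\alpha \subseteq \beta$, then either $\alpha = \beta$ or $\alpha \in \beta$.
Thus, $\alpha \subseteq \beta$ if and only if $\alpha \le \beta$.
Proof. Suppose $\alpha \ne \beta$. Then let $x$ be the least element of $\beta$ that is not in $\alpha$ (which
exists because $\beta$ is well-ordered). We will show that $x = \alpha$, and hence $\alpha \in \beta$.
First, we prove $x \subseteq \alpha$. By the transitivity of $\beta$, we have $x \subseteq \beta$. All smaller
elements of $\beta$ than $x$ are in $\alpha$, by the definition of $x$. This means for all $y \in x$, we
have $y \in \alpha$, or in other words $x \subseteq \alpha$.
For the other direction, we must prove $\alpha \subseteq x$. Suppose $y \in \alpha$. Because $\beta$ is
totally ordered by $\in$, we must have $x = y$, $x \in y$, or $y \in x$, and we want to show
that the last possibility is the true one. If $x = y$, then $x \in \alpha$, which contradicts the
definition of $x$. If $x \in y$, then the transitivity of $\alpha$ gives $y \subseteq \alpha$, so we can
pass from $x \in y \in \alpha$ to $x \in \alpha$, which again contradicts the definition of $x$. Thus,
$y \in x$. This shows that $\alpha \subseteq x$, so it follows that $\alpha = x$ and hence $\alpha \in \beta$, as desired.
Thus, we have shown that $\alpha \subseteq \beta$ implies $\alpha \le \beta$. For the converse, if $\alpha \le \beta$, then
$\alpha = \beta$ or $\alpha < \beta$, and in the latter case $\alpha \in \beta$ and hence $\alpha \subseteq \beta$ by transitivity.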
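Corollary 4.14. If $\alpha$ and $\beta$ are ordinals satisfying $\alpha < \beta$, then $\alpha^+ \le \beta$. If $\beta$ is a
limit ordinal, then $\alpha < \beta$ implies $\alpha^+ < \beta$.
Proof. If $\alpha < \beta$, then $\alpha \in \beta$ and thus $\alpha \subseteq \beta$. It follows that $\alpha^+ = \alpha \cup \{\alpha\} \subseteq \beta$, so
by Lemma 4.13, $\alpha^+ \le \beta$. If $\beta$ is a limit ordinal, then $\alpha^+ \ne \beta$ and hence $\alpha^+ < \beta$.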
Proposition 4.15. Every set of ordinals is (strictly) totally ordered by $<$.
The reason we don't say "the set of all ordinals" is that, as we will see below,
there are too many ordinals to form a set.
Proof. We must verify that $<$ is antisymmetric and transitive, and that all pairs of
elements are comparable.
For antisymmetry, suppose $\alpha$ and $\beta$ are ordinals satisfying $\alpha < \beta$ and $\beta < \alpha$, i.e.,
$\alpha \in \beta$ and $\beta \in \alpha$. By the transitivity property of ordinals, $\beta \in \alpha$ implies $\beta \subseteq \alpha$,
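and combining this with $\alpha \in \beta$ yields $\alpha \in \alpha$, which contradicts Lemma 4.8. Thus,
$<$ is antisymmetric for ordinals.
Transitivity is similar: suppose $\alpha$, $\beta$, and $\gamma$ are ordinals satisfying $\alpha < \beta < \gamma$.
Then $\alpha \in \beta$ and $\beta \in \gamma$, and the latter implies that $\beta \subseteq \gamma$, so we conclude that $\alpha \in \gamma$,
as desired.
Finally, comparability follows from Lemma 4.13. If $\alpha$ and $\beta$ are ordinals, then so
is $\alpha \cap \beta$ by Lemma 4.12. If it equals either of $\alpha$ or $\beta$, then one is a subset of the
other and Lemma 4.13 implies that $\alpha$ and $\beta$ are comparable. However, if it equals
neither, then Lemma 4.13 implies that it is an element of both. Then, however,
$\alpha \cap \beta \in \alpha$ and $\alpha \cap \beta \in \beta$ together imply that $\alpha \cap \beta \in \alpha \cap \beta$, which contradicts
Lemma 4.8.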
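Proposition 4.16. Every non-empty set of ordinals has a minimal element.
Proof. Let $S$ be a non-empty set of ordinals. A minimal element of $S$ is just an
element $\alpha \in S$ such that there exists no $\beta \in S$ satisfying $\beta < \alpha$ (equivalently, $\beta \in \alpha$).
In other words, it is an element $\alpha \in S$ satisfying $S \cap \alpha = \emptyset$.
Let $\alpha$ be any element of $S$. If $S \cap \alpha = \emptyset$, then $\alpha$ is minimal. Otherwise, suppose
$S \cap \alpha \ne \emptyset$. Because $\alpha$ is well-ordered under $\in$ (by the definition of an ordinal),
its non-empty subset $S \cap \alpha$ has a least element $\beta$. Then $\beta \in S$ and $S \cap \beta = \emptyset$: if
$S \cap \beta \ne \emptyset$, then there is some $\gamma < \beta$ in $S$, but then $\gamma < \beta < \alpha$ implies $\gamma < \alpha$ (by
Proposition 4.15). Thus, $\gamma \in S \cap \alpha$ and $\gamma < \beta$, which contradicts the minimality of $\beta$
in $S \cap \alpha$. It follows that $S \cap \beta = \emptyset$, so $\beta$ is the minimal element of $S$.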
Corollary 4.17. Every set of ordinals is strictly well-ordered under <.
This follows immediately from Propositions 4.15 and 4.16.
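Corollary 4.18. The set $\omega$ of natural numbers is an ordinal, and in particular a
limit ordinal.
Proof. Corollary 4.17 implies that $\omega$ is well-ordered under $\in$ (its elements are
ordinals by Corollary 4.4). To see that it is
transitive, we simply prove that $S = \{x \in \omega : x \subseteq \omega\}$ is inductive. This is
straightforward: clearly $\emptyset \in S$, and if $x \in S$, then $x \in \omega$ and $x \subseteq \omega$ by assumption,
so it follows that $x \cup \{x\} \subseteq \omega$ as well. Thus, $S = \omega$, so $\omega$ is transitive and is therefore
an ordinal.
To see that it is a limit ordinal, suppose $\omega = \alpha^+$. Then $\alpha \in \omega$, because $\alpha \in \alpha^+$.
However, $\omega$ is an inductive set, so it is closed under taking the successor. Thus,
$\alpha \in \omega$ implies $\alpha^+ \in \omega$, so we conclude that $\omega \in \omega$, which contradicts Lemma 4.8.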
Proposition 4.19. Every element of an ordinal is an ordinal.
Proof. Let $\alpha$ be an ordinal and $x \in \alpha$. Then $x \subseteq \alpha$ by transitivity of $\alpha$, and every
subset of a well-ordered set is well-ordered. Thus, $x$ is strictly well-ordered under $\in$.
For transitivity, suppose $y \in x$. We would like to conclude that $y \subseteq x$; in other
words, we would like to show that for all $z \in y$, we have $z \in x$. Because $\alpha$ is
transitive, $x \in \alpha$ implies $x \subseteq \alpha$, and then $y \in x$ implies $y \in \alpha$. Furthermore, the
transitivity of $\alpha$ then implies that $y \subseteq \alpha$, so $z \in y$ implies $z \in \alpha$. Thus, $x$, $y$, and $z$
are all elements of $\alpha$. However, $\alpha$ is totally ordered under $\in$, so $y \in x$ and $z \in y$
imply $z \in x$, as desired.
Theorem 4.20 (Burali-Forti paradox). There is no set of all ordinals.
Despite the hyphen, Burali-Forti was one person.
Proof. Let $S$ be the hypothetical set of all ordinals. As a set of ordinals, $S$ is
well-ordered under $\in$ (Corollary 4.17). Furthermore, $S$ is transitive: if $\alpha \in S$, then $\alpha \subseteq S$
since all elements of $\alpha$ are ordinals (Proposition 4.19). Thus, $S$ is itself an ordinal.
However, that means $S \in S$, which contradicts Lemma 4.8.
From a modern perspective, this is not really a paradox, but simply a proof that
the set of all ordinals cannot exist.
Lemma 4.21. The union of any set of ordinals is an ordinal.
Proof. Let $S$ be any set of ordinals, and let $U = \bigcup_{\alpha \in S} \alpha$. Every element of an
ordinal is an ordinal, so $U$ is a set of ordinals and is therefore well-ordered under
$\in$. Thus, to show that $U$ satisfies the definition of an ordinal, we just need to show
that it is transitive. Suppose $x \in U$. This means there exists some $\alpha \in S$ such that
$x \in \alpha$. Then the transitivity of $\alpha$ implies that $x \subseteq \alpha$, and it follows that $x \subseteq U$, as
desired.
Corollary 4.22. Let $S$ be any set of ordinals. Then
\[ \bigcup_{\alpha \in S} \alpha \]
is the least upper bound for $S$ (i.e., the smallest ordinal that is greater than or equal
to each element of $S$).
Proof. Combine Lemmas 4.21 and 4.13.
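For finite ordinals, Corollary 4.22 can be checked directly. The sketch below again assumes the nested `frozenset` model of the natural numbers; `natural` and `union_of` are illustrative names, not notation from the notes.

```python
def natural(n: int) -> frozenset:
    """Build the n-th natural number as a nested frozenset: 0 = ∅, k+1 = k ∪ {k}."""
    a = frozenset()
    for _ in range(n):
        a = a | frozenset([a])
    return a


def union_of(ordinals) -> frozenset:
    """Return the union of a finite collection of sets (the empty union is ∅)."""
    return frozenset().union(*ordinals)
```

With these definitions, `union_of([natural(1), natural(3), natural(2)])` equals `natural(3)`, the largest of the three. Taking the union of all ordinals below 4 also gives `natural(3)`, which is strictly smaller than 4, as one expects for a successor ordinal.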
Lemma 4.23. An ordinal is a limit ordinal if and only if it equals the union of all
smaller ordinals.
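Proof. Let $\alpha$ be an ordinal. If $\alpha = \beta^+$, then $\beta$ is an upper bound for all the ordinals
less than $\alpha$ (if $\gamma \in \beta^+ = \beta \cup \{\beta\}$, then $\gamma \in \beta$ or $\gamma = \beta$). Thus,
\[ \bigcup_{\gamma < \alpha} \gamma = \beta < \alpha, \]
by Corollary 4.22. By contrast, suppose $\alpha$ is a limit ordinal. Then for every ordinal
$\beta < \alpha$, we have $\beta^+ < \alpha$ as well, by Corollary 4.14. Corollary 4.22 implies that
\[ \bigcup_{\gamma < \alpha} \gamma \le \alpha, \]
since $\alpha$ is an upper bound for all $\gamma < \alpha$. However, if
\[ \bigcup_{\gamma < \alpha} \gamma < \alpha, \]
then
\[ \Bigl( \bigcup_{\gamma < \alpha} \gamma \Bigr)^+ < \alpha, \]
which implies that
\[ \Bigl( \bigcup_{\gamma < \alpha} \gamma \Bigr)^+ \le \bigcup_{\gamma < \alpha} \gamma, \]
because the right side is an upper bound for all ordinals that are less than $\alpha$. Then
by Lemma 4.13,
\[ \Bigl( \bigcup_{\gamma < \alpha} \gamma \Bigr)^+ \subseteq \bigcup_{\gamma < \alpha} \gamma. \]
This means
\[ \bigcup_{\gamma < \alpha} \gamma \in \bigcup_{\gamma < \alpha} \gamma, \]
which contradicts Lemma 4.8. Thus,
\[ \bigcup_{\gamma < \alpha} \gamma = \alpha, \]
as desired.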
5. Ordinals and well-ordered sets
Definition 5.1. An isomorphism between two posets $S$ and $T$ (with orderings $<_S$
and $<_T$, respectively) is a bijection $f : S \to T$ such that for all $x, y \in S$,
\[ x <_S y \iff f(x) <_T f(y). \]
We say that $S$ and $T$ are isomorphic (written $S \cong T$) if there is an isomorphism
between them. An automorphism of a poset is an isomorphism to itself.
Usually we will abuse notation slightly and denote both orderings by the same
symbol. This shouldn't cause much confusion, since it's generally clear which
ordering is meant just from context, but of course it's important to keep in mind
that different posets will have different orderings.
Every poset has at least one automorphism, namely the identity function (called
the trivial automorphism). It's easy to check that isomorphism is an equivalence
relation: it is reflexive since every poset is isomorphic to itself via the identity
function, symmetric since the inverse of an isomorphism is an isomorphism, and
transitive since composing two isomorphisms yields an isomorphism.
For an example of an isomorphism, the map $x \mapsto \tan(\pi x/2)$ is an isomorphism
from the interval $(-1, 1)$ to $\mathbb{R}$, where both sets are given their usual orderings. The
poset $\mathbb{R}$ has many automorphisms, such as $x \mapsto x + 1$. For an example in a poset
that is not totally ordered, consider the following four-element poset, where the
diagonal lines indicate covering relations (the higher-up element is greater than the
lower) and the two middle elements are incomparable:
[Figure: diamond-shaped Hasse diagram with one top element, two incomparable middle elements, and one bottom element.]
This poset has two automorphisms, with the nontrivial one switching the two
incomparable elements.
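For a small finite poset, the automorphisms can simply be enumerated. The following sketch brute-forces all bijections and keeps those that preserve the strict order in both directions, as Definition 5.1 requires. The representation (a list of elements plus a set of pairs `(x, y)` meaning $x < y$) and the name `automorphisms` are assumptions made for illustration.

```python
from itertools import permutations


def automorphisms(elements, less):
    """Return all order automorphisms of a finite poset.

    `elements` is a list of the poset's elements and `less` is the set of
    pairs (x, y) with x < y (the full strict order, not just covering pairs).
    """
    elems = list(elements)
    found = []
    for perm in permutations(elems):
        f = dict(zip(elems, perm))
        # f is an automorphism iff x < y exactly when f(x) < f(y).
        if all(((x, y) in less) == ((f[x], f[y]) in less)
               for x in elems for y in elems):
            found.append(f)
    return found


# The four-element diamond poset described above.
diamond = ["bottom", "left", "right", "top"]
diamond_less = {("bottom", "left"), ("bottom", "right"), ("bottom", "top"),
                ("left", "top"), ("right", "top")}

# A four-element chain, which is well-ordered.
chain = [0, 1, 2, 3]
chain_less = {(i, j) for i in chain for j in chain if i < j}
```

Here `automorphisms(diamond, diamond_less)` has two elements (the identity and the swap of `left` and `right`), while `automorphisms(chain, chain_less)` contains only the identity.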
Proposition 5.2. Well-ordered sets have no nontrivial automorphisms.
Proof. Let $S$ be a well-ordered set, and let $f$ be any automorphism of $S$. Define
\[ T = \{x \in S : f(x) \ne x\}. \]
If $T = \emptyset$, then $f$ is the identity function, as desired. Otherwise, let $x$ be the least
element of $T$. Because $f$ preserves the ordering, $f(x)$ must be the minimal element
of the image $f[T]$ of $T$ under $f$. However, $f[T] = T$, because $f$ fixes the complement
of $T$ pointwise. It follows that $f(x)$ is the minimal element of $T$, and thus $f(x) = x$,
which contradicts $x \in T$. Therefore $T = \emptyset$ and $f$ must be the identity function.
Corollary 5.3. If two well-ordered sets are isomorphic, then there is a unique
isomorphism in each direction between them.
Proof. Suppose $f : S \to T$ and $g : S \to T$ are two isomorphisms between well-ordered
sets. Then the composition $f^{-1} \circ g$ is an automorphism of $S$, so it must be the
identity function, and hence $f = g$.
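Definition 5.4. If $S$ is a well-ordered set and $x \in S$, the initial segment $S_x$ is
\[ \{y \in S : y < x\}. \]
For example, if $\alpha$ and $\beta$ are ordinals with $\alpha < \beta$, then $\alpha$ is an initial segment
of $\beta$. Specifically, $\alpha \in \beta$, from which $\alpha \subseteq \beta$ follows, and therefore $\alpha$ is the initial
segment $\beta_\alpha$.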
Proposition 5.5. If $S$ is well-ordered and $x \in S$, then $S \not\cong S_x$.
Proof. The proof is very much like that of Proposition 5.2. Suppose $f : S \to S_x$ is
an isomorphism, and let
\[ T = \{y \in S : f(y) \ne y\}. \]
If $T = \emptyset$, then $f(x) = x$ and so $S_x$ must contain $x$, which contradicts the definition
of $S_x$. Otherwise, let $y$ be the least element of $T$. Because $y$ is the least element of
$S \setminus S_y$, it follows that $f(y)$ is the least element of $f[S] \setminus f[S_y]$. However, $f[S] = S_x$
by assumption, and every element of $S_y$ is fixed by $f$ by the definition of $y$. Thus,
$f(y)$ must be the least element of $S_x \setminus S_y$, which is $y$ (if indeed there is any element
of $S_x \setminus S_y$). Thus, $f(y) = y$, which contradicts $y \in T$.
Corollary 5.6. Two distinct ordinals cannot be isomorphic as well-ordered sets.
Proof. If $\alpha$ and $\beta$ are ordinals satisfying $\alpha \ne \beta$, and without loss of generality $\alpha < \beta$,
then $\alpha = \beta_\alpha$, so it follows from Proposition 5.5 that $\alpha \not\cong \beta$.
Corollary 5.7. No two different initial segments of a well-ordered set are isomorphic.
Proof. Suppose $S$ is a well-ordered set and $S_{x_1} \cong S_{x_2}$ with $x_1 \ne x_2$. Without loss
of generality, $x_1 < x_2$, and then $S_{x_1} = (S_{x_2})_{x_1}$. However, then $(S_{x_2})_{x_1} \cong S_{x_2}$, which
contradicts Proposition 5.5.
Definition 5.8. A subset $S$ of a poset is downwards closed if whenever $x \in S$ and
$y \le x$, it follows that $y \in S$.
Lemma 5.9. The only downwards-closed subsets of a well-ordered set are the set
itself and its initial segments.
Proof. Let $T$ be a downwards closed subset of a well-ordered set $S$. If $T \ne S$, then
let $x$ be the least element of $S$ not in $T$. Then by the definition of $x$, we have
$S_x \subseteq T$, and $T \subseteq S_x$ holds because if $T$ contained any greater element it would also
contain $x$ (being downwards closed). Thus, $T = S_x$, as desired.
Theorem 5.10. If $S$ and $T$ are well-ordered sets, then $S \cong T$, or $S_x \cong T$ for some
$x \in S$, or $S \cong T_y$ for some $y \in T$.
The intuition is that as we try to build an isomorphism from $S$ to $T$, there is no
flexibility or choice along the way. The minimal element of $S$ must map to the minimal
element of $T$, then the same must be true for the next least elements, etc. If we run out of elements
in one set before the other, we end up with an isomorphism from it to an initial
segment of the other. Otherwise, we get an isomorphism between $S$ and $T$. One can
make this intuition precise and use it to prove the theorem, but we will use a slicker
approach. It essentially encodes the same idea, namely that a partial isomorphism
can always be extended until one set runs out of elements, but it uses a particularly
nice characterization of how to match up elements of $S$ and $T$. Specifically, the
isomorphism should map $x \in S$ to $y \in T$ if and only if $S_x \cong T_y$.
Proof. Let
\[ f = \{(x, y) \in S \times T : S_x \cong T_y\}. \]
Then $f$ is a function from some subset of $S$ to $T$, because if $(x, y_1) \in f$ and
$(x, y_2) \in f$, then $T_{y_2} \cong T_{y_1}$, which implies $y_1 = y_2$ by Corollary 5.7. Therefore we
can use functional notation (writing $f(x) = y$ to mean $(x, y) \in f$). Furthermore, $f$
is injective, because $f(x_1) = f(x_2)$ implies $S_{x_1} \cong S_{x_2}$ and thus $x_1 = x_2$. This means
$f$ is a bijection from its domain to its image.
Furthermore, $f$ is a poset isomorphism from its domain to its image: if $x_1 < x_2$,
then $T_{f(x_1)}$ is isomorphic to an initial segment of $T_{f(x_2)}$, because they are isomorphic
to $S_{x_1}$ and $S_{x_2}$, respectively. This implies that $f(x_1) < f(x_2)$, because if $f(x_1) \ge f(x_2)$,
then $T_{f(x_1)}$ would be isomorphic to an initial segment of itself.
Both the domain and the image of $f$ are downwards closed: if $S_x \cong T_y$, then
every initial segment of $S_x$ is isomorphic to an initial segment of $T_y$, and vice versa.
Thus, the domain of $f$ is either $S$ or an initial segment of $S$, by Lemma 5.9, and
the image is either $T$ or an initial segment of $T$.
All that remains to be shown is that the domain is $S$ or the image is $T$. If the
domain of $f$ is not $S$ and the image is not $T$, let $x$ be the least element of $S$ not
in the domain, and let $y$ be the least element of $T$ not in the image. Then $f$ is an
isomorphism from $S_x$ to $T_y$, so $(x, y) \in f$ after all, which contradicts the choice of $x$.
Theorem 5.11. Every well-ordered set is isomorphic to an ordinal.
In other words, ordinals classify all possible well-ordered sets, by bringing them
into a canonical form: each well-ordered set is isomorphic to an ordinal, and two
well-ordered sets are isomorphic if and only if the corresponding ordinals are equal.
Furthermore, this also canonically labels the elements of each well-ordered set, since
the isomorphism from it to an ordinal is unique.
To prove Theorem 5.11, we will need the last axiom of ZF set theory (and the
last axiom of ZFC except for the Axiom of Choice):
Axiom (Replacement). Suppose $\varphi(x, y)$ is a mathematical statement and $S$ is a set
such that for all $x \in S$, there is a unique set $y$ such that $\varphi(x, y)$ holds. Then there
is a set $T$ such that for all $y$, $y \in T$ iff there exists $x \in S$ such that $\varphi(x, y)$.
As with the previous axioms, we will make precise later in the course what is
meant by a "mathematical statement." The intuition here is that the only thing
keeping $T$ from being a set is that it might be too big. If we form it by replacing
each element $x$ in a set $S$ with a unique $y$, then it will have the same size as $S$ and
should not be too big to be a set.
We can think of $\varphi(x, y)$ as defining a functional relationship between $x$ and $y$:
for each $x \in S$ it defines a unique $y$. However, this is a little bit tricky given the
formalization of functions we have been using. Specifically, without assuming the
Axiom of Replacement it is not clear that $\varphi$ actually defines a function: we want to
define the corresponding function as the set of ordered pairs in $S \times T$ consisting
of exactly the pairs $(x, y)$ satisfying $\varphi(x, y)$, and we can't even get off the ground
without having the set $T$ already!
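Proof of Theorem 5.11. Let $S$ be a well-ordered set, and define
\[ T = \{x \in S : \text{there exists an ordinal } \beta \text{ such that } \beta \cong S_x\}. \]
For each $x \in S$, the ordinal $\beta$ with $\beta \cong S_x$ is unique if it exists, by Corollary 5.6.
Thus, by the Axiom of Replacement,
\[ \{\beta : \beta \text{ is an ordinal and there exists } x \in S \text{ with } S_x \cong \beta\} \]
is a set. Call this set $\alpha$.
First, we observe that $\alpha$ is an ordinal: it is well-ordered under $\in$ because it is a set
of ordinals, and it is transitive because if $\beta \in \alpha$ and $\gamma \in \beta$, we have an isomorphism
$f : \beta \to S_x$ for some $x \in S$, and then
\[ \gamma = \beta_\gamma \cong (S_x)_{f(\gamma)} = S_{f(\gamma)}, \]
so $\gamma \in \alpha$, as desired.
Now we can apply Theorem 5.10. If $\alpha \cong S_x$ for some $x \in S$, then $\alpha \in \alpha$, which
contradicts Lemma 4.8. If $\alpha_\beta \cong S$ for some $\beta \in \alpha$, then $\beta = \alpha_\beta \cong S$, but $\beta \in \alpha$
means $\beta \cong S_x$ for some $x \in S$, and thus $S \cong S_x$, which contradicts Proposition 5.5.
Thus, the only remaining possibility is $\alpha \cong S$, so $S$ is isomorphic to an ordinal, as
desired.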
Definition 5.12. The order type $\mathrm{type}(S)$ of a well-ordered set $S$ is the unique
ordinal isomorphic to it.
For several applications, it will be important to be able to define functions on
well-ordered sets recursively in terms of their previous values. This technique is
called transfinite recursion:
Theorem 5.13. Let $S$ be a well-ordered set, and let
\[ g : \{(x, h) : x \in S \text{ and } h \text{ is a function from } S_x \text{ to } T\} \to T \]
be a function. Then there is a unique function $f : S \to T$ such that for all $x \in S$,
\[ f(x) = g(x, f|_{S_x}). \]
If $x$ is the least element of $S$, then $S_x = \emptyset$ and $f(x) = g(x, \emptyset)$. Each further value
$f(x)$ is then uniquely determined by what came before, i.e., by the restriction of $f$
to $S_x$. Note that we cannot use a recurrence that relies just on the previous value,
i.e., the value at the immediate predecessor of $x$, because there may not be such a
value. For example, $\omega$ has no predecessor as an element of the ordinal $\omega^+$.
Proof. Call a function $f$ admissible if its domain is a downwards-closed subset of $S$
and it satisfies the recurrence
\[ f(x) = g(x, f|_{S_x}). \]
There can be at most one admissible function on any downwards-closed subset of
$S$: if there were two, say $f$ and $f'$, and if we let $x$ be the first point at which they
differ, then $f|_{S_x} = f'|_{S_x}$ and hence $f(x) = f'(x)$, a contradiction.
Now let $f$ be the union of all the admissible functions. Then $f$ is a function
defined on a downwards-closed subset of $S$, because no two admissible functions can
ever disagree at any point, and it too satisfies
\[ f(x) = g(x, f|_{S_x}) \]
and is thus admissible. The domain of $f$ must be $S$; otherwise, if we let $x$ be the
least point outside the domain, then we could extend $f$ by defining $f(x) = g(x, f|_{S_x})$,
which would contradict the containment of all admissible functions within $f$.
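On a finite well-ordered set, the recurrence of Theorem 5.13 can be run directly by visiting the elements in increasing order. The sketch below assumes $S$ is given as a Python list already sorted by its well-ordering and that the restriction $f|_{S_x}$ is passed to `g` as a `dict`; the name `define_by_recursion` is hypothetical. It says nothing about the genuinely transfinite case, which is where the theorem has content.

```python
def define_by_recursion(S, g):
    """Return the unique f on S with f(x) = g(x, f restricted to S_x).

    S is a list in increasing order; g takes an element x and a dict holding
    the values of f on the initial segment S_x.
    """
    f = {}
    for x in S:
        # At this point f is defined exactly on S_x, so a copy of f is f|_{S_x}.
        f[x] = g(x, dict(f))
    return f


def one_plus_sum(x, h):
    """Example rule: each value is one more than the sum of all previous values."""
    return 1 + sum(h.values())
```

For example, `define_by_recursion([0, 1, 2, 3], one_plus_sum)` yields `{0: 1, 1: 2, 2: 4, 3: 8}`. Note that the rule depends on the whole restriction to the initial segment, not just on the value at an immediate predecessor.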
Proposition 5.14. Let $T$ be a well-ordered set, and give $S \subseteq T$ the induced well-ordering.
Then $\mathrm{type}(S) \le \mathrm{type}(T)$.
Proof. Let $t$ be the least element of $T$ (if $T = \emptyset$ then the proposition is trivial). By
transfinite recursion, there is a unique function $f : S \to T$ satisfying
\[ f(x) = \begin{cases} \text{least element of } T \setminus f[S_x] & \text{if } f[S_x] \ne T, \text{ and} \\ t & \text{otherwise.} \end{cases} \]
Then $f(x) \le x$ for all $x \in S$: if not, then consider the first $x$ for which $f(x) > x$ to
derive a contradiction. Thus, $f[S_x] \ne T$ since $x \in T \setminus f[S_x]$, so $f$ satisfies
\[ f(x) = \text{least element of } T \setminus f[S_x]. \]
(The only purpose of the second clause was that Theorem 5.13 requires $g(x, f|_{S_x})$ to
be defined under all circumstances. However, one can define it arbitrarily in cases
that will never arise.)
If $x \le y$, then $T \setminus f[S_y] \subseteq T \setminus f[S_x]$ and hence $f(x) \le f(y)$. Furthermore, $f[S]$ is
downwards-closed, because every element less than $f(x)$ must be in $f[S_x]$. Therefore
$f$ is an isomorphism from $S$ to $T$ or to an initial segment of $T$, so the order types
satisfy $\mathrm{type}(S) \le \mathrm{type}(T)$, as desired.
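The map $f$ in this proof is easy to compute when $T$ is finite. The following sketch assumes $S$ and $T$ are Python lists in increasing order with every element of $S$ appearing in $T$; the name `collapse` is an illustrative choice. In the finite setting the second clause of the recurrence never fires, so it is omitted.

```python
def collapse(S, T):
    """Map each x in S to the least element of T not already used by smaller x.

    S and T are lists sorted by the well-ordering of T, with S a subset of T.
    """
    f = {}
    for x in S:
        used = set(f.values())  # this is f[S_x], the image of the initial segment
        f[x] = next(t for t in T if t not in used)
    return f
```

With `T = list(range(10))` and `S = [2, 3, 7]`, the result is `{2: 0, 3: 1, 7: 2}`: each value satisfies $f(x) \le x$, and the image `{0, 1, 2}` is an initial segment of `T`.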
Proposition 5.15. For every set $S$, there exists an ordinal $\alpha$ such that there is no
injection from $\alpha$ to $S$.
Proof. We would like to define
\[ \alpha = \{\beta : \beta \text{ is an ordinal and there is an injective map from } \beta \text{ to } S\}. \]
However, this is a little tricky, because it is not clear that any set contains all these
ordinals, so we cannot just use separation. First, we'll assume there is such a set
and go on to complete the proof, and then we'll justify it using replacement.
Given that $\alpha$ exists, it is an ordinal: it is well-ordered under $\in$ because it is a
set of ordinals, and it is transitive because if $\beta \in \alpha$ and $\gamma \in \beta$, then $\gamma \subseteq \beta$ by the
transitivity of $\beta$, and the injection from $\beta$ to $S$ restricts to one from $\gamma$ to $S$, so
$\gamma \in \alpha$, as desired. However, $\alpha$ cannot have any injective map to $S$, since if it did,
then $\alpha \in \alpha$.
Thus, all we need to do is to show that there is a set $\alpha$ as defined above. We
start with the set of pairs consisting of a subset of $S$ and a well-ordering of that
subset. Let
\[ U = \{(T, R) \in \mathcal{P}(S) \times \mathcal{P}(S \times S) : R \subseteq T \times T \text{ and } R \text{ defines a well-ordering of } T\}. \]
For every $(T, R) \in U$, there is a unique ordinal that is isomorphic to $T$ with the
ordering $R$, and these ordinals are exactly the ordinals that have injective functions
into $S$: the isomorphism from $\beta$ to $T$ is such a function, and conversely an injective
map from $\beta$ to $S$ defines a well-ordering on the image of the map and thus leads to
an element of $U$. Now applying the Axiom of Replacement to replace $(T, R)$ with
the ordinal shows that
\[ \{\beta : \beta \text{ is an ordinal and there is an injective map from } \beta \text{ to } S\} \]
is indeed a set.
Proposition 5.15 shows that ordinals can in a certain sense be arbitrarily large.
That's just one step removed from saying that every set can be well-ordered, but
for that we'll need the final axiom of ZFC.
6. The Axiom of Choice
Definition 6.1. A choice function for a set $S$ is a function $f : S \to \bigcup_{x \in S} x$ such
that $f(x) \in x$ for every $x \in S$.
In other words, if we view $S$ as a collection of sets, a choice function chooses an
element $f(x)$ in each set $x \in S$. Obviously, there cannot be a choice function for $S$
if $\emptyset \in S$, but the Axiom of Choice says that is the only restriction.
Axiom (Choice). Every set of non-empty sets has a choice function.
This axiom is radically different in character from all the previous axioms, because
it in no way specifies what the choice function should be. By contrast, the previous
axioms asserted the existence of sets that could be specified uniquely. The difficulty
with the Axiom of Choice is that it is not clear there exists a function that describes
infinitely many choices at once. Even aside from the philosophical question about
whether it is possible to make infinitely many choices at once, set theory might be
missing some functions we would naively expect to be there (the same way we might
imagine there would be a universal set).
However, we needn't worry about running into trouble with the Axiom of Choice.
In 1938, Gödel proved that it is consistent with the other axioms of set theory: if
one can derive a contradiction using choice, then the remaining axioms are already
contradictory even without choice. Thus, the axiom is harmless, but mathematicians
still wondered whether it was redundant. This was settled in 1963, when Cohen
proved that in fact the Axiom of Choice is independent of the other axioms (i.e., its
negation is also consistent with them).
For the rest of this section, we will not assume the Axiom of Choice, except
as indicated in the theorem statements. Instead, we will focus on describing its
consequences and equivalent forms. Specifically, we will prove the equivalence of
the following assertions:
(1) The Axiom of Choice.
(2) The well-ordering theorem: every set can be well-ordered.
(3) Trichotomy: for all sets S and T, there exists an injective function from S
to T or one from T to S.
(4) Zorn's lemma (stated below).
It may seem odd that trichotomy involves only two possibilities, and perhaps it
should be called dichotomy. The motivation for the name is that by the
Cantor-Schröder-Bernstein theorem, it implies that $|S| = |T|$, $|S| < |T|$, or $|T| < |S|$
(although we will not define this notation until Section 7).
Proposition 6.2. If the well-ordering theorem holds, then so does trichotomy.
Proof. If we well-order $S$ and $T$, then by Theorem 5.10, we have $S \cong T$, or $S_x \cong T$
for some $x \in S$, or $S \cong T_y$ for some $y \in T$. In the first case we get a bijection
between $S$ and $T$, in the second an injection from $T$ to $S$, and in the third an
injection from $S$ to $T$. Thus, in each case there is an injection in one direction or
the other.
This proof may seem outrageous, since it imposes nontrivial structure to deduce
a simple, intuitive conclusion. However, that structure is in fact equivalent to the
conclusion:
Proposition 6.3. If trichotomy holds, then every set can be well-ordered.
Proof. Given a set $S$, by Proposition 5.15 there is an ordinal $\alpha$ from which there is
no injective map to $S$. By trichotomy, there must be an injective map from $S$ to $\alpha$,
and it is a bijection from $S$ to a subset of $\alpha$. Every subset of a well-ordered set is
well-ordered, and the bijection transfers this structure to $S$.
Thus, the well-ordering theorem and trichotomy are equivalent.
Proposition 6.4. If the well-ordering theorem holds, then so does the Axiom of
Choice.
Proof. Given a set $S$ of non-empty sets, well-order
\[ \bigcup_{x \in S} x. \]
Now we can define a choice function $f : S \to \bigcup_{x \in S} x$ by letting $f(x)$ be the least
element of $x$.
Intuitively, a well-ordering lets you make infinitely many choices in a systematic
way, by always taking the minimal option.
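The idea of Proposition 6.4 is easy to see on a finite example. The sketch below assumes the well-ordering of the union is supplied as a Python list (earlier entries are smaller) and that the members of $S$ are `frozenset` values; `choice_function` is a hypothetical name. For finite $S$ no axiom is needed, of course, so this only illustrates the mechanism.

```python
def choice_function(S, order):
    """Pick the least element of each non-empty set in S under a given ordering.

    `order` lists every element of the union of S exactly once, smallest first.
    """
    rank = {value: position for position, value in enumerate(order)}
    return {x: min(x, key=rank.__getitem__) for x in S}
```

For instance, with `order = ["b", "a", "c"]`, the sets `{"b", "c"}`, `{"a", "c"}`, and `{"c"}` are sent to `"b"`, `"a"`, and `"c"` respectively. A different ordering of the union would generally give a different choice function.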
To complete the proof that the Axiom of Choice, the well-ordering theorem, and
trichotomy are equivalent, all we need to do is to prove that the Axiom of Choice
implies the well-ordering theorem. This is such an important theorem that we will
prove it twice, once using ordinals and once by a more elementary approach.
Theorem 6.5. The Axiom of Choice implies that every set can be well-ordered.
The intuition behind both proofs is simple. Given a set $S$, we will pick a choice
function $f$ for the non-empty subsets of $S$. Then $x_0 = f(S)$ will be the least element
of $S$, $x_1 = f(S \setminus \{x_0\})$ will be the second-least element, $x_2 = f(S \setminus \{x_0, x_1\})$ will
be the third-least, etc. We will continue until we fill the entire set. However, that
is a little tricky: $\{x_i : i \in \omega\}$ may not fill $S$, in which case we must continue with
$x_\omega = f(S \setminus \{x_i : i \in \omega\})$, etc. It is not obvious that this process can actually be
completed, so there is genuinely something to prove.
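First proof. Let $S$ be a set, and let $f$ be a choice function for $\mathcal{P}(S) \setminus \{\emptyset\}$. For each
ordinal $\alpha$, by transfinite recursion there is a unique function $g : \alpha \to S$ satisfying
the recurrence
\[ g(\beta) = \begin{cases} f(S \setminus \{g(\gamma) : \gamma < \beta\}) & \text{if } \{g(\gamma) : \gamma < \beta\} \ne S, \text{ and} \\ f(S) & \text{otherwise.} \end{cases} \]
Note that we can write $\{g(\gamma) : \gamma < \beta\}$ as $g[\beta]$.
Suppose $g[\beta] = S$ for some $\beta \le \alpha$, and let $\beta$ denote the first point at which that
occurs. Then $g|_\beta$ is injective, because for $\gamma_1 < \gamma_2 < \beta$, we have
\[ g(\gamma_2) = f(S \setminus g[\gamma_2]) \in S \setminus g[\gamma_2] \]
and $g(\gamma_1) \in g[\gamma_2]$, so $g(\gamma_1) \ne g(\gamma_2)$. Thus, in this case $g|_\beta$ is a bijection between
the ordinal $\beta$ and $S$, so $S$ can be well-ordered.
If no such $\beta$ exists, then the same argument shows that $g$ is an injective function
from $\alpha$ to $S$. However, Proposition 5.15 shows that there are ordinals $\alpha$ with no
injective maps to $S$. For such an $\alpha$, there must exist $\beta$ for which $g|_\beta$ is a bijection
between the ordinal $\beta$ and $S$, so we see that $S$ can indeed be well-ordered.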
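For a finite set, the process in the first proof terminates after finitely many steps and can be written as a short loop. The sketch below assumes the choice function is any Python callable that returns an element of a non-empty `frozenset`; the name `well_order` is illustrative. The finite case needs neither ordinals nor the Axiom of Choice, so this shows only the shape of the recursion.

```python
def well_order(S, choose):
    """List the elements of a finite set S in the order picked out by `choose`.

    `choose` plays the role of the choice function f: given a non-empty
    frozenset, it must return one of its elements.
    """
    remaining = set(S)
    order = []
    while remaining:
        x = choose(frozenset(remaining))  # x = f(S \ {elements chosen so far})
        order.append(x)
        remaining.remove(x)
    return order
```

Using the built-in `max` as the choice function, `well_order({3, 1, 2}, max)` returns `[3, 2, 1]`: the element chosen first becomes the least element of the new ordering, regardless of the usual order on the integers.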
One can also prove the well-ordering theorem without even mentioning ordinals:
Second proof. Let S be a set, and let f be a choice function for T(S) . Dene
an f-ordered subset of S to be a subset A with a well-ordering <
A
on A, such that
for all x A,
x = f(S A
x
).
(Recall that A
x
is the initial segment y A : y <
A
x.)
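First, we will show that every isomorphism $g : A \to B$ between $f$-ordered subsets is the identity function. If not, let $x$ be the least element of $A$ for which $g(x) \ne x$. Then
\begin{align*}
x &= f(S \setminus \{y \in A : y <_A x\}) \\
  &= f(S \setminus \{g(y) : y \in A,\ y <_A x\}) \\
  &= f(S \setminus \{g(y) : y \in A,\ g(y) <_B g(x)\}) \\
  &= f(S \setminus \{z \in B : z <_B g(x)\}) \\
  &= g(x),
\end{align*}
which contradicts $g(x) \ne x$. (The second equality holds because $g(y) = y$ for all $y <_A x$, by the minimality of $x$; the third and fourth hold because $g$ is an order isomorphism onto $B$.) It now follows from Theorem 5.10 that for every pair of $f$-ordered subsets, either they are equal or one is an initial segment of the other. It follows that the orderings on two $f$-ordered subsets agree on their overlap.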
We would like to define a maximal $f$-ordered subset $T$, i.e., one that contains all the others. Maximality implies that $T = S$, since otherwise we could define an $f$-ordering on $T \cup \{f(S \setminus T)\}$ by making $f(S \setminus T)$ greater than every element of $T$. Thus, the existence of a maximal $f$-ordered subset implies that $S$ can itself be well-ordered.
To construct $T$, we will take the union of all the $f$-ordered subsets of $S$. Define the ordering $<_T$ by $x <_T y$ if and only if $x <_A y$ for some $f$-ordered subset $A$ containing both $x$ and $y$. Equivalently, $x <_A y$ for every $f$-ordered subset $A$ containing both $x$ and $y$, because the orderings of the $f$-ordered subsets agree.
First, we must verify that $T$ is totally ordered by $<_T$. If $x$ and $y$ are elements of $T$, then by the definition of $T$ they are contained in $f$-ordered subsets $A$ and $B$. Either those subsets are the same, or one is an initial segment of the other, and either way both $x$ and $y$ are contained in a single $f$-ordered subset (whichever of $A$ and $B$ is larger). Thus, they are comparable, and antisymmetry holds. Similarly, any three elements of $T$ are all contained in a single $f$-ordered subset, so $<_T$ is transitive.
Next, we check that $<_T$ strictly well-orders $T$. If $U$ is a non-empty subset of $T$, let $u$ be an element of $U$. Then $u$ is in some $f$-ordered subset $A$, and all smaller elements of $T$ must be in $A$ as well (since every other $f$-ordered subset is either an initial segment of $A$ or has $A$ as one of its initial segments). Thus, $\{t \in U : t \le_T u\}$ is a non-empty subset of $A$, and it has a least element with respect to $<_A$. However, the orderings $<_T$ and $<_A$ agree on the elements of $A$, so this element is also the least element of $U$ with respect to $<_T$.
Finally, we verify that $T$ is $f$-ordered. If $x \in T$, then $x \in A$ for some $f$-ordered $A$, and then
\begin{align*}
x &= f(S \setminus \{y \in A : y <_A x\}) \\
  &= f(S \setminus \{y \in T : y <_T x\}),
\end{align*}
as desired.
Thus, there exists a maximal $f$-ordered subset of $S$, which must be $S$ itself, and therefore $S$ can be well-ordered.
So far, we have shown that the Axiom of Choice, the well-ordering theorem, and trichotomy are equivalent. One final form will be Zorn's lemma: if every chain in a poset has an upper bound, then the poset has a maximal element. Recall from Section 3 that a chain is a totally ordered subset, an upper bound for a chain is an element of the poset that is greater than or equal to everything in the chain, and a maximal element of the poset is an element such that nothing is strictly greater than it.
Unlike trichotomy or the well-ordering theorem, Zorn's lemma is somewhat technical, and it is not obvious why one should care. However, constructing maximal elements of posets turns out to be very important. For example, Zorn's lemma implies that every vector space has a basis, as we prove below. Those unfamiliar with linear algebra can skip this example.
Proposition 6.6. Zorn's lemma implies that every vector space has a basis.
We will work over $\mathbb{R}$, although one could replace $\mathbb{R}$ with any field. Recall that a subset $S$ of a vector space $V$ is linearly independent if there do not exist distinct vectors $v_1, \dots, v_n \in S$ and coefficients $c_1, \dots, c_n \in \mathbb{R}$ with $c_1 v_1 + \dots + c_n v_n = 0$, unless $c_1 = \dots = c_n = 0$. A subset $S$ spans $V$ if for every $v \in V$, there exist $v_1, \dots, v_n \in S$ and $c_1, \dots, c_n \in \mathbb{R}$ such that $v = c_1 v_1 + \dots + c_n v_n$. A basis is a linearly independent subset that spans $V$.
Proof. Let $V$ be a vector space, and let $P$ be the set of all linearly independent subsets of $V$. Then $P$ is a poset under $\subseteq$. Given any chain $C$ in $P$, let
\[ S = \bigcup_{T \in C} T. \]
Then $S$ is linearly independent: if $v_1, \dots, v_n \in S$, then there exist $T_1, \dots, T_n \in C$ such that $v_i \in T_i$. Because $C$ is a chain, $T_1, \dots, T_n$ are comparable and one of them therefore contains the others as subsets; call it $T_j$. Then $v_1, \dots, v_n \in T_j$, and because $T_j$ is linearly independent, $c_1 v_1 + \dots + c_n v_n$ cannot vanish unless $c_1 = \dots = c_n = 0$. Thus, $S$ is linearly independent, so $S \in P$. It is an upper bound for the chain $C$, so we have verified that every chain in $P$ has an upper bound.
By Zorn's lemma, there is a maximal element $S$ in $P$. That means for every $v \in V \setminus S$, the union $S \cup \{v\}$ is not linearly independent, so there exist $v_1, \dots, v_n \in S$ and $c_0, \dots, c_n \in \mathbb{R}$ such that
\[ c_0 v + c_1 v_1 + \dots + c_n v_n = 0, \]
with not all of $c_0, \dots, c_n$ equal to $0$. Because $S$ is linearly independent, we must have $c_0 \ne 0$, and hence
\[ v = -\frac{c_1}{c_0} v_1 - \dots - \frac{c_n}{c_0} v_n. \]
Thus, $S$ spans $V$, so it is a basis of $V$.
We will deduce Zorn's lemma from a further lemma called Hausdorff's maximal principle: every poset has a maximal chain (i.e., a chain not contained in any bigger chain).
Proposition 6.7. Hausdorff's maximal principle implies Zorn's lemma.
Proof. Let $P$ be a poset in which every chain has an upper bound. By Hausdorff's maximal principle, there is a maximal chain $C$ in $P$. Let $u$ be an upper bound for $C$. Then $u$ is maximal in $P$, because if $v > u$, then $C \cup \{v\}$ is an even larger chain. (Note that $u$ must be in $C$, because otherwise $C \cup \{u\}$ would already be a larger chain than $C$, but $v$ cannot be in $C$ because it is strictly greater than each element of $C$.)
Proposition 6.8. The well-ordering theorem implies Hausdorff's maximal principle.
Proof. Let $P$ be a poset with ordering $\le$. By the well-ordering theorem, there exists a well-ordering $<_W$ on $P$, which needn't have any particular relationship to the other ordering $\le$.
Let $p$ be the $<_W$-least element of $P$, and define by transfinite recursion a function $f : P \to P$ by
\[ f(x) = \begin{cases} x & \text{if } \{x\} \cup \{f(y) : y <_W x\} \text{ is a chain in } P, \text{ and} \\ p & \text{otherwise.} \end{cases} \]
Now consider the image of $f$. The image is a chain, because $x$ is never added to the image unless it is comparable with everything that came before. Furthermore, anything not in the image was left out precisely because it was incomparable with something already in the image, so no further elements can be added to the chain. Thus, the image of $f$ is a maximal chain in $P$.
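On a finite poset the recursion above reduces to a single greedy pass. The sketch below assumes a finite poset given as a list (whose order plays the role of $<_W$) and a comparison function; the names `greedy_maximal_chain` and `divides` are illustrative only.

```python
def greedy_maximal_chain(elements, leq):
    """Scan elements in the given order and keep x when it is comparable
    (under leq) with everything kept so far."""
    chain = []
    for x in elements:
        if all(leq(x, y) or leq(y, x) for y in chain):
            chain.append(x)
    return chain


def divides(a, b):
    """Partial order on positive integers: a <= b means a divides b."""
    return b % a == 0


# The poset {1, ..., 12} under divisibility, scanned in increasing numerical order.
chain = greedy_maximal_chain(range(1, 13), divides)
print(chain)
```

Scanning in a different order generally produces a different maximal chain, which matches the remark that $<_W$ need not be related to $\le$.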
Finally, we complete the circle by showing that Zorn's lemma implies the Axiom of Choice.
Proposition 6.9. Zorn's lemma implies the Axiom of Choice.
Proof. Let $S$ be a set of non-empty sets, and let
\[ P = \{(T, f) : T \subseteq S \text{ and } f \text{ is a choice function on } T\}. \]
Then $P$ is a poset under the ordering defined by $(T_1, f_1) \le (T_2, f_2)$ iff $T_1 \subseteq T_2$ and the restriction of $f_2$ to $T_1$ is $f_1$. Think of the elements of $P$ as partial choice functions, defined only on subsets of $S$.
If there is a maximal element $(T, f)$ in $P$, then $T = S$, because otherwise we could take $s \in S \setminus T$ and extend $f$ to $T \cup \{s\}$ by letting $f(s)$ be any element of $s$. (Note that we are not using the Axiom of Choice here, because we are choosing just a single element of a non-empty set, rather than asserting the existence of a function that simultaneously makes many choices.) Thus, by Zorn's lemma, all we need to check is that every chain in $P$ has an upper bound.
Given a chain $C$ in $P$, let $T$ be the union of the domains of the partial choice functions in $C$. Because the elements of $C$ agree wherever more than one is defined (this follows from comparability in $P$), we can define a partial choice function $f$ on $T$ by letting $f(t)$ be the unique element of $t$ chosen by the elements of $C$ whose domains include $t$. Then $(T, f)$ is an upper bound in $P$ for $C$. Thus, there is a maximal element of $P$, and it is a choice function for $S$, as desired.
7. Cardinals
From this point on, we assume the Axiom of Choice. Thus, by the well-ordering
theorem, every set is in bijection with some ordinal.
Definition 7.1. An ordinal is a cardinal if it is not in bijection with any smaller ordinal. The cardinality $|S|$ of a set $S$ is the smallest ordinal that is in bijection with $S$.
Every cardinal is the cardinality of some set (for example, itself), and the cardinality of a set is always a cardinal. Of course, $|\alpha| \le \alpha$ for every ordinal $\alpha$.
Lemma 7.2. Let $\kappa$ and $\lambda$ be cardinals. Then $\kappa \le \lambda$ as ordinals if and only if there is an injective map from $\kappa$ to $\lambda$, and $\kappa = \lambda$ if and only if there is a bijection between them.
Proof. If $\kappa \le \lambda$, then $\kappa \subseteq \lambda$ and the inclusion of $\kappa$ in $\lambda$ is injective. If $\kappa > \lambda$, then there cannot be an injective map from $\kappa$ to $\lambda$, since otherwise combining it with the inclusion from $\lambda$ to $\kappa$ would yield a bijection between $\kappa$ and $\lambda$ by the Cantor-Schröder-Bernstein theorem, which would contradict the fact that $\kappa$ is a cardinal.
Thus, the ordering of cardinals as ordinals agrees with the ordering on cardinals defined using injective functions.
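Corollary 7.3. If $\alpha$ and $\beta$ are ordinals and $\alpha \le \beta$, then $|\alpha| \le |\beta|$.
Proof. If $\alpha \le \beta$, then $\alpha \subseteq \beta$ and hence $|\alpha| \le |\beta|$.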
The ordering on cardinals can also be defined using surjective maps, but one must be careful about the empty set: for $\kappa \ne 0$, we have $\kappa \le \lambda$ if and only if there is a surjective map from $\lambda$ to $\kappa$, but there is no surjective map from a non-empty set to the empty set. The easy lemma is as follows:
Lemma 7.4. Let $S$ and $T$ be sets, with $S \ne \emptyset$. Then there is an injective map from $S$ to $T$ if and only if there is a surjective map from $T$ to $S$.
Proof. Given an injection $f : S \to T$, we can define a surjection from $T$ to $S$ by mapping the elements of $f[S]$ to their pre-images and the rest of $T$ to some arbitrary element of $S$ (which is why $S$ must be non-empty). However, the other direction depends on the Axiom of Choice: given a surjection $g : T \to S$, we can define an injection $f : S \to T$ by choosing an element $f(s)$ from the pre-image of $s$ under $g$.
By Proposition 4.11, every natural number is a cardinal. Furthermore, $\omega$ is a cardinal as well:
Lemma 7.5. There is no bijection between $\omega$ and any natural number, and thus $\omega$ is a cardinal.
Proof. Suppose there were a bijection $f : \omega \to n$ with $n \in \omega$. Then there would be an injective map from $\omega$ to $n^+$, namely the composition of $f$ with the inclusion $n \subseteq n^+$, and an injective map from $n^+$ to $\omega$, namely the inclusion $n^+ \subseteq \omega$. Thus, the Cantor-Schröder-Bernstein theorem would give a bijection between $\omega$ and $n^+$, and composing it with $f^{-1}$ would give a bijection between $n$ and $n^+$. However, there is no bijection between $n$ and $n^+$, by Proposition 4.11.
We define $\aleph_0$ to be $\omega$, thought of as a cardinal. Of course, there is no logical need to have two different names for the same set, but it is sometimes convenient because it makes it clear whether we are thinking about ordinals in general or cardinals specifically.
Theorem 7.6. Every set of cardinals is well-ordered.
Of course, this theorem is an immediate corollary of Corollary 4.17, because all
cardinals are ordinals. However, it is an important enough result to be worth stating
as a theorem.
Thus, if there is a cardinal greater than $\aleph_0$, then there is a least such cardinal. However, showing that there is such a cardinal takes proof. This will follow from Cantor's theorem, which is one of the most important theorems of set theory, and indeed of mathematics in general. The proof is a beautiful example of diagonalization.
Theorem 7.7 (Cantor). For every set $S$,
\[ |S| < |\mathcal{P}(S)|. \]
It is clear that $|S| \le |\mathcal{P}(S)|$, for example because of the injective function mapping $s \in S$ to $\{s\} \in \mathcal{P}(S)$, but the strict inequality is not as obvious.
Proof. We will show that there cannot be a surjective function from $S$ to $\mathcal{P}(S)$. Given a function $f : S \to \mathcal{P}(S)$, let
\[ T = \{s \in S : s \notin f(s)\}. \]
Then $T \in \mathcal{P}(S)$, and we will show that it cannot be in the image of $f$. If $T = f(t)$ with $t \in S$, then $t \in T$ if and only if $t \notin f(t)$ by the definition of $T$, and that means $t \in T$ if and only if $t \notin T$, which is a contradiction. Thus, $f$ is not surjective.
The motivation behind this diagonal argument is that for each $s \in S$, we want to rule out $T = f(s)$. We do so by arranging for them to differ in whether they contain $s$. That means we want $s \in T$ iff $s \notin f(s)$, which is the definition of $T$.
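The diagonal set can be computed explicitly when $S$ is finite. The following is a small sketch under that assumption; `diagonal_set` and the sample map `table` are hypothetical names chosen for illustration.

```python
def diagonal_set(S, f):
    """Return T = {s in S : s not in f(s)}; T differs from each f(s) at the element s."""
    return frozenset(s for s in S if s not in f(s))


S = {0, 1, 2}
# A hypothetical map from S to subsets of S, stored as a lookup table.
table = {0: frozenset({0, 1}), 1: frozenset(), 2: frozenset({0, 2})}
f = table.__getitem__

T = diagonal_set(S, f)
missed = all(f(s) != T for s in S)  # T is not a value of f
print(sorted(T), missed)
```

Whatever table is supplied, `T` disagrees with `f(s)` on the membership of `s`, so `missed` is always true.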
Proposition 7.8. There is no set of all cardinals.
Proof. For every ordinal $\alpha$, there is a larger cardinal, for example $|\mathcal{P}(\alpha)|$, and $\alpha \in |\mathcal{P}(\alpha)|$. Thus, if the set $S$ of all cardinals existed, then
\[ \bigcup_{\kappa \in S} \kappa \]
would be the set of all ordinals, and we would run into the Burali-Forti paradox.
We can measure the size of an infinite cardinal by looking at the set of infinite cardinals that come before it. Specifically, given any infinite cardinal $\kappa$, consider the set
\[ I_\kappa = \{\lambda : \lambda \text{ is an infinite cardinal and } \lambda < \kappa\}. \]
(Note that this set exists by separation, because every such $\lambda$ is in fact an element of $\kappa$.) This is a well-ordered set and is therefore isomorphic to a unique ordinal. Furthermore, if $\lambda < \kappa$ then $I_\lambda$ is an initial segment of $I_\kappa$, so the corresponding ordinals are unequal, and every initial segment of $I_\kappa$ corresponds to some infinite cardinal.
Lemma 7.9. For every ordinal $\alpha$, there is an infinite cardinal $\kappa$ for which $I_\kappa$ is isomorphic to $\alpha$.
Proof. If not, let $\alpha$ be the least ordinal not isomorphic to $I_\kappa$ for any infinite cardinal $\kappa$. If $\alpha$ is a successor ordinal, with $\alpha = \beta^+$, then let $\beta$ be isomorphic to $I_\kappa$. If we let $\kappa'$ be the least cardinal greater than $\kappa$, then $I_{\kappa'}$ is isomorphic to $\alpha$, which is a contradiction. On the other hand, if $\alpha$ is a limit ordinal, then
\[ \alpha = \bigcup_{\beta < \alpha} \beta. \]
Taking the union of the sets $I_\kappa$ corresponding to the ordinals $\beta < \alpha$ gives a downwards closed set $S$ of infinite cardinals that is isomorphic to $\bigcup_{\beta < \alpha} \beta$ and hence to $\alpha$. It cannot contain all cardinals, by Proposition 7.8. If $\kappa$ is the smallest infinite cardinal not contained in $S$, then $S = I_\kappa$, as desired.
Definition 7.10. If $\alpha$ is an ordinal, then $\aleph_\alpha$ is the unique infinite cardinal such that
\[ \{\kappa : \kappa \text{ is an infinite cardinal and } \kappa < \aleph_\alpha\} \]
is isomorphic to $\alpha$ as a well-ordered set. We say $\aleph_\alpha$ is a successor cardinal if $\alpha$ is a successor ordinal, and a limit cardinal otherwise.
This definition indexes the infinite cardinals by the ordinals. People sometimes write $\omega_\alpha$ for $\aleph_\alpha$ when they wish to think of it more as an ordinal than as a cardinal, but of course there's no mathematical distinction here, just a psychological distinction.
We know that $|\mathcal{P}(\aleph_0)| > \aleph_0$, and it is natural to ask how large it is. Cantor conjectured that $|\mathcal{P}(\aleph_0)| = \aleph_1$, and this conjecture is known as the continuum hypothesis, because $|\mathcal{P}(\aleph_0)|$ is the cardinality of the continuum $\mathbb{R}$. The generalized continuum hypothesis is that $|\mathcal{P}(\aleph_\alpha)| = \aleph_{\alpha^+}$ for all ordinals $\alpha$. In other words, it says that the power set is as small as possible, subject to Cantor's theorem.
Gödel showed in 1938 that the generalized continuum hypothesis is consistent with ZFC (of course assuming ZFC is itself consistent), and Cohen showed in 1963 that the negation of the continuum hypothesis is also consistent with ZFC. Thus, the very natural question of how large $|\mathcal{P}(\aleph_0)|$ is cannot be settled using the current axioms of set theory. Of course, we could introduce new axioms to settle it; for example, we could even assume the generalized continuum hypothesis itself as an axiom. However, nobody has been able to propose an intuitively compelling axiom that resolves the continuum hypothesis.
8. Cardinal arithmetic
We can define arithmetic on cardinals by imitating the ideas behind elementary-school arithmetic.
Definition 8.1. Let $\kappa$ and $\lambda$ be cardinals. Then we define
\[ \kappa + \lambda = |(\kappa \times \{0\}) \cup (\lambda \times \{1\})|, \]
\[ \kappa\lambda = |\kappa \times \lambda|, \]
and
\[ \kappa^\lambda = |\{\text{functions from } \lambda \text{ to } \kappa\}|. \]
The definition of multiplication is exactly what one would expect, but addition and exponentiation may look a little odd. For addition, the purpose of using $\kappa \times \{0\}$ and $\lambda \times \{1\}$ instead of $\kappa$ and $\lambda$ is to ensure that the sets are disjoint. For exponentiation, one might naively expect to use functions going in the other direction, but that would give the wrong answer if $\kappa$ and $\lambda$ are finite: a function from $\lambda$ to $\kappa$ is given by choosing one of $\kappa$ values at each of $\lambda$ points, and there are $\kappa^\lambda$ ways to do that.
Given these definitions, we can easily apply the operations to the cardinalities of arbitrary sets:
Lemma 8.2. For arbitrary sets $S$ and $T$, we have $|S \cup T| + |S \cap T| = |S| + |T|$, $|S \times T| = |S||T|$, and
\[ |\{\text{functions from } T \text{ to } S\}| = |S|^{|T|}. \]
Note that we cannot write the first equation in the lemma as $|S \cup T| = |S| + |T| - |S \cap T|$, because there is no operation of subtraction on cardinals. For example, $\aleph_0 + 0 = \aleph_0 + 1$, but we cannot subtract $\aleph_0$ to conclude that $0 = 1$. Similarly, there is no division, because $1 \cdot \aleph_0 = 2 \cdot \aleph_0$.
All the usual algebraic laws are easily proved from basic set-theoretic properties. For example, switching the coordinates gives a bijection between $S \times T$ and $T \times S$, so multiplication is commutative. We have $S \times (T \cup U) = (S \times T) \cup (S \times U)$, so multiplication distributes over addition. The most subtle cases involve exponentiation. For example, we can prove
\[ (\kappa^\lambda)^\mu = \kappa^{\lambda\mu} \]
as follows. The right side is the number of functions from $\lambda \times \mu$ to $\kappa$, and the left side is the number of functions from $\mu$ to functions from $\lambda$ to $\kappa$. The bijection between these functions is sometimes called currying. Given $f : \lambda \times \mu \to \kappa$, and given $y \in \mu$, let $g(y)$ be the function from $\lambda$ to $\kappa$ defined by $(g(y))(x) = f(x, y)$. Then $g$ is a function from $\mu$ to functions from $\lambda$ to $\kappa$. Conversely, given such a function $g$, we can define $f$ by $f(x, y) = (g(y))(x)$, and this gives the desired bijection.
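For finite cardinals the currying bijection can be checked by brute force. The sketch below represents functions as dictionaries; the names `curry` and `uncurry` and the sample sizes $\kappa = 2$, $\lambda = 3$, $\mu = 2$ are assumptions made for the example.

```python
from itertools import product


def curry(f, lam, mu):
    """Turn f : lam x mu -> kappa (a dict keyed by pairs) into g with g[y][x] = f[(x, y)]."""
    return {y: {x: f[(x, y)] for x in lam} for y in mu}


def uncurry(g, lam, mu):
    """Inverse direction: rebuild f from g via f[(x, y)] = g[y][x]."""
    return {(x, y): g[y][x] for x in lam for y in mu}


kappa, lam, mu = range(2), range(3), range(2)
pairs = list(product(lam, mu))

# Every function from lam x mu to kappa, as a dict.
all_f = [dict(zip(pairs, values)) for values in product(kappa, repeat=len(pairs))]

round_trips = all(uncurry(curry(f, lam, mu), lam, mu) == f for f in all_f)
count = len(all_f)
print(round_trips, count)
```

Here `count` is $2^{3 \cdot 2}$, and since `uncurry` inverts `curry`, the curried functions are equally numerous, matching $(2^3)^2$.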
It also follows easily from the definitions that $\kappa + \lambda$, $\kappa\lambda$, and $\kappa^\lambda$ are (weakly) monotonic in both $\kappa$ and $\lambda$.
Lemma 8.3. If $\kappa$ and $\lambda$ are finite, then so are $\kappa + \lambda$, $\kappa\lambda$, and $\kappa^\lambda$.
For $\kappa + \lambda$, this follows immediately from Proposition 2.6. The remaining two cases can be proved by a similar induction, with each case depending on the previous one.
Addition and multiplication turn out to be as simple as possible for infinite cardinals: they just give the maximum of their arguments. This means cardinal arithmetic lacks the richness of number theory, but on the positive side it means it is often quite easy to compute cardinalities of infinite sets, because taking unions or Cartesian products never complicates anything.
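Theorem 8.4. Let $\kappa$ and $\lambda$ be cardinals, with $\lambda \le \kappa$ and $\kappa$ infinite. Then $\kappa + \lambda = \kappa$, and if $\lambda \ne 0$, then $\kappa\lambda = \kappa$.
Proof. We will prove that $\kappa^2 = \kappa$. Then the desired results follow from
\[ \kappa \le \kappa + \lambda \le \kappa + \kappa = 2\kappa \le \kappa^2 = \kappa \]
and
\[ \kappa \le \kappa\lambda \le \kappa^2 = \kappa. \]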
Suppose $\kappa$ is the first infinite cardinal such that $\kappa^2 \ne \kappa$. To compute $\kappa^2$, we will start by well-ordering $\kappa \times \kappa$. Specifically, we set $(\alpha_1, \beta_1) < (\alpha_2, \beta_2)$ if
(1) $\max(\alpha_1, \beta_1) < \max(\alpha_2, \beta_2)$, or
(2) $\max(\alpha_1, \beta_1) = \max(\alpha_2, \beta_2)$, and $\alpha_1 < \alpha_2$, or
(3) $\max(\alpha_1, \beta_1) = \max(\alpha_2, \beta_2)$, $\alpha_1 = \alpha_2$, and $\beta_1 < \beta_2$.
In other words, we order first by the maximum of the two coordinates, and then refine that ordering lexicographically. It is straightforward to check that $\kappa \times \kappa$ is then well-ordered, using the fact that ordinals are well-ordered.
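Now there exists a unique ordinal $\gamma$ such that $\kappa \times \kappa \cong \gamma$. We will prove that $\gamma \le \kappa$, from which it follows that $|\kappa \times \kappa| = |\gamma| \le \kappa$ and hence $\kappa^2 = \kappa$.
Suppose, by way of contradiction, that $\gamma > \kappa$. Then $\kappa$ is an initial segment of $\gamma$, so this means there is an initial segment of $\kappa \times \kappa$ that has cardinality $\kappa$. To rule this out, we will compute the sizes of the initial segments of $\kappa \times \kappa$.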
Consider the initial segment consisting of all the elements of $\kappa \times \kappa$ that are less than $(\alpha, \beta)$. Without loss of generality we can assume $\alpha = \beta$, because replacing the lesser of the two coordinates with the greater will only make the initial segment larger. Under our ordering on $\kappa \times \kappa$, the initial segment cut out by $(\alpha, \alpha)$ consists of all pairs $(\xi, \eta)$ with $\xi, \eta \le \alpha$ and $(\xi, \eta) \ne (\alpha, \alpha)$. In other words, it is the set $(\alpha^+ \times \alpha^+) \setminus \{(\alpha, \alpha)\}$. Thus, the initial segment has size at most $|\alpha^+|^2$.
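The description of the initial segment can be sanity-checked on a finite stand-in for $\kappa$. The sketch below replaces $\kappa$ by $n = 6$ and $\alpha$ by $3$; the names `order_key` and `initial_segment` are illustrative and not from the notes.

```python
def order_key(pair):
    """Sort key for the ordering: first by the maximum, then by the first
    coordinate, then by the second (rules (1)-(3))."""
    a, b = pair
    return (max(a, b), a, b)


def initial_segment(n, pair):
    """All pairs in n x n that are strictly below `pair` in the ordering."""
    return {(a, b) for a in range(n) for b in range(n)
            if order_key((a, b)) < order_key(pair)}


n, alpha = 6, 3
segment = initial_segment(n, (alpha, alpha))

# The claimed description: (alpha+ x alpha+) minus {(alpha, alpha)}.
expected = {(c, d) for c in range(alpha + 1) for d in range(alpha + 1)} - {(alpha, alpha)}
print(segment == expected, len(segment))
```

Python compares the key tuples lexicographically, which is exactly how rules (1) through (3) refine one another.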
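Because $\kappa$ is a cardinal and $\alpha$ is an ordinal with $\alpha < \kappa$ (i.e., $\alpha \in \kappa$), we have $|\alpha| < \kappa$. Furthermore, $|\alpha^+| = |\alpha|$ if $\alpha$ is infinite, and either way $|\alpha^+| < \kappa$. Finally, because we assumed $\kappa$ was the least infinite cardinal such that $\kappa^2 \ne \kappa$, we must have $|\alpha^+|^2 = |\alpha^+| < \kappa$ when $\alpha$ is infinite; when $\alpha$ is finite, $|\alpha^+|^2$ is finite by Lemma 8.3 and hence also less than $\kappa$. Thus, every initial segment of $\kappa \times \kappa$ has cardinality strictly less than $\kappa$, and we conclude that $\kappa^2 = \kappa$. Thus, every infinite cardinal equals its own square, as desired.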
On the other hand, cardinal exponentiation is much more subtle than addition or multiplication. For every set $S$, we have
\[ |\mathcal{P}(S)| = 2^{|S|}, \]
because specifying a function from $S$ to the set $2 = \{0, 1\}$ is the same as specifying the subset of $S$ on which it takes the value $0$. The fact that ZFC does not even determine whether $2^{\aleph_0} = \aleph_1$ means we will not be able to compute many exponentials explicitly. However, one can say something.
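Lemma 8.5. If $\kappa$ and $\lambda$ are cardinals satisfying $2 \le \kappa \le 2^\lambda$ and $\lambda$ is infinite, then $\kappa^\lambda = 2^\lambda$.
Proof. Because $2 \le \kappa \le 2^\lambda$, we have
\[ 2^\lambda \le \kappa^\lambda \le (2^\lambda)^\lambda. \]
Because $\lambda$ is infinite,
\[ (2^\lambda)^\lambda = 2^{\lambda^2} = 2^\lambda. \]
Thus, $\kappa^\lambda = 2^\lambda$, as desired.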
We can naturally extend cardinal arithmetic to infinite sums and products. For sums one simply takes infinite unions; for products, one must define infinite Cartesian products.
Definition 8.6. Given a family of sets $S_i$ indexed by $i \in I$, their Cartesian product is defined by
\[ \bigtimes_{i \in I} S_i = \Bigl\{\text{functions } f \text{ from } I \text{ to } \bigcup_{i \in I} S_i \text{ such that } f(i) \in S_i \text{ for all } i \in I\Bigr\}. \]
Strictly speaking, when $|I| = 2$ this definition conflicts with our earlier definition of the Cartesian product, but it does not matter because the new definition still satisfies Lemma 1.5. Of course, we could not have used the new definition earlier, because we had to define Cartesian products before we could define functions.
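Definition 8.7. Given cardinals $\kappa_i$ for $i \in I$, we define
\[ \sum_{i \in I} \kappa_i = \Bigl|\bigcup_{i \in I} \kappa_i \times \{i\}\Bigr| \]
and
\[ \prod_{i \in I} \kappa_i = \Bigl|\bigtimes_{i \in I} \kappa_i\Bigr|. \]
Note that the reason for the unconventional symbol $\bigtimes$ for the Cartesian product, instead of $\prod$, is that the latter would lead to the cryptic definition
\[ \prod_{i \in I} \kappa_i = \Bigl|\prod_{i \in I} \kappa_i\Bigr|, \]
in which the two products would mean different things.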
The Axiom of Choice is equivalent to the fact that an infinite product of nonzero cardinals is always nonzero: an element of $\bigtimes_{i \in I} S_i$ encodes exactly the same information as a choice function for $\{S_i : i \in I\}$.
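As in the case of finite sums and products, it is easy to justify various arithmetic laws. For example, if $\kappa$ and $\lambda$ are cardinals, then
\[ \sum_{i \in \lambda} \kappa = \kappa\lambda \]
and
\[ \prod_{i \in \lambda} \kappa = \kappa^\lambda. \]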
A slightly less trivial identity is

i
=

.
To see why it is true, note that

i

j
for all j , and this implies

.
On the other hand,

=
2

.
One of the most important results about infinite sums and products is König's theorem:
Theorem 8.8 (König). Let $\kappa_i$ and $\lambda_i$ be cardinals for $i \in I$, with $\kappa_i < \lambda_i$ for all $i$. Then
\[ \sum_{i \in I} \kappa_i < \prod_{i \in I} \lambda_i. \]
Part of why König's theorem is remarkable is the strict inequality in the conclusion. By contrast, $\kappa_i < \lambda_i$ immediately implies that
\[ \sum_{i \in I} \kappa_i \le \sum_{i \in I} \lambda_i, \]
but strict inequality does not always hold. For example, if $I$ is infinite, $\kappa_i = 1$, and $\lambda_i = 2$, then we get $|I|$ on both sides of the inequality.
Proof. Let $S_i$ and $T_i$ be any sets with $|S_i| = \kappa_i$ and $|T_i| = \lambda_i$. We will prove that there is no surjective function from
\[ \bigcup_{i \in I} S_i \quad \text{to} \quad \bigtimes_{i \in I} T_i, \]
which implies that
\[ \Bigl|\bigcup_{i \in I} S_i\Bigr| < \Bigl|\bigtimes_{i \in I} T_i\Bigr| \]
because the Cartesian product is non-empty. König's theorem is simply the special case in which the sets $S_i$ are disjoint, but of course allowing them to overlap does not really strengthen the result.
Let $f : \bigcup_{i \in I} S_i \to \bigtimes_{i \in I} T_i$ be any function. We want to construct an element $x$ of $\bigtimes_{i \in I} T_i$ that is guaranteed not to be in the image of $f$. For each $i \in I$, we will choose the $i$-th coordinate of $x$ to ensure that it is not in $f[S_i]$.
Specifically, let $\pi_i : \bigtimes_{j \in I} T_j \to T_i$ be the projection onto the $i$-th coordinate. Then
\[ |\pi_i[f[S_i]]| \le |S_i| < |T_i|, \]
so there exists an element $x_i \in T_i \setminus \pi_i[f[S_i]]$. This means $x_i$ cannot occur as the $i$-th coordinate of any element of $f[S_i]$.
Now define $x \in \bigtimes_{i \in I} T_i$ to have such an element as its $i$-th coordinate $x_i$ for all $i$. Then for all $i$, $x$ cannot be in $f[S_i]$, so it is not in the image of $f$ at all. Thus, $f$ is not surjective, as desired.
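The construction of $x$ is explicit enough to run on finite data. In the sketch below the families are dictionaries indexed by $i$, elements of the product are tuples, and `min` stands in for the choice of $x_i$ (in the infinite case this step uses the Axiom of Choice). The name `konig_witness` and the sample data are hypothetical.

```python
def konig_witness(S, T, f):
    """S, T: dicts i -> set with |S[i]| < |T[i]|.
    f: dict from the union of the S[i] to tuples indexed like sorted(T).
    Returns a tuple x in the product of the T[i] that f never hits."""
    index = sorted(T)
    x = []
    for pos, i in enumerate(index):
        hit = {f[s][pos] for s in S[i]}  # pi_i[f[S_i]]
        x.append(min(T[i] - hit))        # non-empty because |hit| <= |S[i]| < |T[i]|
    return tuple(x)


S = {0: {"a"}, 1: {"b", "c"}}
T = {0: {0, 1}, 1: {0, 1, 2}}
f = {"a": (0, 2), "b": (1, 0), "c": (1, 1)}

x = konig_witness(S, T, f)
print(x, x in f.values())
```

The coordinate chosen at index `i` rules out every value `f[s]` with `s` in `S[i]`, so the resulting tuple cannot equal any value of `f`.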
Cantor's theorem is an immediate corollary of König's theorem: if we take $\kappa_i = 1$ and $\lambda_i = 2$ for all $i \in I$, then we find that
\[ |I| = \sum_{i \in I} 1 < \prod_{i \in I} 2 = 2^{|I|}. \]
9. Cofinality
Definition 9.1. A subset $S$ of a partially ordered set $P$ is cofinal if for all $x \in P$, there exists an $s \in S$ such that $x \le s$.
In other words, no matter how long $P$ continues, there is always an element of $S$ coming up. The name cofinal captures this idea by saying that $S$ is equally final compared with $P$.
We will be particularly interested in measuring the sizes of cofinal subsets of ordinals, in particular by their order types. Recall that the order type of a well-ordered set is the unique ordinal isomorphic to it.
Definition 9.2. The cofinality $\mathrm{cf}(\alpha)$ of an ordinal $\alpha$ is the smallest possible order type of a cofinal subset of $\alpha$.
Note that because ordinals and hence order types are themselves well-ordered, cofinality is well-defined.
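Clearly, $\mathrm{cf}(\alpha) \le \alpha$, because $\alpha$ is a cofinal subset of itself, but it can be much smaller. For example, $\mathrm{cf}(\alpha^+) = 1$, because the subset $\{\alpha\}$ is cofinal in $\alpha^+$. Cofinality also satisfies $\mathrm{cf}(\mathrm{cf}(\alpha)) = \mathrm{cf}(\alpha)$, because if $\mathrm{cf}(\mathrm{cf}(\alpha)) < \mathrm{cf}(\alpha)$, then one could replace a cofinal subset $S$ of $\alpha$ that has order type $\mathrm{cf}(\alpha)$ with a cofinal subset of $S$ that has order type $\mathrm{cf}(\mathrm{cf}(\alpha))$ to lower the cofinality of $\alpha$ (a cofinal subset of a cofinal subset is itself cofinal).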
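Lemma 9.3. Let $\alpha$ and $\beta$ be ordinals. If there is a function from $\alpha$ to $\beta$ with cofinal image, then $\mathrm{cf}(\beta) \le \alpha$.
Proof. Given a function $f : \alpha \to \beta$ with cofinal image, define
\[ S = \{\gamma \in \alpha : f(\gamma) > f(\delta) \text{ for all } \delta < \gamma\}. \]
Then $S \subseteq \alpha$ and $f|_S$ is an isomorphism from $S$ to $f[S]$, because $S$ is chosen so that $f|_S$ is strictly order-preserving.
Furthermore, $f[S]$ is cofinal in $f[\alpha]$, because for each element $f(\gamma)$ in $f[\alpha]$, the first $\delta$ such that $f(\delta) \ge f(\gamma)$ is in $S$. Thus, $f[S]$ is cofinal in $\beta$, so there is a cofinal subset of $\beta$ with order type $\mathrm{type}(S)$. However, $\mathrm{type}(S) \le \mathrm{type}(\alpha) = \alpha$, by Proposition 5.14. Thus, $\mathrm{cf}(\beta) \le \alpha$, as desired.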
Corollary 9.4. For every ordinal $\alpha$, its cofinality $\mathrm{cf}(\alpha)$ is a cardinal.
Proof. By Lemma 9.3, if there is a bijection between $\mathrm{cf}(\alpha)$ and an ordinal $\beta$, then $\mathrm{cf}(\alpha) \le \beta$.
This means we can think of $\mathrm{cf}(\alpha)$ as measuring the size of the smallest cofinal subset of $\alpha$ in either of two ways, by order type or by cardinality, and they give the same answer.
Definition 9.5. A cardinal $\kappa$ is regular if $\kappa = \mathrm{cf}(\kappa)$, and singular otherwise.
For example, $\aleph_0$ is regular, since every finite subset of $\aleph_0$ has a maximal element and is therefore not cofinal. In fact, the same is true of every successor cardinal, which justifies the implication of the word regular that most cardinals are regular:
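Proposition 9.6. For every ordinal $\alpha$, the successor cardinal $\aleph_{\alpha^+}$ is regular.
Proof. If $S \subseteq \aleph_{\alpha^+}$ is cofinal, then
\[ \aleph_{\alpha^+} = \bigcup_{\beta \in S} \beta, \]
by Corollary 4.22. Thus,
\[ \aleph_{\alpha^+} \le \sum_{\beta \in S} |\beta| \le \aleph_\alpha \cdot |S|. \]
Thus, $|S| \ge \aleph_{\alpha^+}$, since otherwise
\[ \aleph_{\alpha^+} \le \aleph_\alpha^2 = \aleph_\alpha. \]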
However, not every infinite cardinal is regular. For example, $\aleph_\omega$ is a singular cardinal, because the subset $\{\aleph_i : i \in \omega\}$ is cofinal. In fact, we will see shortly that $\mathrm{cf}(\aleph_\alpha) = \mathrm{cf}(\alpha)$ for every limit ordinal $\alpha$.
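Lemma 9.7. If $\alpha$ is a limit ordinal, $\beta$ is any ordinal, and there is a strictly increasing function from $\alpha$ to $\beta$ with cofinal image, then $\mathrm{cf}(\alpha) = \mathrm{cf}(\beta)$.
Proof. Let $f : \alpha \to \beta$ be strictly increasing with cofinal image. By composing $f$ with a map $\mathrm{cf}(\alpha) \to \alpha$ that is an isomorphism of $\mathrm{cf}(\alpha)$ with a cofinal subset of $\alpha$, we get a map $\mathrm{cf}(\alpha) \to \beta$, and it has cofinal image because $f$ is increasing. Thus, by Lemma 9.3, $\mathrm{cf}(\beta) \le \mathrm{cf}(\alpha)$, so we need only prove that $\mathrm{cf}(\alpha) \le \mathrm{cf}(\beta)$.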
Suppose $T \subseteq \beta$ is cofinal, with $|T| = \mathrm{cf}(\beta)$. Given $t \in T$, let
\[ g(t) = \text{least } s \in \alpha \text{ such that } f(s) > t; \]
to see that this definition makes sense, we must verify that there is such an $s$ at all. Because $f[\alpha]$ is cofinal in $\beta$, for every $t \in T$ there exists $s' \in \alpha$ such that $f(s') \ge t$. Because $\alpha$ is a limit ordinal, there exists $s \in \alpha$ such that $s > s'$, and then $f(s) > f(s') \ge t$ because $f$ is strictly increasing. Thus, $g(t)$ is well defined.
Let
\[ S = \{g(t) : t \in T\}. \]
Then $|S| \le |T| = \mathrm{cf}(\beta)$, and we will show that $S$ is cofinal in $\alpha$. By Corollary 9.4, this is enough to conclude that $\mathrm{cf}(\alpha) \le \mathrm{cf}(\beta)$.
If $S$ is not cofinal in $\alpha$, then there exists $x \in \alpha$ such that $x \ge s$ for all $s \in S$. It follows that $f(x) \ge f(s)$ for all $s \in S$, and hence $t < f(g(t)) \le f(x)$ for all $t \in T$. This contradicts the cofinality of $T$ in $\beta$, so we conclude that $S$ is cofinal in $\alpha$ and thus that $\mathrm{cf}(\alpha) = \mathrm{cf}(\beta)$.
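Corollary 9.8. If $\alpha$ is a limit ordinal, then $\mathrm{cf}(\aleph_\alpha) = \mathrm{cf}(\alpha)$.
Proof. Apply Lemma 9.7 to the map from $\alpha$ to $\aleph_\alpha$ defined by $\beta \mapsto \aleph_\beta$.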
Proposition 9.9. Let $\kappa$ be an infinite cardinal. Then $\mathrm{cf}(\kappa)$ is the least cardinality $|I|$ of a set $I$ for which there are cardinals $\kappa_i$ for $i \in I$ such that $\kappa_i < \kappa$ and
\[ \sum_{i \in I} \kappa_i = \kappa. \]
In other words, the cofinality of an infinite cardinal $\kappa$ is the smallest number of summands less than $\kappa$ that can add up to $\kappa$.
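Proof. First, we will show that we can write $\kappa$ as a sum of $\mathrm{cf}(\kappa)$ cardinals that are less than $\kappa$. Let $\{\alpha_i : i \in \mathrm{cf}(\kappa)\}$ be a cofinal subset of $\kappa$. Then
\[ \kappa = \bigcup_{i \in \mathrm{cf}(\kappa)} \alpha_i, \]
by Corollary 4.22. Because $\alpha_i \in \kappa$ and $\kappa$ is a cardinal, $|\alpha_i| < \kappa$. Thus,
\[ \kappa \le \sum_{i \in \mathrm{cf}(\kappa)} |\alpha_i| \le \mathrm{cf}(\kappa) \cdot \kappa = \kappa, \]
so
\[ \sum_{i \in \mathrm{cf}(\kappa)} |\alpha_i| = \kappa \]
with $|\alpha_i| < \kappa$, as desired.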
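For the other direction, suppose $|I| < \kappa$ and
\[ \kappa = \sum_{i \in I} \kappa_i \]
for some cardinals $\kappa_i$ with $\kappa_i < \kappa$. If $\{\kappa_i : i \in I\}$ is not cofinal in $\kappa$, then there exists an ordinal $\alpha \in \kappa$ with $\kappa_i \le \alpha$ for all $i$, and hence $\kappa_i = |\kappa_i| \le |\alpha| < \kappa$. Then
\[ \kappa = \sum_{i \in I} \kappa_i \le |I| \cdot |\alpha| < \kappa, \]
which is a contradiction. Thus, $\{\kappa_i : i \in I\}$ must be cofinal in $\kappa$, so
\[ |I| \ge |\{\kappa_i : i \in I\}| \ge \mathrm{cf}(\kappa). \]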

We can now formulate two of the most important consequences of König's theorem:
Theorem 9.10. For every infinite cardinal $\kappa$ and cardinal $\lambda \ge 2$,
\[ \kappa^{\mathrm{cf}(\kappa)} > \kappa \]
and
\[ \mathrm{cf}(\lambda^\kappa) > \kappa. \]
Note that the second part is a strengthening of Cantor's theorem, since it implies that $\lambda^\kappa > \kappa$.
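Proof. By Proposition 9.9, we can write
\[ \kappa = \sum_{i \in \mathrm{cf}(\kappa)} \kappa_i \]
with $\kappa_i < \kappa$. By König's theorem,
\[ \sum_{i \in \mathrm{cf}(\kappa)} \kappa_i < \prod_{i \in \mathrm{cf}(\kappa)} \kappa, \]
which amounts to $\kappa < \kappa^{\mathrm{cf}(\kappa)}$.
For the second part, note that $\lambda^\kappa$ is infinite because $\lambda \ge 2$. If $\mathrm{cf}(\lambda^\kappa) \le \kappa$, then
\[ \lambda^\kappa < (\lambda^\kappa)^{\mathrm{cf}(\lambda^\kappa)} \le (\lambda^\kappa)^\kappa = \lambda^{\kappa^2} = \lambda^\kappa, \]
which is a contradiction. Thus, $\mathrm{cf}(\lambda^\kappa) > \kappa$.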
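It turns out that Theorem 9.10 tells us everything ZFC has to say about $2^\kappa$ when $\kappa$ is regular. Stated somewhat informally, we have the following theorem:
Theorem 9.11 (Easton). Under ZFC, the only constraints on $2^\kappa$ for regular, infinite cardinals $\kappa$ are
\[ \kappa < \mathrm{cf}(2^\kappa) \]
and
\[ 2^\kappa \le 2^\lambda \text{ whenever } \kappa \le \lambda. \]
In other words, for any definable function $f$ on any set of ordinals $\alpha$ with $\aleph_\alpha$ regular, it is consistent with ZFC that
\[ 2^{\aleph_\alpha} = \aleph_{f(\alpha)}, \]
provided that $f$ is weakly increasing and that $\mathrm{cf}(\aleph_{f(\alpha)}) > \aleph_\alpha$.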
Arbitrarily weird things can happen. For example, it is consistent with ZFC that $2^{\aleph_i} = \aleph_{i+1}$ for $0 \le i \le 10$, $2^{\aleph_i} = \aleph_{592}$ for $11 \le i \le 591$, and $2^{\aleph_i} = \aleph_{i+1}$ for $i \ge 592$, and of course there are far weirder possibilities than this.
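It is consistent with ZFC that $2^{\aleph_0} = \aleph_{\omega^+}$, in which case $2^{\aleph_0} > \aleph_\omega$, or that $2^{\aleph_0} = \aleph_1$, in which case $2^{\aleph_0} < \aleph_\omega$. Of course, it is impossible to have $2^{\aleph_0} = \aleph_\omega$, because of the cofinality constraint.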
Easton's theorem tells us nothing about $2^\kappa$ when $\kappa$ is singular. Many logicians at first assumed this was just a defect of the proof, and that in fact the regularity assumption could be dropped, but Silver showed that it cannot:
Theorem 9.12 (Silver). If $\aleph_\alpha$ is a singular cardinal with uncountable cofinality, and if $2^{\aleph_\beta} = \aleph_{\beta^+}$ for all $\beta < \alpha$, then $2^{\aleph_\alpha} = \aleph_{\alpha^+}$.
In other words, a singular cardinal with uncountable cofinality cannot be the first counterexample to the generalized continuum hypothesis, so ZFC imposes some additional constraints on $2^\kappa$ with $\kappa$ singular. These constraints are still not fully understood, but there are some remarkable results, such as this theorem of Shelah:
Theorem 9.13 (Shelah). If $2^{\aleph_i} < \aleph_\omega$ for all $i \in \omega$, then
\[ 2^{\aleph_\omega} < \aleph_{\omega_4}. \]
The upper bound of $\aleph_{\omega_4}$ is truly enormous ($\omega_4$ is vastly larger than $\omega$), but it is noteworthy that there is any upper bound at all. Of course, we need some hypothesis along the lines given in Shelah's theorem. For example, it is consistent with ZFC that $2^{\aleph_0} > \aleph_{\omega_4}$, in which case of course $2^{\aleph_\omega} > \aleph_{\omega_4}$. Nobody knows whether the bound in Shelah's theorem can be improved to $\aleph_{\omega_3}$.
