The Theory of The Foundations of Mathematics

The theory of the foundations of mathematics
- 1870 to 1940 -
Mark Scheffer
(Version 1.0)
Mark Scheffer, id. 415968, e-mail: zax@chello.nl. Last changes:

March 22, 2002. This report is part of a practical component of the Com-
puting Science study at the Eindhoven University of Technology.
To work on the foundations of mathematics, two things are needed:

Love and Blood.
- Anonymous quote, 2001.

Contents
1 Introduction 9
2 Cantor’s paradise 13
2.1 The beginning of set-theory . . . . . . . . . . . . . . . . . . . 13
2.2 Basic concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 Mathematical constructs in set-theory 21

3.1 Some mathematical concepts . . . . . . . . . . . . . . . . . . . 21
3.2 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4 Induction Methods . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4.1 Induction . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4.2 Deduction . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4.3 The principle of induction . . . . . . . . . . . . . . . . 34
3.5 Real numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.5.1 Dedekind’s cuts . . . . . . . . . . . . . . . . . . . . . . 46
3.5.2 Cantor’s chains of segments . . . . . . . . . . . . . . . 47
3.5.3 Cauchy-sequences . . . . . . . . . . . . . . . . . . . . . 48
3.5.4 Properties of the three definitions . . . . . . . . . . . . 50
3.6 Infinite sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.7 The Continuum Hypothesis . . . . . . . . . . . . . . . . . . . 60
3.8 Cardinal and Ordinal numbers and Paradoxes . . . . . . . . . 63
3.8.1 Cardinal numbers and Cantor’s Paradox . . . . . . . . 63
3.8.2 Ordinal numbers and Burali-Forti’s Paradox . . . . . . 65
4 Peano and Frege 71

4.1 Peano’s arithmetic . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2 Frege’s work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5 Russell 79
5.1 Russell’s paradox . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.2 Consequences and philosophies . . . . . . . . . . . . . . . . . 88
5.3 Zermelo Fraenkel . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.3.1 Axiomatic set theory . . . . . . . . . . . . . . . . . . . 92
5.3.2 Zermelo Fraenkel (ZF) Axioms . . . . . . . . . . . . . 93
6 Hilbert 99
6.1 Hilbert’s proof theory . . . . . . . . . . . . . . . . . . . . . . . 101
6.2 Hilbert’s 23 problems . . . . . . . . . . . . . . . . . . . . . . . 110
7 Types 113
7.1 Russell and Whitehead’s Principia Mathematica . . . . . . . . 113
7.2 Ramsey, Hilbert and Ackermann . . . . . . . . . . . . . . . . . 119
7.3 Quine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
8 Gödel 123
8.1 Informally: Gödel’s incompleteness theorems . . . . . . . . . . 123
8.2 Formally: Gödel’s Incompleteness Theorems . . . . . . . . . . 127
8.2.1 On formally undecidable propositions . . . . . . . . . . 127
8.2.2 The impossibility of an ‘internal’ proof of consistency . 130
8.2.3 Gödel numbering and a concrete proof of G1 , G2 and G3 131
8.3 Gödel’s theorem and Peano Arithmetic . . . . . . . . . . . . . 132
8.4 Consequences . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.5 Neumann-Bernays-Gödel axioms . . . . . . . . . . . . . . . . . 135
9 Church and Turing 141

9.1 Turing and Turing Machine . . . . . . . . . . . . . . . . . . . 141
9.2 Church and the Lambda Calculus . . . . . . . . . . . . . . . . 153
9.3 The Church-Turing thesis . . . . . . . . . . . . . . . . . . . . 166
10 Conclusion 169
A Timeline and Images 181

Mathematical Notations
Many different notations have been developed for set theory and logic.
Most notations that we have used are standard today; other notations that
we have used are introduced in the text.
Mathematical Logic
symbol   meaning                              also described as
∧        conjunction                          and
∨        disjunction                          (inclusive) or
¬        negation                             not
ϕ(x)     propositional function
→        implication                          if . . . then
↔        bi-implication                       if and only if, iff
≡        equivalence                          is equivalent to
∀        universal quantifier                 for all
∃        existential quantifier               exists
∃!       one-element existential quantifier   exists a unique
In most places we have chosen to use the following notation (originally due
to E.W. Dijkstra) to denote quantifications:
(relation : range : term)
denotes the relationship over a set of terms ranging over range.
Consider a general pattern (Q x : ϕ(x0 , . . . , xn ) : t(x0 , . . . , xn )), with Q
a quantifier, ϕ a boolean expression in terms of the dummies x0 , . . ., xn ,
and t(x0 , . . . , xn ) the term of the quantification. The quantification is the
accumulation of values t(x0 , . . . , xn ) using an operator or relation indicated
by Q, over all values (x0 , . . . , xn ) for which ϕ(x0 , . . . , xn ) holds.
This notation is suitable for formal manipulation and unambiguous in the

sense that it explicitly indicates the quantifier Q, the dummies and the range
of the dummies that is indicated by the boolean expression ϕ (i.e. it exactly
determines the domain of the quantification). This allows us to reason about
general properties of quantifications, in a way in which the (scopes of the)
bound variables are clearly identified. Note that this type of quantification
is only suitable for binary operations that are symmetric and associative.
Example:
(Σ x : 0 ≤ x ≤ 5 : x²)
=
0² + 1² + 2² + 3² + 4² + 5²
=
Σ_{x=0}^{5} x²
Example:
(∃x : x ∈ N : x³ − x² = 18)
≡
‘there exists a natural number x such that x³ − x² = 18’
If the term ranges over all possible values of the variable (here : x), or if
it is clear what the range of a variable is, we can omit it.
Example:
(∀x : true : x ∈ A → x ∈ B)
≡
(∀x :: x ∈ A → x ∈ B)
≡
‘all elements of A are also elements of B’
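The three example quantifications above can be mirrored with Python comprehensions. This is only an illustrative sketch: the finite search bound for the existential quantifier and the small sets A, B and universe are assumptions made here, not part of the notation.

```python
# Quantifications (Q x : range : term) written as Python comprehensions.

# (Σ x : 0 ≤ x ≤ 5 : x²)
sum_of_squares = sum(x**2 for x in range(0, 6))

# (∃x : x ∈ N : x³ − x² = 18), searched over the first 100 naturals only
# (the bound is an assumption; N itself is infinite).
exists_solution = any(x**3 - x**2 == 18 for x in range(100))

# (∀x :: x ∈ A → x ∈ B); the implication p → q is written as (not p) or q.
A = {1, 2}
B = {1, 2, 3}
universe = range(10)  # hypothetical finite universe for the dummy x
all_in_b = all((x not in A) or (x in B) for x in universe)

print(sum_of_squares, exists_solution, all_in_b)  # 55 True True
```

The range of the dummy becomes the `for ... in ...` clause (plus an optional `if`), and the quantifier becomes the accumulating function (`sum`, `any`, `all`).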
Chapter 1
Introduction
Pure mathematics is, in its way, the poetry of logical ideas.
- Albert Einstein
This report covers the most important developments and theory of the
foundations of mathematics in the period of 1870 to 1940. The tale of the
foundations is fairly familiar in general terms and for its philosophical con-
tent; here the main emphasis is laid on the mathematical theory. The history
of the foundations of mathematics is complicated and is a many-sided story;
with this article I do not aim to give a definitive or complete version, but
to capture what I consider the essence of the theoretical developments, and
to present them in a clear and modern setting. Some basic mathematical
knowledge of set theory and logic is presupposed.
By the middle of the nineteenth century, certain logical problems (for

example paradoxes around the notions of infinity, the infinitesimal and con-
tinuity) at the heart of mathematics had inspired a movement, led by German
mathematicians, to provide mathematics with more rigorous foundations.
This is where the theory of this report begins, with the emergence of set
theory by the German mathematician Cantor. In section 2.1 we informally
describe how work on a problem concerning trigonometric series gradually
led Cantor to his theory of sets (section 2.2). As a result of the work of
Weierstrass, Dedekind and Cantor, pure mathematics had been provided
with much more sophisticated foundations. The notion of infinitesimal had
been banished, ‘real’ numbers had been provided with a logically consistent
definition (section 3.5), continuity had been redefined and, more controver-
sially, a whole new branch of arithmetic had been invented which addressed
itself to the problems (e.g. paradoxes) of infinity (sections 3.6, 3.7).
In 1895 Cantor discovered a paradox (section 3.8.1) that he did not publish
but communicated to Hilbert in 1896. In 1897 it was rediscovered in a slightly
different form by Burali-Forti (section 3.8.2). Cantor and Burali-Forti could
not resolve this paradox, but it was not taken so seriously, partly because
the paradoxes appeared in a rather technical region.
The Italian mathematician Peano (section 4.1) was able to show that the
whole of arithmetic could be founded upon a system that uses three basic
notions and five initial axioms. At the same time the German mathematician
Frege (section 4.2) worked on developing a logical basis for mathematics. Like
Peano, Frege wanted to put mathematics on firm grounds. But Frege’s
grounds were strictly logical; he followed a development later called logicism,
also known as the development of so-called mathematical logic.
The British mathematician Russell noted Peano’s work and later that
of Frege. Soon thereafter he showed (section 5.1) how finite descriptions
like ‘set of all sets’ could be self-contradictory (i.e. paradoxical) and pointed
out the difficulties that arose with self-referential terms. This paradox that
Russell found existed not only in specific technical regions but in all of the
axiomatic systems underlying mathematics at the same time (section 5.1).
But since the paradoxes could be avoided in most practical applications of
set theory, the belief in set theory as a proper foundation of mathematics
remained. Axiomatic set theory (section 5.3.1) was an attempt to come to
a theory without paradoxes. Various responses to the paradox (section 5.2)
led to new sets of axioms for set theory. The two main approaches are by the
German mathematicians Zermelo and Fraenkel (section 5.3), and by the
Hungarian von Neumann, the Austrian Gödel and the Swiss Bernays
(section 8.5). It also led to the emergence of the ‘intuitionistic’ philosophy of
mathematics by the Dutch mathematician Brouwer (not covered here) and
to a theory of types, proposed by Russell himself with the help of his
former teacher, the English mathematician Whitehead. Despite the paradox,
Russell and Whitehead still claimed that all mathematics could be founded
on a mathematical logic; this belief was given a definite presentation in
their work ‘Principia Mathematica’ (section 7.1). Various consequences
followed (section 7.3) and new conceptions of logic arose (by Wittgenstein and
Ramsey, see section 7.2).
At the turn of the century, the German mathematician David Hilbert
listed certain important problems concerning the foundations of mathematics
and mathematics in general (section 6.2). To overcome paradoxes and
other problems that arose in existing systems, Hilbert developed a theory of
axiomatic systems (section 6.1). He then encouraged his student Zermelo to
use this axiomatic method to develop the first set of axioms for set theory
(section 5.3.2). Hilbert had since then made more precise demands on any
proposed set of axioms for mathematics (section 6.1) in terms of consistency,
completeness and decidability.
In 1931 Gödel had shown that consistency and completeness could not
both be attained (chapter 8). Gödel’s work left outstanding Hilbert’s ques-
tion of decidability. The English mathematician Turing proved in 1936 that
there are undecidable problems, by giving the so-called halting problem that
cannot be solved by any algorithm (section 9.1), after formalizing the no-
tion of algorithm with his concept of the Turing Machine. The American
mathematician Church (independently) obtained the same result but with
another formalization of the notion of an algorithm, using his computational
model of lambda calculus (section 9.2). In section 9.3 we state that these two
notions are equivalent and correspond to the intuitive notion of algorithm or
computability. In chapter 10 I summarize the theory of the foundations of
mathematics, before giving my own opinion and making some suggestions for
future work.
This article is part of the practical component of my study of computing

science, and written for a large part in 8 weeks at the Heriot-Watt university
in Edinburgh under supervision of prof. F. Kamareddine. I want to thank
Rob Nederpelt and the formal methods section of the computing science de-
partment of the Eindhoven University of Technology for making this possible.
Rob Nederpelt always inspired me to continue working on this report and was
patient in explaining difficult proofs to me. And last but not least, I want
to thank Fairouz Kamareddine for her support and positive motivation, and
Boukje Nouwen (as she breathes a sigh of relief that this is (I think) the last
revision) for the typesetting and editing of large parts of this document and
for helping me in many ways to finish this article in such a short period of
time.
Chapter 2
Cantor’s paradise
2.1 The beginning of set-theory

Perhaps the most surprising thing about mathematics is that it
is so surprising. The rules which we make up at the beginning
seem ordinary and inevitable, but it is impossible to foresee their
consequences. These have only been found out by long study, ex-
tending over many centuries. Much of our knowledge is due to a
comparatively few great mathematicians such as Newton, Euler,
Gauss, or Riemann; few careers can have been more satisfying
than theirs. They have contributed something to human thought
even more lasting than great literature, since it is independent of
language.
- Titchmarsh, E. C. in [88]
By the late 19th century the discussions about the foundations of geometry
had become the focus for a running debate about the nature of the branches
of mathematics ([23, last paragraph of section 35, page 69/70]). Although
there had been no conscious plan leading in that direction, the stage was set
for a consideration of questions about the fundamental nature of mathema-
tics.
In the study of logic, the work of the English mathematician George Boole
in the 1850s ([49, chapter 2.S4, page 51]), and the American Charles Sanders
Peirce around 1880 ([49, page 187]), had contributed to the development of a
symbolism to explore logical deductions and in Germany the logician Gottlob
Frege (see [98]) had directed keen attention to fundamental questions.
All of these debates came together through the pioneering work of the
German mathematician Georg Cantor on the concept of a set. Cantor had
begun work in this area because of his interest in Riemann’s theory of trigono-
metric series.
In Germany at the university of Halle, the direction of Cantor’s research

turned away from number theory and towards analysis. This was due to
Heine, one of his senior colleagues at Halle, who challenged Cantor to prove
the open problem on the uniqueness of representation of a function as a
trigonometric series (see [30, section 5.2, page 182]). Starting from the work
on trigonometric series and on the function of a complex variable done by
the German mathematician Bernhard Riemann (see [75]) in 1854, Cantor in
1870 showed ([30, page 182]) that such a function can be represented in only
one way by a trigonometric series. Consideration of the collection of numbers
(originally termed ‘point sets’, see [30, section 5.2, page 184]) that would not
conflict with such a representation led him, first, in 1872, to define irrational
numbers in terms of convergent sequences of rational numbers (or quotients
of integers, see section 3.5.2) and then to begin his major lifework, the theory
of sets and the concept of transfinite numbers.
2.2 Basic concepts

The essence of mathematics lies in its freedom.
- Georg Cantor, quoted in [58]
In 1874 Cantor published his first article on set-theory. A set, wrote Cantor
(in ‘Untersuchungen über die Grundlagen der Mengenlehre I’, published
in [20, page 261-281]), is “a collection of definite, distinguishable objects of
perception or thought conceived as a whole”. In this report we use a similar
description of the concept of a set.
What is a set? A (finite or infinite) collection of objects, that is considered

as a single, abstract object.
A set is sometimes also called aggregate, class or manifold (the name first
used by Riemann (see [31, page 88]) and later by the mathematician Russell).
The objects are also called elements or members of the set.
We denote a set by listing its elements between braces ‘{’ and ‘}’, and
membership of an element in a set by the membership relation ∈.
Example: If we consider a set that contains natural numbers, we write
4 ∈ {2, 3, 4, 5} to indicate that 4 is an element of the set {2, 3, 4, 5}. We write
4 ∉ {7, 8, 9} to indicate that 4 is not an element of the set {7, 8, 9}.
In a mathematical context we mostly consider sets of numbers and functions.

We denote the well-known sets of natural numbers by N (this set is also called
the naturals), the integers by Z, the fractional numbers by Q (this set is also
called the rationals) and the reals by R (this set is also called the continuum).
The objects of a set themselves can also be sets.
What is set theory? A branch of mathematics that deals with the proper-
ties of well-defined collections of objects, which may be of a mathematical
nature, such as numbers or functions, or not.
Cantor defined ([49, page 288]) two sets A and B to be identical (equal),
notation A = B, if and only if A and B have the same elements. When later
set-theory was axiomatized, this definition became also known as the
Axiom of extensionality: A = B := (∀x :: (x ∈ A ↔ x ∈ B))
Example: {3, 3, 7} = {7, 3} and {2, {3, 4}} ≠ {{2, 3}, 4}
The relation ‘is a subset of’, notation ⊆, indicates that one set is con-
tained in the other:
Definition of subset: A ⊆ B := (∀x :: x ∈ A → x ∈ B)
Definition of proper subset: A ⊂ B := (A ⊆ B ∧ A ≠ B)
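Extensionality and the two subset definitions can be tried out on small finite sets. The following is a minimal sketch, assuming Python's built-in sets as a stand-in for finite sets; `frozenset` is used for nested sets only because Python requires set elements to be hashable, and the helper names are hypothetical.

```python
# Extensionality: a set is determined by its elements, so repetition and
# order in the listing do not matter.
same = ({3, 3, 7} == {7, 3})

# {2, {3, 4}} versus {{2, 3}, 4}: different elements, hence different sets.
nested_1 = {2, frozenset({3, 4})}
nested_2 = {frozenset({2, 3}), 4}
different = (nested_1 != nested_2)

def is_subset(a, b):
    """A ⊆ B := (∀x :: x ∈ A → x ∈ B)"""
    return all(x in b for x in a)

def is_proper_subset(a, b):
    """A ⊂ B := (A ⊆ B ∧ A ≠ B)"""
    return is_subset(a, b) and a != b

print(same, different)                   # True True
print(is_subset({1}, {1, 2}))            # True
print(is_proper_subset({1, 2}, {1, 2}))  # False
```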
We often want to create a new set from a given set by selecting elements
that have certain properties. For example we take the set of powers of three
or the set of all even numbers (to be exact: the set containing those ele-
ments of the set of natural numbers that have the property to be divisible
by 2). This principle was used by Cantor, and we also call it the unrestricted
or naive comprehension principle because it later (see sections 3.8 and 5.1)
turned out to be untenable.
Comprehension principle: For all properties ϕ there is precisely one set,

denoted by {x | ϕ(x)}, whose elements are exactly those objects which have
the property ϕ.
We thus have that y ∈ {x | ϕ(x)} ↔ ϕ(y). As a consequence (by taking

for all x, ϕ(x) = false), there is at least one set that has no elements: the
empty set, denoted by ∅.
Theorem: (∃!x :: (∀y :: y ∉ x))
Proof: If we take ϕ to be false, the comprehension principle says that ‘there
is precisely one set whose elements are exactly those objects which have the
property false’. In mathematical notation: (∃!x :: (∀y :: y ∈ x ↔ false)).
This is equivalent to saying there is no element y that can be a member of
x: (∃!x :: (∀y :: y ∉ x)). From now on, we denote this unique set x by ∅ and
call it the empty set.
Corollary: (∀a :: ∅ ⊆ a)
Proof: We want to prove that (∀a :: ∅ ⊆ a) or, using the definition of the
subset relation: (∀x :: x ∈ ∅ → x ∈ a). From the previous theorem we know
that (∀y :: y ∉ ∅). This yields us (∀x :: false → x ∈ a), which is true.
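The comprehension principle and the empty set can be illustrated with set comprehensions. Note the important assumption in this sketch: the objects are drawn from a fixed finite universe, whereas the principle as stated above has no such restriction. The names `universe` and `comprehension` are hypothetical.

```python
# Assumption: a finite universe of objects (the naive principle has none).
universe = set(range(20))

def comprehension(phi):
    """{x | ϕ(x)}, restricted to the finite universe above."""
    return {x for x in universe if phi(x)}

evens = comprehension(lambda x: x % 2 == 0)
empty = comprehension(lambda x: False)  # ϕ(x) = false for all x

# y ∈ {x | ϕ(x)} ↔ ϕ(y), checked for every y in the universe
membership_ok = all((y in evens) == (y % 2 == 0) for y in universe)

# ∅ ⊆ a holds vacuously for every set a (checked on a few samples)
empty_is_subset = all(empty <= a for a in [set(), {1}, evens])

print(empty == set(), membership_ok, empty_is_subset)  # True True True
```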
Using the comprehension principle we can create new sets from given sets.
So now we can introduce some operations on sets, by applying the compre-
hension principle. But before we do that, we first introduce some general
(i.e. regardless whether the operations are set-theoretic or not) properties
of operations: idempotence, commutativity, associativity and distributivity.
Although Cantor did not formulate these properties as such, they are used
in the branch of calculus and useful in the set theory that follows in this
chapter.
Suppose ⊕ and ⊗ are binary¹ operations on a certain domain and E, F and
G are elements of that domain (for example sets), on which we have defined
the equality relation ‘=’.
Definition of idempotence:
⊕ is idempotent := (∀E :: E ⊕ E = E)
Definition of commutativity:
⊕ is commutative := (∀E, F :: E ⊕ F = F ⊕ E)
Definition of associativity:
⊕ is associative := (∀E, F, G :: (E ⊕ F ) ⊕ G = E ⊕ (F ⊕ G))
Definition of distributivity:
⊕ is distributive² over ⊗ := (∀E, F, G :: E ⊕ (F ⊗ G) = (E ⊕ F ) ⊗ (E ⊕ G))
¹ These properties can also be generalized to operations of arbitrary arity,
but this will not be necessary for our discussion.
² This form of distributivity is also called left-distributivity, as opposed to
right-distributivity:
⊕ is right-distributive over ⊗ := (∀E, F, G :: (E ⊗ F ) ⊕ G = (E ⊕ G) ⊗ (F ⊕ G))
In ordinary mathematics this distinction is often left out for commutative
operations, and we for example simply say that × is distributive over + (when
in fact it is both left- and right-distributive).
The symbol ∪ is employed to denote the union of two sets. Thus, the set
A ∪ B is defined as the set that consists of all elements belonging either to
set A or set B.
Definition of union: A ∪ B := {x | x ∈ A ∨ x ∈ B}
The intersection operation is denoted by the symbol ∩. A ∩ B is defined

as the set composed of all elements that belong to both A and B.
Definition of intersection: A ∩ B := {x | x ∈ A ∧ x ∈ B}
Any two sets the intersection of which is the empty set are said to be dis-
joint. A collection of sets is called (pairwise) disjoint or mutually exclusive
if any two distinct sets in it are disjoint.
Example: The operations union and intersection on sets are both idempo-
tent, commutative and associative.
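The claim in this example can be checked by brute force on a small domain. The sketch below tests the properties on all subsets of {1, 2, 3}; the choice of test domain is an assumption, and such a finite check illustrates the definitions but is of course not a proof for all sets.

```python
from itertools import combinations

def powerset(v):
    """All subsets of v, as frozensets so that they can be compared freely."""
    items = list(v)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# Finite test domain (an assumption made for illustration only).
DOMAIN = powerset({1, 2, 3})

def is_idempotent(op):
    return all(op(e, e) == e for e in DOMAIN)

def is_commutative(op):
    return all(op(e, f) == op(f, e) for e in DOMAIN for f in DOMAIN)

def is_associative(op):
    return all(op(op(e, f), g) == op(e, op(f, g))
               for e in DOMAIN for f in DOMAIN for g in DOMAIN)

def is_left_distributive(op, over):
    return all(op(e, over(f, g)) == over(op(e, f), op(e, g))
               for e in DOMAIN for f in DOMAIN for g in DOMAIN)

def union(a, b):
    return a | b

def intersection(a, b):
    return a & b

def difference(a, b):
    return a - b

for op in (union, intersection):
    print(op.__name__, is_idempotent(op), is_commutative(op), is_associative(op))
print(is_left_distributive(union, intersection))  # True
print(is_commutative(difference))                 # False
```

Set difference is included as a contrast: it fails commutativity already on this small domain.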
The difference of sets B and A, denoted B − A, contains those elements

of B, that are not in A.
Definition of difference: B − A := {x | x ∈ B ∧ x ∉ A}
If A ⊆ B we often call the difference B − A the relative complement of A

in B. We then call B the universe, and if it is clear what the universe is we
often denote the relative complement of A by A^c. From the definitions that
we have introduced so far, we can deduce three properties that are known as
the laws of reciprocity. The second and third laws are also known as the laws
of de Morgan, named after the English mathematician Augustus de Morgan:
First law of reciprocity: A ⊆ B ↔ A^c ⊇ B^c
Second law of reciprocity: (A ∪ B)^c = A^c ∩ B^c
Third law of reciprocity: (A ∩ B)^c = A^c ∪ B^c
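The three laws of reciprocity can be verified exhaustively for a small universe. This is a sketch under the assumption of a four-element universe `U` (a hypothetical name); complements are taken relative to `U`.

```python
from itertools import combinations

# Assumed finite universe; complements are relative to U.
U = frozenset({1, 2, 3, 4})

def complement(a):
    return U - a

subsets = [frozenset(c) for r in range(len(U) + 1)
           for c in combinations(sorted(U), r)]

# First law: A ⊆ B ↔ complement(A) ⊇ complement(B)
first_law = all((a <= b) == (complement(a) >= complement(b))
                for a in subsets for b in subsets)

# Second law: complement(A ∪ B) = complement(A) ∩ complement(B)
second_law = all(complement(a | b) == complement(a) & complement(b)
                 for a in subsets for b in subsets)

# Third law: complement(A ∩ B) = complement(A) ∪ complement(B)
third_law = all(complement(a & b) == complement(a) | complement(b)
                for a in subsets for b in subsets)

print(first_law, second_law, third_law)  # True True True
```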
We define the power set of V , denoted by P(V ), as the set of all subsets
of V . Note that if V ≠ ∅, this operation creates a larger set from a given set
V.
Definition of powerset: P(V ) := {A | A ⊆ V }
Given a set V , we thus have that (∀y :: y ∈ P(V ) ↔ y ⊆ V )
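For a finite set V the power set can be enumerated directly. The following sketch builds P(V ) from combinations of every size; the function name `power_set` is hypothetical, and subsets are represented as `frozenset` so that they can themselves be elements of a set.

```python
from itertools import chain, combinations

def power_set(v):
    """P(V) := {A | A ⊆ V}, for a finite set V."""
    items = list(v)
    all_sizes = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(c) for c in all_sizes}

V = {1, 2, 3}
P = power_set(V)

# y ∈ P(V) ↔ y ⊆ V, checked for the members of P(V)
members_are_subsets = all(y <= V for y in P)

print(len(P), members_are_subsets)  # 8 True
```

A set with n elements has 2^n subsets, which is why the result here has 8 members.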
We can extend the union of a pair of sets to any finite collection A of sets;
the union ⋃A is then defined as the set of all objects which belong to at least
one set in the collection A. We can do the same for the intersection ⋂A.
Definition: ⋃A := {x | (∃y :: y ∈ A ∧ x ∈ y)}
Definition: ⋂A := {x | (∀y :: y ∈ A → x ∈ y)}
We can divide a set of objects into a partition, that is a family of subsets

that are mutually exclusive and jointly exhaustive. Assume P is a set of
subsets of X.
Definition of partition: P is a partition of X :=
X = ⋃{A | A ∈ P } ∧ (∀A, B : A, B ∈ P : A = B ∨ A ∩ B = ∅)
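The union and intersection of a collection, and the partition test built on them, can be sketched as follows. The function names are hypothetical, the collections are assumed to be finite, and `big_intersection` additionally assumes a non-empty collection.

```python
def big_union(collection):
    """All x that belong to at least one set y in the collection."""
    return {x for y in collection for x in y}

def big_intersection(collection):
    """All x that belong to every set y in the collection.
    Assumes the collection is non-empty."""
    sets = [set(y) for y in collection]
    return set.intersection(*sets)

def is_partition(p, x):
    """P is a partition of X: the members of P jointly cover X and any two
    of them are either equal or disjoint."""
    blocks = [set(a) for a in p]
    covers = big_union(blocks) == set(x)
    exclusive = all(a == b or not (a & b) for a in blocks for b in blocks)
    return covers and exclusive

print(is_partition([{1, 2}, {3}], {1, 2, 3}))     # True
print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))  # False: blocks overlap
print(is_partition([{1}], {1, 2}))                # False: 2 is not covered
```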
In this chapter I have made extensive use of [30] in section 2.1 and [17]
in section 2.2.
Chapter 3
Mathematical constructs in
set-theory
3.1 Some mathematical concepts

The mathematician is entirely free, within the limits of his imagi-
nation, to construct what world he pleases. What he is to imagine
is a matter for his own caprice; he is not thereby discovering the
fundamental principles of the universe nor becoming acquainted
with the ideas of God. If he can find, in experience, sets of entities
which obey the same logical scheme as his mathematical entities,
then he has applied his mathematics to the external world; he has
created a branch of science.
- J.W.N. Sullivan in Aspects of Science, 1925
Now that we have this apparatus of set-theory available, we will see that
it is not just a separate branch of mathematics, but that we can define some
basic mathematical constructs in set-theory. In this section we will consider
pairs and the cartesian product, necessary before we can treat relations (in
section 3.2) and functions (in section 3.3).
First we consider the mathematical concept of an ordered pair < a, b >.

Compared to a ‘normal’ pair, where two pairs are considered equal if they
have the same elements, we want an ordered pair to also have the property
that the elements appear in the same order:
(∀c, d :: < a, b > = < c, d > ↔ a = c ∧ b = d)
We can now easily verify that the following definition (see [17, chapter
8]) in set-theory satisfies the desired property.
Definition of ordered pair¹: < a, b > := {{a}, {a, b}}
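The desired property of ordered pairs can be checked mechanically on a small domain. The sketch below uses the standard Kuratowski encoding {{a}, {a, b}}, with `frozenset` standing in for sets; the function name `kpair` and the three-element test domain are assumptions made for illustration.

```python
from itertools import product

def kpair(a, b):
    """Kuratowski encoding {{a}, {a, b}} of the ordered pair < a, b >."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# Check (< a, b > = < c, d >) ↔ (a = c ∧ b = d) for all a, b, c, d in a
# small test domain.
domain = range(3)
property_holds = all(
    (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(domain, repeat=4)
)

print(property_holds)              # True
print(kpair(1, 2) == kpair(2, 1))  # False: order matters
print({1, 2} == {2, 1})            # True: for plain sets it does not
```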
As the cartesian product A × B is by definition the set of all ordered

pairs < a, b > with a ∈ A and b ∈ B, we can now use the same definition in
set-theory:
Definition of cartesian product: A × B := {< a, b > | a ∈ A ∧ b ∈ B}
Let V = {Vi | i ∈ I} be a set of sets. We now define the cartesian product

of a set of sets, denoted by ×V or ×i∈I Vi . The definition uses the concept of
a function, that will be introduced on page 29.
Definition of cartesian product of a set of sets:
×V := {f : I → ⋃i∈I Vi | (∀i : i ∈ I : f (i) ∈ Vi )}
¹ Representation originally by Kuratowski, see [49, page 294].
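Both forms of the cartesian product can be sketched for finite sets. In this illustration Python tuples stand in for ordered pairs, and a member of the product of an indexed family is represented as a dict f with f[i] ∈ V[i]; the name `family_product` and the sample family are hypothetical.

```python
from itertools import product

A = {1, 2}
B = {"x", "y"}

# A × B := {< a, b > | a ∈ A ∧ b ∈ B}, with tuples as ordered pairs
AxB = set(product(A, B))

def family_product(V):
    """Product of an indexed family V (a dict i -> Vi): every member is a
    function f, here a dict, with f[i] ∈ V[i] for each index i."""
    indices = sorted(V)
    return [dict(zip(indices, choice))
            for choice in product(*(V[i] for i in indices))]

family = {0: {1, 2}, 1: {"x"}, 2: {True, False}}
choices = family_product(family)

print(len(AxB), len(choices))  # 4 4
```

The number of members is the product of the sizes of the sets involved: 2 · 2 for A × B and 2 · 1 · 2 for the family.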
3.2 Relations
Mathematicians do not study objects, but relations between ob-
jects. Thus, they are free to replace some objects by others as
long as the relations remain unchanged. Content to them is irre-
levant: they are interested in form only.
- J.H. Poincaré
In mathematics, a relation maps each element from an input set (called

domain) to either true or false. We formalize this notion in set-theory.
Definition of binary relation:

R is a binary relation between X and Y := R ⊆ X × Y
Note: We can easily generalize this definition for n-ary relations: R is an
n-ary relation on X1 , . . . , Xn := R ⊆ X1 × X2 × . . . × Xn , for n ∈ N. We call n
the arity of the relation.
Example: We have already seen the definitions of the subset and proper
subset relations in section 2.2. There we defined the set R ⊆ X × Y implicitly by
using a statement; only those pairs < x, y > are in R for which the statement
holds (here we are in fact using the comprehension principle of page 16). We
will continue to use statements to define relations.
We define the following shorthand notation (sometimes also written in

infix notation as xR y): R(x, y) := < x, y > ∈ R.
The mathematical expression ‘x < y’ is now equivalent to the set theoretic

expression ‘< x, y > ∈ R’, with R representing the ‘less than’ relation.
Example: The relation < on the naturals (i.e. between N and N) can be
defined as:
< 0, 1 >, < 1, 2 >, < 2, 3 >, . . .
< 0, 2 >, < 1, 3 >, < 2, 4 >, . . .
< 0, 3 >, < 1, 4 >, < 2, 5 >, . . .
. . .
On a relation R we can define the concepts of domain and range.
Definition of domain, range:

dom(R) := {x ∈ X | (∃y : y ∈ Y : R(x, y))}
ran(R) := {y ∈ Y | (∃x : x ∈ X : R(x, y))}
If we define the identity relation of X, we want it to have the usual pro-

perty that idX (x) = x for all x ∈ X (see for example [3, section 1.9.5.b, page
30]). In set-theory, we denote the identity relation on V by IV .
Definition of identity relation: IV := {< x, y > ∈ V × V | x = y}
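A binary relation as a set of pairs, together with its domain, range and the identity relation, can be sketched on a finite carrier set. The four-element set X below is an assumption (the relation < on all of N is infinite), and tuples stand in for ordered pairs.

```python
# Assumed finite carrier set instead of all of N.
X = {0, 1, 2, 3}

# The relation < between X and X, as a subset of X × X
less_than = {(x, y) for x in X for y in X if x < y}

def holds(r, x, y):
    """R(x, y) := < x, y > ∈ R"""
    return (x, y) in r

def dom(r):
    """All x for which some y with R(x, y) exists."""
    return {x for (x, _) in r}

def ran(r):
    """All y for which some x with R(x, y) exists."""
    return {y for (_, y) in r}

def identity(v):
    """I_V := {< x, y > ∈ V × V | x = y}"""
    return {(x, x) for x in v}

print(holds(less_than, 1, 3), holds(less_than, 3, 1))  # True False
print(sorted(dom(less_than)), sorted(ran(less_than)))  # [0, 1, 2] [1, 2, 3]
print(sorted(identity({1, 2})))                        # [(1, 1), (2, 2)]
```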
Assume R is a binary relation on a set X (i.e. R ⊆ X × X). As we did

for operations in section 2.2, we can also define some general properties of
relations. Note that we have already defined an equality relation ‘=’ on X at
page 16. Hereby we can explicitly state on which domain the property holds
(e.g. R is reflexive on X) or leave this implicit (e.g. simply R is reflexive).
Definition of reflexivity:
R is reflexive := (∀x : x ∈ X : R(x, x))
Definition of symmetry:
R is symmetric := (∀x, y : x, y ∈ X : R(x, y) → R(y, x))
Definition of anti-symmetry:
R is anti-symmetric := (∀x, y : x, y ∈ X : R(x, y) ∧ R(y, x) → x = y)
Definition of transitivity:
R is transitive := (∀x, y, z : x, y, z ∈ X : R(x, y) ∧ R(y, z) → R(x, z))
Definition of connectivity:
R is connective := (∀x, y : x, y ∈ X : R(x, y) ∨ (x = y) ∨ R(y, x))
Definition of equivalence:
R is an equivalence relation := R is reflexive, symmetric and transitive
Note: Asymmetric means not symmetric, and is not the same as anti-
symmetric.
Example: The subset relation is reflexive, anti-symmetric (note that the proof
of anti-symmetry uses the axiom of extensionality of page 16) and transitive,
but not connective.
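The properties claimed for the subset relation can be checked on a small case. The sketch below tests the definitions above on the subsets of {1, 2}; the test domain and the helper names are assumptions, and a finite check illustrates the definitions without proving the general statement.

```python
from itertools import combinations

def powerset(v):
    items = list(v)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# Carrier set: all subsets of {1, 2}; relation: ⊆ as a set of pairs.
X = powerset({1, 2})
subset_rel = {(a, b) for a in X for b in X if a <= b}

def is_reflexive(r, xs):
    return all((x, x) in r for x in xs)

def is_symmetric(r, xs):
    return all((y, x) in r for x in xs for y in xs if (x, y) in r)

def is_antisymmetric(r, xs):
    return all(x == y for x in xs for y in xs
               if (x, y) in r and (y, x) in r)

def is_transitive(r, xs):
    return all((x, z) in r for x in xs for y in xs for z in xs
               if (x, y) in r and (y, z) in r)

def is_connective(r, xs):
    return all((x, y) in r or x == y or (y, x) in r
               for x in xs for y in xs)

print(is_reflexive(subset_rel, X))      # True
print(is_antisymmetric(subset_rel, X))  # True
print(is_transitive(subset_rel, X))     # True
print(is_connective(subset_rel, X))     # False: {1} and {2} are incomparable
```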
If R is an equivalence relation on a set X, we denote the equivalence class

of x with respect to R as [x]R .
Definition of equivalence class: [x]R := {y ∈ X | R(x, y)}
If R is an equivalence relation on X, the quotient set X/R of X modulo

R is the set of equivalence classes [x]R for all x ∈ X.
Definition of quotient set: X/R := {[x]R | x ∈ X}
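As an illustration of equivalence classes and the quotient set, consider congruence modulo 3 on a small set. The relation and the carrier set {0, . . . , 8} are chosen here only as an example; they do not appear in the text above.

```python
# Assumed example: congruence modulo 3 on X = {0, ..., 8}.
X = set(range(9))

def R(x, y):
    return (x - y) % 3 == 0

def eq_class(x):
    """[x]_R := {y ∈ X | R(x, y)}"""
    return frozenset(y for y in X if R(x, y))

# X/R := {[x]_R | x ∈ X}
quotient = {eq_class(x) for x in X}

print(sorted(eq_class(0)))  # [0, 3, 6]
print(len(quotient))        # 3
```

Elements related by R have the same class, so the nine elements collapse into three classes, and these classes form a partition of X.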
We now continue to build on the concept of relations, by categorizing

them based on the properties they have. An important property of relations
is the ability to compare and order elements. Suppose X and Y are sets, and
R is a relation on X.
Definition of (weak) partial ordering: R is a (weak) partial ordering :=

R is reflexive, anti-symmetric and transitive (on X)
Definition of quasi ordering: R is a quasi ordering := R is irreflexive and

transitive
Definition of strict partial ordering: R is a strict partial ordering :=

R is irreflexive, anti-symmetric and transitive
Definition of (total or linear) ordering: R is a (total or linear) ordering

:= R is irreflexive, anti-symmetric, transitive and connective
Definition of well-ordering: R is a well-ordering := R is an ordering on

X and each nonempty subset of X has a least element
Definition of well-foundedness: A set S is well-founded by a relation R
:= S is partially ordered by R and contains no infinite descending chains.
A set S contains a set C that is an infinite descending chain iff
C ⊂ S ∧ C has no minimal element.
Theorem: (without proof) Any subset of a well-founded set is also well-

founded.
Now that we can speak of a set of which the elements are ordered by a relation
R, we define the well-known concepts of (immediate) successor and
predecessor.
Definition of (immediate) predecessor: An element x1 ∈ X is a
predecessor of an element x2 ∈ X (with respect to an ordering R on X) :=
R(x1 , x2 ) ∧ ¬R(x2 , x1 ). x1 is an immediate predecessor of x2 if in addition
(¬∃x3 : x3 ∈ X ∧ x3 ≠ x1 ∧ x3 ≠ x2 : R(x1 , x3 ) ∧ R(x3 , x2 ))
Definition of (immediate) successor: An element x2 ∈ X is a
successor of an element x1 ∈ X (with respect to an ordering R on X) :=
R(x1 , x2 ) ∧ ¬R(x2 , x1 ). x2 is an immediate successor of x1 if in addition
(¬∃x3 : x3 ∈ X ∧ x3 ≠ x1 ∧ x3 ≠ x2 : R(x1 , x3 ) ∧ R(x3 , x2 ))
Note that with these definitions it can easily be proved that if a relation
R on X is an ordering, then each element has at most one immediate
predecessor and at most one immediate successor; the smallest element has no
predecessor and the largest element has no successor. The notions of smallest
and largest elements will be introduced hereafter. In the literature the
immediate successor or predecessor is sometimes called just successor or
predecessor. Sometimes the term ‘direct’ is used instead of ‘immediate’, or we
simply speak of the ‘next’ or ‘previous’ value.
When R is a partial ordering we often denote it by the symbol ⪯, and
when it is a quasi ordering by ≺. Now we can distinguish elements based on
their order. Let X be a set, partially ordered by ⪯ and let Y be a subset of X.
Definition of minimal element:
x is a minimal element of X := x ∈ X ∧ (¬∃y : y ∈ X ∧ y ≠ x : y ⪯ x)
Definition of maximal element:
x is a maximal element of X := x ∈ X ∧ (¬∃y : y ∈ X ∧ y ≠ x : x ⪯ y)
Definition of least element:
x is a least (also called smallest or first) element of X :=
x ∈ X ∧ (∀y : y ∈ X : x ⪯ y)
Definition of greatest element:
x is a greatest (also called maximum, largest or last) element of X :=
x ∈ X ∧ (∀y : y ∈ X : y ⪯ x)
Definition of lowerbound:
x is a lowerbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : x ⪯ y)
Definition of upperbound:
x is an upperbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : y ⪯ x)
Definition of infimum:
x is an infimum for Y in X := x is the greatest lowerbound for Y in X
Definition of supremum:
x is a supremum for Y in X := x is the smallest upperbound for Y in X
Example: Let X = {4, 6, 12, 24, 36} and R(x, y) := x is a divisor of y. Then
R is a partial order (but not strict) and also a quasi order, but not a (total)
order. 4 and 6 are minimal elements of X, but X has no least element. If we
regard X as a subset of the positive naturals ordered by R, then 1 and 2 are
lowerbounds for X, and 2 is the infimum of X.
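The divisibility example can be checked mechanically. The following is a minimal sketch in Python, not part of the original report: the helper names (divides, minimal_elements, least_elements, lower_bounds, infimum) are illustrative assumptions, and the ambient set is assumed to be the finite fragment {1, . . . , 36} of the positive naturals.

```python
# Hypothetical helpers illustrating the order-theoretic definitions above.
X = {4, 6, 12, 24, 36}
AMBIENT = set(range(1, 37))  # assumed finite stand-in for the positive naturals


def divides(a, b):
    """R(a, b): a is a divisor of b."""
    return b % a == 0


def minimal_elements(s):
    # x is minimal if no other element of s lies below it.
    return {x for x in s if not any(y != x and divides(y, x) for y in s)}


def least_elements(s):
    # x is least if it lies below every element of s (there is at most one).
    return {x for x in s if all(divides(x, y) for y in s)}


def lower_bounds(subset, ambient):
    # Elements of the ambient set that lie below every element of the subset.
    return {x for x in ambient if all(divides(x, y) for y in subset)}


def infimum(subset, ambient):
    # Greatest lower bound: a lower bound that lies above all lower bounds.
    bounds = lower_bounds(subset, ambient)
    greatest = [x for x in bounds if all(divides(y, x) for y in bounds)]
    return greatest[0] if greatest else None


print(sorted(minimal_elements(X)))
print(sorted(least_elements(X)))
print(sorted(lower_bounds(X, AMBIENT)))
print(infimum(X, AMBIENT))
```

Under these assumptions the sketch reports 4 and 6 as minimal elements, no least element, and 2 as the infimum, in line with the example.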
The so-called least number principle says that any non-empty subset of
the natural numbers has a least element. This principle can be shown (a
proof can be found in [59, page 7]) to be equivalent to the principles of weak
and strong induction, that will be introduced in section 3.4.
Example: The relation < on the naturals is an example of a total ordering
on N. From the so-called least number principle we can conclude that N is
also well-ordered by <. We prove the latter.
Proof: We know that < is an ordering on N. We have to show that
(∀A : A ⊆ N ∧ A ≠ ∅ : A has a least element). We first show by induction
on n that every A ⊆ N with A ∩ {0, . . . , n} ≠ ∅ has a least element.
For n = 0 this is trivial, since then 0 ∈ A and 0 is a least element of A.
For n + 1, let A ∩ {0, . . . , n + 1} ≠ ∅. If A ∩ {0, . . . , n} = ∅, then
n + 1 ∈ A and n + 1 is a least element of A. If A ∩ {0, . . . , n} ≠ ∅, we can
apply the induction hypothesis to conclude that A has a least element.
Finally, every non-empty A ⊆ N contains some n, so that A ∩ {0, . . . , n} ≠ ∅
and A has a least element.
3.3 Functions
In mathematics, a function maps each element from an input set to exactly
one element of an output set; in other words it is a special kind of relation
that indicates for each pair < x, y > of the input and output set if it belongs
to the function or not. More precisely, f is a function or mapping from X
to Y means that f assigns to each x ∈ X a uniquely determined y ∈ Y ,
notation f (x) = y. We can define this notion in set-theory by using a relation
between X and Y such that for each x ∈ X there is a unique y ∈ Y such
that < x, y > ∈ f .
Definition of function: f is a function from a set X to a set Y , notation

f : X → Y := f ⊆ X × Y ∧ (∀x : x ∈ X : (∃!y : y ∈ Y : < x, y > ∈ f ))
The definitions of domain and range as given in the subsection about

relations can now also be used for functions. We now introduce a notation
for the set of all functions f : X → Y .
Definition of Y^X : Y^X := {f ∈ P(X × Y ) | f is a function from X to Y }
As we did before for relations and operations, we now define some general
properties for functions.
Definition of injective: f : X → Y is injective or an injection :=
(∀x1 , x2 : x1 , x2 ∈ X : x1 ≠ x2 → f (x1 ) ≠ f (x2 ))
Definition of surjective: f : X → Y is surjective or a surjection :=
(∀y : y ∈ Y : (∃x : x ∈ X : y = f (x)))
Definition of bijective:
f : X → Y is bijective or a bijection := f is surjective and f is injective
If f is bijective, f is also called a (one-to-one) correspondence between

X and Y .
Example: We have the following property:

f : X → Y is surjective ↔ Ran(f ) = Y .
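For finite sets, the set-theoretic definitions of function, injection, surjection and bijection can be tested directly. The sketch below is an illustration only; it assumes functions are represented as Python sets of pairs, and the names is_function, is_injective, is_surjective and is_bijective are hypothetical helpers, not notation from this report.

```python
def is_function(f, X, Y):
    """f ⊆ X × Y and every x in X has exactly one y with (x, y) in f."""
    in_product = all(x in X and y in Y for (x, y) in f)
    unique_image = all(sum(1 for (a, _) in f if a == x) == 1 for x in X)
    return in_product and unique_image


def is_injective(f):
    # Different arguments must have different images.
    images = [y for (_, y) in f]
    return len(images) == len(set(images))


def is_surjective(f, Y):
    # For a function f : X -> Y this is the property Ran(f) = Y.
    return {y for (_, y) in f} == set(Y)


def is_bijective(f, Y):
    return is_injective(f) and is_surjective(f, Y)


X = {0, 1, 2}
Y = {"a", "b", "c"}
f = {(0, "a"), (1, "b"), (2, "c")}  # a bijection
g = {(0, "a"), (1, "a"), (2, "b")}  # a function, neither injective nor surjective
h = {(0, "a"), (0, "b"), (1, "c")}  # a relation, but not a function

print(is_function(f, X, Y), is_bijective(f, Y))
print(is_function(g, X, Y), is_injective(g), is_surjective(g, Y))
print(is_function(h, X, Y))
```

The relation h fails the uniqueness requirement twice: 0 has two images and 2 has none.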
Example: f : N → [−2π, 2π], with f (x) = sin(x) is a function and a relation.

g : [−2π, 2π] → N, with g(x) = y iff x = sin(y) is a relation, not a function.
We will now consider two special kinds of functions: the identity function
and the sequence.
Definition of sequence:
s is a sequence of X := s is a function from N to X (i.e. s ∈ X^N )
Definition of identity function:
The identity function id_X := id_X : X → X and (∀x : x ∈ X : id_X (x) = x)
We now introduce some operations on functions in set-theory. We can

easily check that these definitions correspond to mathematical operations.
Definition of composition: The composition g ◦ f of two functions f : A → B
and g : B → C := the function g ◦ f : A → C with (g ◦ f )(x) = g(f (x)), for
all x ∈ A
Definition of inverse function: The inverse of a bijection f : X → Y :=
the function f^{−1} : Y → X with (∀y : y ∈ Y : f^{−1} (y) = x ↔ y = f (x))
Definition of restricted function: The restriction of a function f :
X → Y to X0 , with X0 ⊆ X := the function f↾X0 : X0 → Y with
(∀x : x ∈ X0 : f↾X0 (x) = f (x))
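The three operations can be sketched for finite functions as follows. This is an illustration under the assumption that a finite function is stored as a Python dict; the names compose, inverse and restrict are hypothetical and not taken from the report.

```python
def compose(g, f):
    """(g ∘ f)(x) = g(f(x)) for every x in the domain of f."""
    return {x: g[f[x]] for x in f}


def inverse(f):
    """Inverse of an injective finite function (a bijection onto its range)."""
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):
        raise ValueError("f is not injective, so it has no inverse")
    return inv


def restrict(f, X0):
    """Restriction of f to a subset X0 of its domain."""
    return {x: f[x] for x in X0}


f = {0: "a", 1: "b", 2: "c"}
g = {"a": 10, "b": 20, "c": 30}

print(compose(g, f))
print(compose(inverse(f), f))  # the identity function on {0, 1, 2}
print(restrict(f, {0, 1}))
```

Composing a bijection with its inverse gives the identity function on the domain, which is one quick sanity check on the definitions.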
Just as in algebra, we can now combine a set and relations on that set
into a structure.
Definition of (relational) structure: ⟨X, R0 , . . . , Rp ⟩ is a (relational)
structure := X is a set and R0 , . . . , Rp are relations on X
The concept of a structure enables us to abstract from the exact set and
relations, and reason about sets of structures instead. There also is a useful
definition for equivalence of structures, called isomorphism.
Let R = ⟨X, R0 , . . . , Rp ⟩ and S = ⟨Y, S0 , . . . , Sp ⟩ be two structures, such
that (∀i : 0 ≤ i ≤ p : the arity of Ri and Si is n_i + 1).
Definition of isomorphism: f is an isomorphism between R and S := f
is a bijection from X to Y and (∀i : 0 ≤ i ≤ p : (∀x_0 , . . . , x_{n_i} :
x_0 , . . . , x_{n_i} ∈ X : Ri (x_0 , . . . , x_{n_i} ) ↔ Si (f (x_0 ), . . . , f (x_{n_i} ))))
With the notion of isomorphism, we can now abstract over structures.

When two structures are similar (the sets are of the same size and the rela-
tionships between the elements in one structure are retained between images
of those elements in the other structure), we call them isomorphic.
Definition of isomorphic: Two structures R and S are isomorphic,
notation R ≅ S := there exists an isomorphism from R to S
Definition of automorphism:
f is an automorphism of R := f is an isomorphism from R to R
Example: An isomorphism from structure ⟨N, <⟩ to ⟨Neven , <⟩ is given by
f : N → Neven , with f (n) = 2n. f is not an isomorphism from ⟨N, ⊕⟩ to
⟨N, <⟩, with a ⊕ b := b divides a.
Example: The function g : R+ → R with g(x) = log(x) is an isomorphism
between ⟨R+ , ∗⟩ and ⟨R, +⟩, because g is a bijection and for all
r1 , r2 ∈ R+ , log(r1 ∗ r2 ) = log(r1 ) + log(r2 ).
Example: An automorphism of ⟨A, R0 , . . . , Rp ⟩ is the identity function id_A :
A → A, so id_A = {< a, a > | a ∈ A}. Also, the function f (x) = 2x^3 is an
automorphism of ⟨R, <⟩.
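The isomorphism condition can be tested on finite fragments of these examples. The sketch below is only an illustration: it assumes structures with a single binary relation given as a Python predicate, restricts N to {0, . . . , 9}, and uses hypothetical names (is_isomorphism, less_than, divided_by). A finite check of this kind can refute an isomorphism claim but cannot prove one for the infinite structures.

```python
import math
from itertools import product


def is_isomorphism(f, X, Y, R, S):
    """f (a dict) is a bijection X -> Y with R(a, b) <-> S(f(a), f(b))."""
    xs, ys = list(X), list(Y)
    images = [f[x] for x in xs]
    bijective = len(set(images)) == len(xs) and set(images) == set(ys)
    preserves = all(R(a, b) == S(f[a], f[b]) for a, b in product(xs, repeat=2))
    return bijective and preserves


def less_than(a, b):
    return a < b


def divided_by(a, b):
    """a ⊕ b := b divides a."""
    return b != 0 and a % b == 0


N_FRAGMENT = range(10)           # assumed finite fragment of N
EVEN_FRAGMENT = range(0, 20, 2)  # the corresponding fragment of N_even
double = {n: 2 * n for n in N_FRAGMENT}

order_iso = is_isomorphism(double, N_FRAGMENT, EVEN_FRAGMENT, less_than, less_than)
mixed_iso = is_isomorphism(double, N_FRAGMENT, EVEN_FRAGMENT, divided_by, less_than)

# Numerical spot check of log(r1 * r2) = log(r1) + log(r2) on a few positive reals.
samples = [0.5, 1.0, 2.0, 7.25]
log_is_additive = all(
    math.isclose(math.log(a * b), math.log(a) + math.log(b), abs_tol=1e-12)
    for a, b in product(samples, repeat=2)
)

print(order_iso, mixed_iso, log_is_additive)
```

The doubling map preserves < on the fragment, while it fails to carry ⊕ to < already for the pair (1, 1), since 1 divides 1 but 2 < 2 is false.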
3.4 Induction Methods

There is a tradition of opposition between adherents of induction
and deduction. In my view it would be just as sensible for the two
ends of a worm to quarrel.
- A. Whitehead, quoted in [76]
3.4.1 Induction
Induction is a method of reasoning from a part to a whole, from particu-
lars to generals, or from the individual to the universal. It should not be
confused with the mathematical principle of induction (treated in section
3.4.3). In ordinary induction we examine a certain number of cases and
then generalize. Reasoning by analogy, where a conclusion is drawn based on
an analogous situation, is also a primitive form of induction (see [23, page 6]).
Example of inductive reasoning:2
Coffee shop burger no. 1 was greasy . . .
Coffee shop burger no. 2 was greasy . . .
. . .
Coffee shop burger no. 100 was greasy . . .
Therefore, all coffee shop burgers are greasy (or: the next coffee shop burger
will be greasy).
So in induction the conclusion contains information that was not con-

tained in the premisses. This is the source of uncertainty in inductions:
inductions are strengthened as confirming instances pile up, but they can
never bring certainty (unless every possible case is actually examined, in
which case they become deductions). As said in [49, page 366], the broad
difference between deductive and inductive reasoning is that in deduction
the conclusion asserts less than the premisses, whereas in induction it asserts
more. In chapter 14, section 3 of [49] there is a more detailed treatment of
inductive reasoning, including a distinction between determinative and
conceptual induction. In both these kinds of induction, the conclusion goes
beyond the premisses (or the evidence).
2 Example from: Peter Suber, Philosophy department, Earlham College.
3.4.2 Deduction
Mathematics, in its widest significance, is the development of all
types of formal, necessary, deductive reasoning.
- A. Whitehead, quoted in [100]
In contrast to induction, deduction is a method of reasoning that is based

on a rigorous proof: a derivation (using fixed rules called a system of logic), of
one statement (the conclusion) from one or more statements (the premisses)
- i.e. a chain of statements, each of which is either a premise or a consequence
of a statement occurring earlier in the proof. In deductive reasoning, we are
not directly concerned with the truth of the conclusion but rather whether
the conclusion does or does not follow from the premisses. If the conclusion
follows from the premisses, we say that our reasoning is valid ; if it does not
we say that our reasoning is invalid .
The Greeks considered deductive reasoning, not empirical procedures, the
method to establish mathematical facts. This usage is a generalization of what
the Greek philosopher Aristotle called the syllogism (see [49, chapter 1,
sections 5 and 6]), but a syllogism is now recognized as merely a special case of a
deduction. Also, the traditional view that deduction proceeds from the gene-
ral to the specific has been abandoned as incorrect by most logicians. Some
experts regard all valid inferences as deductive in form and for this and other
reasons reject the supposed contrast between deduction and induction. The
German mathematician Hilbert greatly contributed to deductive reasoning as
we will see when we introduce his proof theory (also known as the axiomatic
method) in chapter 6. Logic, in mathematical context, can be seen as the
theory of the formal structure of deductive reasoning. The logic of Hilbert’s
metamathematics (see section 6.1) and Russell’s Principia Mathematica (see
section 7.1) are a form of reasoning with deductive certainty, although others
have proposed different formalizations of deductive logic (see [49, page 121]).
Originally based on Aristotle’s logic, the deductive argument has become
more subtle and complex and is now based on modern symbolic logic.
3.4.3 The principle of induction

Informal
The principle of induction, also known as mathematical induction, is an

important process for proving theorems. It was even used by Peano to define
the concept of natural numbers (see section 4.1, axiom 3). ‘Mathematical
induction’ is unfortunately named, for it is unambiguously a form of deduc-
tion. The name was probably inspired by the fact that, just like induction,
it generalizes to a whole set from a smaller sample. But, as we will see,
mathematical induction concludes with deductive certainty.
The informal structure of the proof of a theorem by mathematical induc-

tion is fairly simple:
1) Basis. Prove that the theorem holds for a specific case (which often is
minimal for a given ordering of the elements). This case is also called
base case.
2) Induction step. Prove a rule that says that if the theorem holds for an
arbitrary element, it also holds for the next case. This often is a rule of
heredity that tells us that the theorem is true for the immediate successor
case of an arbitrary element if it is true for the arbitrary element itself.
The claim that the theorem is true for an arbitrary element is called
the induction hypothesis.
3) Conclusion. Together, 1 and 2 imply that the theorem holds for all
cases starting with the base case. If you didn’t use the minimal case in
step 1, then you have proven only that the theorem holds for that case
and its successors, not for all possible cases.
The induction step can take two forms which correspond to two forms of
mathematical induction. Again we assume there is an ordering of the ele-
ments with +1 the immediate successor relation.
Weak: prove that if the theorem holds for an arbitrary element n, then it
holds for the element n + 1
Strong: prove that if the theorem holds for all elements up to some arbitrary
element n, then it holds for the element n + 1
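As a small illustration of the weak form, here is a standard textbook example that is not taken from this report: the sum of the first n naturals. The layout below is only a sketch of how the three parts (basis, induction step, conclusion) fit together.

```latex
\textbf{Claim.} For all $n \in \mathbb{N}$: $\sum_{i=0}^{n} i = \frac{n(n+1)}{2}$.

\textbf{Basis.} For $n = 0$ both sides are $0$.

\textbf{Induction step.} Assume $\sum_{i=0}^{n} i = \frac{n(n+1)}{2}$
(the induction hypothesis). Then
\[
  \sum_{i=0}^{n+1} i
  = \frac{n(n+1)}{2} + (n+1)
  = \frac{n(n+1) + 2(n+1)}{2}
  = \frac{(n+1)(n+2)}{2}.
\]

\textbf{Conclusion.} The claim holds for all $n \in \mathbb{N}$.
```

Here the basis uses the minimal case n = 0, so the conclusion covers all of N.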
We will now formally state the principle of induction. This is important,

since many mistakes are being made in applying the principle. It does not
go without saying that if we are to use mathematical induction to prove that
some theorem applies to ‘all possible cases’, then those cases must somehow
be enumerable and in some way linked to the integers. And we have to be
able to speak about the minimal case, the nth case, the successor of a given
case, etc.
Formal
Suppose that we want to prove that a property ϕ(s) holds for all s ∈
S. The induction principle assumes that S is a well-founded set and every
element except for the smallest has an immediate predecessor. This condition
is also known as S is inductive. The structure of an inductive set in fact
resembles that of the naturals, i.e. if we have the axioms (see Peano axioms
in section 4.1) 0 is in N and if x is in N then x + 1 is in N, the set N is
inductive. In case the set S is the naturals, we also refer to the principle as
natural induction.
The principle presupposes the following two conditions:
(A) S is a set, well-founded by relation R (such that ‘+’ denotes the
immediate successor of an element with respect to the relation R) and
with smallest element e
(B) Every element except e has a (unique) immediate predecessor and ϕ
is a property of elements of S
If (A) and (B) hold, we can use the induction principle.
Definition of the (weak) (mathematical) induction principle:
if
(C) ϕ(e) (i.e. e has a property ϕ)
(D) (∀s : s ∈ S : ϕ(s) → ϕ(s+)) (i.e. if s ∈ S has property ϕ, then the
(unique) immediate successor of s also has property ϕ)
then the property ϕ holds for every element in S
Step (C) is also called the base of a proof by induction, step (D) is also
called the induction step, and ϕ(s) is called the induction hypothesis.
Proof: Suppose S is a well-founded set and every element except the small-
est, denoted e, has an immediate predecessor, and suppose that a property
ϕ is true for e, as well as for the immediate successor s+ ∈ S if it is true for
s ∈ S. We now prove by contradiction that ϕ holds for all s ∈ S. Suppose
that ϕ is not true for all s ∈ S. Let N be the set of elements of S for which
ϕ is not true, i.e. N = {s ∈ S | ¬ϕ(s)}. By the theorem of page 26 we also
know that if S is well-founded, any subset of S is also well-founded, thus N
contains a smallest element n. If n = e, we have a contradiction. If n > e, n
has an immediate predecessor, denoted n−. Since n is the smallest element
for which ϕ doesn’t hold, ϕ must hold for n−. But then by (D), ϕ must also
hold for the immediate successor of n−, that is n: contradiction. Thus ϕ
must be true for all s ∈ S.
As we mentioned before, this principle can be generalized in several ways.

One way is to prove in step (C) that ϕ holds for a (possibly non-minimal) case
b ∈ S. In step (D) we then show that (∀s : s ∈ S ∧ s ≥ b : ϕ(s) → ϕ(s+)).
The conclusion then is that the property ϕ holds for all elements in S that
are greater than or equal to b.
We now show (with proof by contradiction) why the additional property (B)
that every element except the smallest must have an immediate predecessor
is necessary for the induction principle.
Consider the natural numbers with the ordering ≺ defined as follows:
• if n and m are both even, then n ≺ m if n < m
• if n and m are both odd, then n ≺ m if n < m
• if n is even and m is odd, we always define n ≺ m
We can check that N is well-founded by ≺, but not every element (for
example 1) has an immediate predecessor. We take for ϕ(s) the property that
s is even. The smallest element in the ordering is 0, which is even.
Also, if s has property ϕ then so does the successor of s. That is because
in our ordering, the successor of an even number is always the next even
number, never an odd number, and if s has property ϕ, then s must be even.
Therefore (with only conditions (A), (C) and (D) holding) we could conclude
that every natural number is even: contradiction!
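The counterexample can be explored on a finite sample. The following Python sketch is an illustration only; the names precedes, successor and phi are assumptions introduced here, and checks over a finite sample support the argument but do not replace it.

```python
def precedes(n, m):
    """n ≺ m: numbers of equal parity compare by <, and evens precede odds."""
    if n % 2 == m % 2:
        return n < m
    return n % 2 == 0


def successor(s):
    """Immediate successor of s under ≺: the next number of the same parity."""
    return s + 2


def phi(s):
    """The property ϕ(s): s is even."""
    return s % 2 == 0


SAMPLE = range(1000)  # finite sample; the claims themselves concern all of N

# Listing a small sample in ≺-order puts the even numbers first.
ordered = sorted(range(8), key=lambda n: (n % 2, n))

# 1 has no immediate predecessor: for every even n, n + 2 still lies
# strictly between n and 1.
no_immediate_pred_of_1 = all(
    precedes(n, n + 2) and precedes(n + 2, 1) for n in SAMPLE if n % 2 == 0
)

base_holds = phi(0)                                              # condition (C)
step_holds = all((not phi(s)) or phi(successor(s)) for s in SAMPLE)  # condition (D)
conclusion_fails = not phi(1)

print(ordered)
print(no_immediate_pred_of_1, base_holds, step_holds, conclusion_fails)
```

Both the base and the induction step hold on the sample, yet ϕ(1) is false, which is exactly the gap that the requirement of an immediate predecessor closes.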
There is however a weaker principle, called transfinite induction, which -
suitably stated - does apply to every well-ordered set. But first we consider a
stronger principle, that is based on the same assumptions ((A) and (B)) as the
weak induction principle.
Principle of strong (mathematical) induction: The same as for (weak)
induction, but instead of (C) and (D) with
(E) (∀x : x ∈ S : (∀y : y ∈ S : R(y, x) → ϕ(y)) → ϕ(x)) (i.e. for all x ∈ S
we have ϕ(x) if all R-predecessors y of x have property ϕ)
Sometimes this is also informally stated using the infamous three dots as
(∀s : s ∈ S : (ϕ(e) ∧ ϕ(e+) ∧ . . . ∧ ϕ(s)) → ϕ(s+)).
Proof: Suppose ⟨X, R⟩ is a structure such that (A), (B) and (E) hold. Again
we use proof by contradiction, and assume (∃x : x ∈ X : ¬ϕ(x)). Thus
{x ∈ X | ¬ϕ(x)} is non-empty and has a smallest element e′ (since ⟨X, R⟩
is well-founded). We now have ¬ϕ(e′) ∧ (∀z : z ∈ X : R(z, e′) → ϕ(z)).
According to (E) (substitute z for y, X for S, and take e′ for x) we then have
ϕ(e′): contradiction.
Note that the base case is not really left out, since it is implicitly present
in the quantification (take e for x). This form of induction, when applied
to ordinals (ordinals form a well-ordered and hence well-founded set and are
introduced in section 3.8.2) is called transfinite induction.
Principle of transfinite induction3 : The same as for strong induction,
but instead of (A) and (B) as assumptions, it can be applied to any set S
that is well-ordered by a relation R, and with smallest element e.
3 Sometimes this principle is called the Principle of Complete Induction, for
example in [4], but this is less common.
An example of such a set are the ordinals or cardinals, or even the class
of all ordinals. A proof by transfinite induction typically needs to distinguish
three cases:
1. s is a minimal element
2. s has an immediate predecessor (i.e. the set of elements which are

smaller than s has a largest element)
In this case we can apply normal induction.
3. s has no immediate predecessor (i.e. s is a so-called limit-ordinal, see

also section 3.8.2)
The case for limit ordinals is typically approached by noting that a limit
ordinal b is (by definition) the union of all ordinals a < b and using this
fact to prove ϕ(b) assuming that ϕ(a) holds true for all a < b.
Proof: The proof of the principle of transfinite induction is similar to the

proof of the strong induction principle.
Clearly, all three given principles are equivalent, since we proved them to
be true. These proofs however are based on an underlying set of axioms (the
so-called ZF axioms and the Peano axioms, that will be introduced in section
5.3 and chapter 4 respectively). Without these conditions (to be exact,
without Peano’s induction axiom), we cannot directly prove the principles to be
true from the ZF axioms alone4 . In that case we can prove the equivalence
of the principles by showing that they imply each other. As an example,
we now prove that (mathematical) induction is a special case of transfinite
induction, for the set of natural numbers. To prove this it suffices to show
that ((C) and (D)) ↔ (E).
4 With only the fundamental axioms of Zermelo-Fraenkel set theory, it is not possible to
prove mathematical induction. An extra axiom is needed, the infamous Axiom of Choice,
or one of its equivalent forms. The four statements known as ‘Axiom of Choice’, ‘Zorn’s
Lemma’, ‘Well-Ordering principle’ (also known as well-ordering theorem, see section 3.8.2)
and ‘Mathematical Induction Principle’ are all equivalent, meaning that if you assume one
of them to be true, the others follow as consequences, but none of them can be proven
from the other fundamental axioms in ZF set theory alone. There are also other equivalent
statements that are sometimes used (such as Zermelo’s postulate), and it is a nice exercise
to prove the equivalence of these statements.
Normal induction (IND):
(∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)) → (∀n : n ∈ N : ϕ(n)))
Transfinite induction (TFIND):
(∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)) → (∀m : m ∈ N : ψ(m)))
We can prove the equivalence of IND and TFIND in two ways: in a con-
structive way or with a proof by contradiction. We give both proofs.
Proof by Contradiction: (from: [17])
It suffices to prove that IND’ ≡ TFIND’, with
IND’ ≡ (∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)))
TFIND’ ≡ (∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)))
Proof of TFIND’ → IND’: Assume ϕ is a property. We assume TFIND’,

and instantiate ψ with the property ϕ. We now want to prove IND’. If we
take q = 0, (∀p : p ∈ N : p < 0 → ϕ(p)) is trivially true. Thus we have
ϕ(0). We now prove by contradiction that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)).
Assume k ∈ N, ϕ(k) ∧ ¬ϕ(k + 1). That means the condition of TFIND’
(∀p : p ∈ N : p < q → ϕ(p)), with q = k + 1 must not be true: ¬(∀p :
p ∈ N : p < k + 1 → ϕ(p)), i.e. (∃p : p ∈ N : p < k + 1 ∧ ¬ ϕ(p)).
Let s ∈ N be the smallest number such that s < k + 1 ∧ ¬ ϕ(s), that is
(∀r : r ∈ N : r < s → ϕ(r)). But then we would have ϕ(s) according to
TFIND’ (namely if we take s for q and r for p), contradiction. Now we have
proved that (∀ϕ :: (∀k : k ∈ N : ϕ(k) → ϕ(k + 1))), and since we already
have proven (∀ϕ :: ϕ(0)), we have IND’.
Proof of IND’ → TFIND’: Assume IND’, instantiate ϕ with ψ. For all

properties ψ we have to prove (∀q : q ∈ N : (∀p : p ∈ N : p < q →
ψ(p)) → ψ(q)). First we prove this for q = 0. If we take q = 0, we have
(p < 0 → ψ(p)) → ψ(0), i.e. ψ(0). This is true by the assumption of IND’.
Now we prove this for q > 0. Suppose we have (∀q : q ∈ N : (∀p : p ∈ N : p <
q → ψ(p))). By IND’ we also know that (∀k : k ∈ N : ψ(k) → ψ(k + 1)), and
thus ψ(q) also holds for all q > 0. Hereby we have proved TFIND’.
Constructive Proof:
Proof of TFIND → IND: Assume TFIND, and let ϕ be a property. We

now need to prove that ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)) → (∀n :
n ∈ N : ϕ(n)). Assume ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). We
want to use TFIND to conclude (∀n : n ∈ N : ϕ(n)). TFIND gives us:
(∀k : k ∈ N : (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k)). Let k ∈ N. We now have
that (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k). If k = 0, (∀l : l ∈ N : l < k → ϕ(l))
is trivially true since the range of l is empty. Thus ϕ(k) holds for k = 0.
Assume k > 0, and (∀l : l ∈ N : l < k → ϕ(l)). This means ϕ(k − 1) holds
(since k − 1 ∈ N). But we have assumed that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)).
Thus ϕ(k) holds also for k > 0.
Proof of IND → TFIND: Assume ϕ is a property. Also assume that

(i): (∀k : k ∈ N : (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k)). Let s(k) :=
(∀l : l ∈ N : l < k → ϕ(l)), for all k ∈ N. We prove (∀n : n ∈ N : ϕ(n)) by
first proving that (∀n : n ∈ N : s(n)) by using IND, and subsequently that
(∀n : n ∈ N : s(n) → ϕ(n)). Clearly, s(0) holds trivially since the range of
l is empty in that case. Suppose s(k) holds. Since s(k + 1) ≡ s(k) ∧ ϕ(k),
we can conclude s(k + 1) because ϕ(k) follows from (i) and the definition of
s(k). Now we have s(0) ∧ (∀k : k ∈ N : s(k) → s(k + 1)), and thus (by using
IND) that (∀n : n ∈ N : s(n)). And, by the definition of s, (i) gives us that
(∀n : n ∈ N : ϕ(n)).
Structural Induction
In many cases we do not want to prove properties about the integers or

similar well-ordered sets. In such cases straight induction is not always useful.
However, forms of induction can also be appropriate when trying to prove
properties about structures defined recursively. This generalized induction
principle is known as structural induction. It is useful when objects are built
up from more primitive objects: if we can show the primitive objects have
the desired property, and that the act of building preserves that property,
then we have shown that all objects must have the property. The induc-
tive hypothesis (i.e., the assumption) is to assume that something is true for
‘simpler’ forms of an object and then prove that it holds for ‘more complex’
forms. ‘Complexity’ can be defined in several ways: the most common way
is to say that one object is more complex than another if it includes that
other object as a subpart, but this need not always be the case.
A general treatment of recursively defined structures (formal definition

of structural induction over recursive datatypes) will be presented in a later
version of this report.
Example: We show that mathematical induction is an instance of the general

notion of structural induction over values of recursively defined types, in a
later version of this report.
Example: As an example of the use of mathematical induction we prove the

binomial theorem. The binomial theorem states that for all x, y ∈ R, and
n ∈ N we have
\[ EQ \equiv (x + y)^n = \sum_{j=0}^{n} \binom{n}{j} x^{n-j} y^j \]
We call the left-hand side of this equality LHS, and the right-hand side
RHS, and abbreviate the equality by EQ. We assume two real numbers x
and y and prove EQ by induction on n.
Basis case: For n = 0 the EQ clearly is correct, since both sides are 1. For
some reason, most textbooks take n = 1 as the basis, in which case LHS is
simply x + y, and RHS is

\[ \binom{1}{0} x^{1-0} y^0 + \binom{1}{1} x^{1-1} y^1 = x + y \]
Induction case: We assume EQ is true for n = k and have to show that it is
then also true for n = k + 1 :
\[ (x + y)^{k+1} = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^j \]
First, we rewrite the left side of this equation:

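\[ LHS = (x + y)^{k+1} = (x + y)^k (x + y) = \]
(here in fact we are using the induction hypothesis)
\[ \left( \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^j \right) (x + y) = \]
\[ \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^j + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1} \]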
In rewriting the right side of the equation, we use Pascal’s identity:

\[ (\forall k, n : k, n \in \mathbb{N} \wedge 0 < k \le n : \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}) \]
We first prove the latter:

\[ \binom{n}{k-1} + \binom{n}{k} = \frac{n!}{(k-1)!\,(n-k+1)!} + \frac{n!}{k!\,(n-k)!} \]
\[ = \frac{n!\,k}{k!\,(n-k+1)!} + \frac{n!\,(n-k+1)}{k!\,(n-k+1)!} = \frac{n!\,(k + (n-k+1))}{k!\,(n-k+1)!} \]
\[ = \frac{(n+1)!}{k!\,(n+1-k)!} = \binom{n+1}{k} \]
Now we rewrite RHS:
\[ RHS = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^j = \]
We split out the j = 0 and j = k + 1 terms before applying Pascal’s

identity.
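\[ x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \binom{k+1}{j} x^{k+1-j} y^j = \]
\[ x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \left( \binom{k}{j} + \binom{k}{j-1} \right) x^{k+1-j} y^j = \]
\[ x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \binom{k}{j} x^{k+1-j} y^j + \sum_{j=1}^{k} \binom{k}{j-1} x^{k+1-j} y^j \]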
We can now bring x^{k+1} into the first sum (as the j = 0 term), and y^{k+1}
into the second sum (as the j = k + 1 term). This gives
\[ RHS = \sum_{j=0}^{k} \binom{k}{j} x^{k+1-j} y^j + \sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^j \]
and
\[ LHS = \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^j + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1} \]
The first sums of LHS and RHS are the same, and we can see that the
second sums are also equal, by doing a dummy transformation (let i = j −1):
\[ \sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^j = \sum_{i=0}^{k} \binom{k}{i} x^{k-i} y^{i+1} \]
So LHS = RHS, and we can conclude that EQ holds for all x, y ∈ R and
n ∈ N.
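A proof by induction needs no numerical confirmation, but a quick check can catch slips in the index manipulation. The following Python sketch is an illustration only: it assumes exact rational arithmetic via the standard fractions module, and the helper names binomial_rhs and pascal_holds are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_rhs(x, y, n):
    """Right-hand side of EQ: sum over j of C(n, j) * x^(n-j) * y^j."""
    return sum(comb(n, j) * x ** (n - j) * y ** j for j in range(n + 1))


def pascal_holds(n, k):
    """Pascal's identity C(n+1, k) = C(n, k-1) + C(n, k) for 0 < k <= n."""
    return comb(n + 1, k) == comb(n, k - 1) + comb(n, k)


x, y = Fraction(3, 2), Fraction(-5, 7)  # arbitrary sample values

binomial_ok = all((x + y) ** n == binomial_rhs(x, y, n) for n in range(12))
pascal_ok = all(pascal_holds(n, k) for n in range(1, 30) for k in range(1, n + 1))

print(binomial_ok, pascal_ok)
```

Note that the check of Pascal's identity includes the case k = n, which is the case the induction step needs for the term j = k.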
Example: We give an example of a proof about binary trees using structural

induction. First we define a data structure for binary trees. For this example
we will use a definition in the notation of the language Z to describe recur-
sive data structures. The structure of a binary tree is well known and says
that a tree is either a leaf or made up of two subtrees glued together by a node.
TREE ::= leaf | node < TREE × TREE >
An example of such a tree is node(leaf, node(node(leaf, leaf), leaf)). We

now define the size of a tree, by counting both the leaves and the nodes. The
basic idea of the definition is that we define the size of a tree inductively over
the structure, saying how the size of a given tree is calculated from the sizes
of its parts. Again we define the size in the language Z, by first declaring its
type and then saying how it is defined in each of the two cases:
size : TREE → N
∀ t1 , t2 : TREE •
size(leaf) = 1 ∧
size(node(t1 , t2 )) = 1 + size(t1 ) + size(t2 )
Similarly, we make two new definitions about trees:
leaves: TREE → N
nodes: TREE → N
∀ t1 , t2 : TREE •
leaves(leaf) = 1 ∧
leaves(node(t1 ,t2 )) = leaves(t1 ) + leaves(t2 ) ∧
nodes(leaf) = 0 ∧
nodes(node(t1 ,t2 )) = 1 + nodes(t1 ) + nodes(t2 )
We now want to prove the following theorem by structural induction on the

size of the tree t.
Theorem: For all trees t, size(t) = leaves(t) + nodes(t).
Proof: Let t, t′, t1 and t2 be of type TREE. We prove the theorem by

induction on the size of t.
Base case: Assume t=leaf. Then size(t) = size(leaf) = 1. Also, leaves(t) +
nodes(t) = leaves(t) + 0 = 1 + 0 = 1.
Induction case: Assume t = node(t1 , t2 ). The induction hypothesis says that
the theorem holds for all t′ with size(t′) < size(t). Then size(t)= size(node(t1 ,
t2 )) = 1 + size(t1 ) + size(t2 ) = (apply induction hypothesis to t1 and t2 ) 1
+ (leaves(t1 ) + nodes(t1 )) + (leaves(t2 ) + nodes(t2 )).
And leaves(t) + nodes(t) = leaves(node(t1 , t2 )) + nodes(node(t1 , t2 )) =
(leaves(t1 ) + leaves(t2 )) + (1 + nodes(t1 ) + nodes(t2 )) = (commutativity and
associativity of + ) 1 + (leaves(t1 ) + nodes(t1 )) + (leaves(t2 ) + nodes(t2 )).
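The recursive definitions translate almost literally into a program. The sketch below is one possible Python rendering of the Z definitions, not part of the report: the class names Leaf and Node and the helper random_tree are assumptions made for illustration, and the random testing only spot-checks the theorem that the induction proves.

```python
import random
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Node:
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def size(t):
    # Counts both leaves and nodes, following the two cases of TREE.
    return 1 if isinstance(t, Leaf) else 1 + size(t.left) + size(t.right)


def leaves(t):
    return 1 if isinstance(t, Leaf) else leaves(t.left) + leaves(t.right)


def nodes(t):
    return 0 if isinstance(t, Leaf) else 1 + nodes(t.left) + nodes(t.right)


def random_tree(depth, rng):
    """Hypothetical generator of test trees of bounded depth."""
    if depth == 0 or rng.random() < 0.3:
        return Leaf()
    return Node(random_tree(depth - 1, rng), random_tree(depth - 1, rng))


# The example tree from the text: node(leaf, node(node(leaf, leaf), leaf)).
example = Node(Leaf(), Node(Node(Leaf(), Leaf()), Leaf()))

rng = random.Random(0)
theorem_holds = all(
    size(t) == leaves(t) + nodes(t)
    for t in (random_tree(6, rng) for _ in range(200))
)

print(size(example), leaves(example), nodes(example), theorem_holds)
```

Each function has one clause per constructor of TREE, which mirrors the base case and induction case of the proof.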
3.5 Real numbers

What do we mean when we say ‘continuum’ ? Here is a description Albert
Einstein gave on page 83 of [21]:
The surface of a marble table is spread out in front of me. I can

get from any point on this table to any other point by passing
continuously from one point to a ‘neighboring’ one, and repeating
this process a (large) number of times, or, in other words, by
going from point to point without executing ‘jumps’. I am sure
the reader will appreciate with sufficient clearness what I mean
here by ‘neighboring’ and by ‘jumps’ (if he is not too pedantic).
We express this property of the surface by describing the latter as
a continuum.
People have been using the concept of real numbers for a long time (the
Babylonians for example already calculated with roots long B.C., see [12]).
In order for set theory to cover the fundamental structures of analysis, a
precise and formal basis for the real numbers was needed. Even simple
equations would have no solutions if all we had were the rational numbers (for
example, there is no rational number x such that x^2 = x ∗ x = 2).
When Cantor developed his set theory, it was well known that each type of
number could be constructed as the limit of a sequence of numbers of another
type. But it became clear that, especially in connection with theorems as-
serting the existence of some limit relations, (see [30, page 182]) the proof
might require irrational numbers to be defined in terms of rational ones, in
order to avoid begging the question of existence involved in the theorem.
Cauchy and Heine tried to define the irrational or real numbers in the second
half of the 19th century. In 1872 Cantor and Dedekind followed with their
precise definition of the real numbers. We first present the three methods
(of Dedekind, Cantor and Cauchy) of defining the reals in terms of rationals
and then show that they are identifiable.
3.5.1 Dedekind’s cuts

As a professor in the Polytechnic School in Zürich I found my-
self for the first time obliged to lecture upon the elements of the
differential calculus and felt more keenly than ever before the lack
of a really scientific foundation for arithmetic.
- Richard Dedekind, in the opening of the paper in which Dedekind’s

cuts were introduced.
Dedekind defined a cut to determine a real number. A cut is a partition

of a sequence into two disjoint nonempty subsequences, all the members of
one of which are less than all the members of the other. Dedekind used the
point at which the sequence is partitioned5 to define a real number.
Definition of a (Dedekind) cut:
Given an ordering < on a set V , a subset C ⊆ V is a cut in V :=
1) C ≠ ∅ ∧ C ≠ V
2) (∀a, b : a, b ∈ V : a ∈ C ∧ b < a → b ∈ C)
3) C does not have a greatest element

Example: {x ∈ Q | x < 0 ∨ x^2 < 2} is a cut in Q. Notice that we can also
define the same cut as {x ∈ Q | x < 0 ∨ x^4 < 4}.
Each real number r can now be defined by a cut C in Q if r is the supre-

mum for C. Each cut then determines a unique real number (see paragraph
3.5.4). We want to identify cuts that define the same real number, such as
for example {x ∈ Q | x < 0 ∨ x^2 < 2} and {x ∈ Q | x < 0 ∨ x^4 < 4}.
Definition of (Dedekind) cut equivalence: A cut C1 is equivalent to a

cut C2 , notation C1 ∼ C2 := there is a supremum r for C1 and for C2
We can now define RDedekind as the set of all equivalence classes of all cuts
in Q: RDedekind := {C ⊆ Q | C is a cut in Q }/∼.
5 Actually, Dedekind’s original definition did not use a partition but a slightly more
complex division. For details see the link ‘Dedekind cuts’ at http://zax.mine.nu/stage.
Example: {x ∈ Q | x² < 2} has √2 as supremum. We can identify the real
number √2 with the equivalence class of all sets that have √2 as supremum.
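As an illustration, the following minimal Python sketch represents the cut
for √2 by a membership test on exact rationals and approximates its supremum
by bisection. The names in_sqrt2_cut and approximate_supremum are our own, and
the clause x < 0 is an assumption added so that the set is closed downwards,
as condition 2) of the definition requires.

```python
from fractions import Fraction

def in_sqrt2_cut(x: Fraction) -> bool:
    """Membership test for the cut {x in Q | x < 0 or x^2 < 2}."""
    return x < 0 or x * x < 2

def approximate_supremum(in_cut, lo: Fraction, hi: Fraction, steps: int):
    """Bisect [lo, hi], keeping lo inside the cut and hi outside it."""
    assert in_cut(lo) and not in_cut(hi)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if in_cut(mid):
            lo = mid   # mid still belongs to the cut
        else:
            hi = mid   # mid is an upper bound of the cut
    return lo, hi

# After 20 halvings the supremum is enclosed in an interval of length 2^(-20).
lo, hi = approximate_supremum(in_sqrt2_cut, Fraction(1), Fraction(2), 20)
print(float(lo), float(hi))
```

The supremum itself is never computed: the sketch only produces rationals
inside and outside the cut that lie arbitrarily close together.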
3.5.2 Cantor’s chains of segments

In mathematics the art of proposing a question must be held of
higher value than solving it.
- A thesis defended in Cantor’s doctoral examination.
Cantor defined a chain of segments to determine a real number (see also

[17, chapter 12]). This is a sequence of ever decreasing intervals in Q, the
limit of which determines a unique real number.
Definition of chain of segments:
< an , bn >^V_(n∈N) is a chain of segments (in V ) :=
1) (∀n : n ∈ N : an ∈ V ∧ bn ∈ V )
2) (∀n : n ∈ N : an ≤ an+1 ≤ bn+1 ≤ bn )
3) (∀n : n ∈ N : bn − an ≤ 2^(−n) )
Example: Consider the following chain of segments in Q:
<< 1, 2 >, < 1.4, 1.5 >, < 1.41, 1.42 >, < 1.414, 1.415 >, . . . >.
Each segment ‘includes’ √2.
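A chain of segments around √2 can also be generated mechanically. The
sketch below is only an illustration under our own assumptions: it uses
bisection instead of the decimal segments of the example, and the function
names sqrt2_chain and is_chain_prefix are hypothetical. It checks conditions
2) and 3) of the definition on a finite prefix of the chain.

```python
from fractions import Fraction

def sqrt2_chain(length: int):
    """Return the first `length` segments (a_n, b_n) of a chain around sqrt(2)."""
    a, b = Fraction(1), Fraction(2)
    chain = [(a, b)]
    for _ in range(length - 1):
        mid = (a + b) / 2
        if mid * mid < 2:
            a = mid    # sqrt(2) lies in the right half
        else:
            b = mid    # sqrt(2) lies in the left half
        chain.append((a, b))
    return chain

def is_chain_prefix(chain) -> bool:
    """Check conditions 2) (nesting) and 3) (b_n - a_n <= 2^(-n)) on a finite prefix."""
    nested = all(a0 <= a1 <= b1 <= b0
                 for (a0, b0), (a1, b1) in zip(chain, chain[1:]))
    fast = all(b - a <= Fraction(1, 2 ** n) for n, (a, b) in enumerate(chain))
    return nested and fast

chain = sqrt2_chain(12)
print(is_chain_prefix(chain), [float(x) for x in chain[-1]])
```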
Note that < an , bn >^V_(n∈N) (notation < an , bn >^V or < an , bn > when it
is clear which set V is meant) is actually a sequence, and in 3) a minimum
bound is put on the speed of convergence. We now want to be able to say
when two chains are equivalent.
Definition of chain equivalence: The chains of segments < an , bn > and

< cn , dn > are equivalent, notation < an , bn > ∼ < cn , dn > :=
(∀k : k ∈ N : bk ≥ ck ∧ dk ≥ ak )
Theorem: ∼ is an equivalence relation on the set of all chains of segments

of Q
Each equivalence class of chains of segments in Q now determines uniquely
a real number r. To be precise, r is determined by the equivalence class of
< an , bn > if (∀n : n ∈ N : an ≤ r ≤ bn ). r then is the only real number
with this property (see also paragraph 3.5.4).
We can now define R_Cantor as the set of all equivalence classes of chains
of segments in Q : R_Cantor := {< an , bn >^Q_(n∈N) } / ∼
3.5.3 Cauchy-sequences
Men pass away, but their deeds abide.
- Louis Cauchy, his last words quoted in [22].
Cauchy defined a Cauchy sequence to determine a real number. His sequence

of numbers defines a real by letting the numbers come closer to the real num-
ber in every step.
Definition of Cauchy Sequence: With a partial order on a set⁶ V ,
{an }^V_(n∈N) is a Cauchy sequence in V :=
1) (∀n : n ∈ N : an ∈ V )
2) (∀k : k ∈ N : (∃p : p ∈ N : (∀n, m : n, m ∈ N : n, m > p →
| an − am | ≤ 2^(−k) )))
Example: The informally (using ‘. . .’ to informally indicate an infinite
continuation) defined sequences {1, 1.4, 1.414, 1.4142, 1.41421, 1.414213, . . .}
and {1, 1.414, 1.41421, . . .} are both Cauchy sequences. For each n ∈ N, an+1
lies closer to √2 than an .
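The Cauchy condition can be checked on a finite range of indices. In the
sketch below, a(n) is √2 truncated to n decimal digits (a slightly different
sequence from the example, chosen by us because it is easy to compute
exactly), and witness_p is a hypothetical helper that returns a p with
10^(−p) ≤ 2^(−k). Since all terms with index above p differ by less than
10^(−p), this p satisfies condition 2) for the given k.

```python
from fractions import Fraction
from math import isqrt

def a(n: int) -> Fraction:
    """sqrt(2) truncated to n decimal digits, as an exact rational."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

def witness_p(k: int) -> int:
    """Smallest p with 10^(-p) <= 2^(-k); it works as the p of condition 2)."""
    p = 0
    while Fraction(1, 10 ** p) > Fraction(1, 2 ** k):
        p += 1
    return p

# Check |a_n - a_m| <= 2^(-k) for a few indices n, m > p.
for k in range(8):
    p = witness_p(k)
    assert all(abs(a(n) - a(m)) <= Fraction(1, 2 ** k)
               for n in range(p + 1, p + 6) for m in range(p + 1, p + 6))
```

A finite check of this kind is of course no proof; it only makes the roles
of k and p in the definition concrete.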
We also denote a Cauchy sequence {an }n∈N simply by an . We now want

to be able to say when two Cauchy sequences are equivalent.
6
V is in general an ordered, commutative ring. We will not further discuss this here,
and for the rest of this paragraph take V = Q.
Definition of Cauchy sequence equivalence: The sequences an and bn

are equivalent, notation an ∼ bn := limn→∞ (an ) = limn→∞ (bn )
Note that in the definition of equivalence the hitherto undefined notion

of a limit is used. With the following definition we can formalize the notion
of a limit.
Definition of sequence convergence: A sequence {an }n∈N of elements of
a set V is said to converge to a sequence {bn }n∈N , notation limn→∞ (an ) =
limn→∞ (bn ) := (∀k : k ∈ N : (∃p, q : p, q ∈ N : (∀n, m : n, m ∈ N ∧ n >
p ∧ m > q : | an − bm | < 2^(−k) )))
Note: convergence is usually defined in terms of real numbers, but we cannot
use such a definition here because we have yet to define the reals. In that
usual definition the number r is called the limit of the sequence an , notation
limn→∞ (an ) = r,
if (∀k : k ∈ N : (∃p : p ∈ N : (∀n : n ∈ N ∧ n > p : | an − r | < 2^(−k) ))).
A sequence is said to diverge if it does not converge.
Theorem: Any convergent sequence {an }n∈N is bounded and has a unique
limit.
Proof: First we prove (by contradiction) the uniqueness. Suppose the
sequence has two limits, c and c′. Take any k ∈ N. Then from the definition of
convergence there is an integer p such that | an − c | < 2^(−k) if n > p. Also,
there is an integer p′ such that | an − c′ | < 2^(−k) if n > p′. Adding the two
inequalities we get, for n > p ∧ n > p′ (using the triangle inequality:
(∀a, b :: | a + b | ≤ | a | + | b | )):
| c − c′ | = | (an − c′) + (c − an ) | ≤ | an − c′ | + | an − c | < 2 ∗ 2^(−k).
Hence | c − c′ | < 2 ∗ 2^(−k) for all k ∈ N. This means c = c′,
thus the limit is indeed unique. Now we prove boundedness. The sequence
converges, so we can take, for example, k = 1. Then there is a p such that
| aj − c | < 2^(−1) for j > p. We then have, again using the triangle inequality,
that | aj | ≤ | aj − c | + | c | < 2^(−1) + | c |. Then the sequence can be
bounded by M = max{| a1 |, | a2 |, . . . , | ap |, 1 + | c |}
Each real number can now be defined by an equivalence class of Cauchy

sequences: r is determined by an ∼ if r = limn→∞ (an ), for each sequence an
from the equivalence class an ∼ .
We can now define R_Cauchy as the set of all equivalence classes of Cauchy
sequences in Q : R_Cauchy := {< an >^Q_(n∈N) } / ∼
3.5.4 Properties of the three definitions

Before these definitions for real numbers were given, we intuitively thought of
the reals as infinite sequences of (decimal) digits. In the rest of this section
we assume that by R we mean this set of reals, i.e. all infinite sequences
of decimal digits. We can now check whether the three new definitions
indeed are correct ways to identify real numbers:
1) < an , bn >^Q is a chain of segments → (∃!c : c ∈ R : (∀n : n ∈ N : an ≤
c ≤ bn ))
2) C is a cut in Q → (∃!c : c ∈ R : c = supremum(C))
3) {an }n∈N is a Cauchy sequence → (∃!c : c ∈ R : limn→∞ (an ) = c)
Then we can check for every newly defined set X of reals that:
a) it contains a countable, densely ordered (i.e. (∀r1 , r2 : r1 , r2 ∈ D ∧
r1 < r2 : (∃q : q ∈ D : r1 < q < r2 ))) set D without endpoints, which is
dense in X.
b) every Dedekind cut has a supremum in X.
Every set for which a) and b) hold is isomorphic with R. If a definition

satisfies a) and b) it possesses the properties we intuitively want the real
numbers to have. It can be proven that if these two properties hold we have
defined the reals successfully such that there is a total ordering on the reals,
the reals are densely ordered and the ordering is continuous.
3.6 Infinite sets

Our minds are infinite, and yet even in these circumstances of
finitude we are surrounded by possibilities that are infinite, and
the purpose of life is to grasp as much as we can out of that in-
finitude.
- A.N. Whitehead in [76]
The size of a finite set V , notation | V |, can be defined by the number of

elements that it has. But counting the elements does not end for infinite sets.
Cantor was concerned with the problem of measuring the sizes of infinite sets
(because he was investigating questions about singularities of Fourier series,
see [30, chapter 4]) and proposed a rather nice solution to this problem. He
observed that two finite sets have the same size if the elements of one set
can be paired with the elements of the other set; this method compares sets
without resorting to counting and can be extended to infinite sets.
This is the concept of an equivalence relation between sets (the relation is

also referred to as ‘are of the same cardinality’, ‘equipotent’ or ‘equipollent’
(see [30, page 229])).
Definition of set equivalence: A set V is equivalent to a set W , notation

V ∼ W := there is a bijection f : V → W
It is simple to check that ∼ has the properties of an equivalence relation,

i.e. it is reflexive, symmetric and transitive. But if we consider ∼ to be a
true relation, we need the concept of V , the set of all sets: ∼ ⊆ V × V . But
the existence of V is paradoxical, see section 3.8.
This new method to measure the number of elements of a set is reflected

in the notion of cardinality of a set, and led to the surprising result that
there are many levels of infinity. Before we present a proof of this result,
using Cantor’s famous diagonalization method, we first introduce some more
definitions.
Postulate for Cardinal numbers:
With every set V is associated a well-defined abstract entity V̄ , called the
cardinal number of V , such that V ∼ W ↔ V̄ = W̄ . We can think of V̄
as denoting the common property of set equivalence (as defined above) of all
sets in the equivalence class of V .
It proved difficult however, to come to an exact definition of cardinality
from this postulate. Cantor regarded cardinals as special abstract entities
of a new kind. In 1884, the German mathematician Frege came up with his
own definition of cardinal numbers. He discussed it with the mathematician
Russell and they proposed the idea of defining V̄ as V / ∼, the equivalence
class of V modulo ∼. The postulate for cardinal numbers then follows at
once. Frege also denoted finite cardinal numbers as natural numbers: the
cardinal number of ∅ is 0, that of {∅} is 1, that of {∅, {∅}} is 2, . . .. This
Frege-Russell definition would become standard, until - as we will later see
in section 3.8 - it became known that this definition could also lead to a
paradox.
Cantor used the Hebrew letter aleph to name the different levels of infinity.
The cardinality of the set of natural numbers is by definition called
aleph-null or aleph-nought, notation ℵ0 . The ‘next levels’ of infinity are called
ℵ1 , ℵ2 , . . .. Since the cardinality of the set of reals was unknown, Cantor
denoted it by c. If we assume the continuum hypothesis (see section 3.7), which
says there is no level of infinity between the cardinality of N and R, the
cardinality of the set of reals can also be denoted by aleph-one, notation ℵ1 .
Property of cardinality: Given the cardinality V̄ of a set V , we have
• If V is finite: V̄ = the number of elements of V
• If V is infinite: V̄ = ℵi , when there exists a bijection between V and
the set P^i (N) (this identification of ℵi assumes the generalized continuum
hypothesis, see section 3.7)
Sometimes the cardinality of a set V is also denoted by | V | , after the size

of a set V . A more rigorous treatment of cardinal numbers will be given in
section 3.8.1. This new concept enabled Cantor to define more concepts for
the analysis of infinite sets. It also inspired others to analyze the properties
of infinite sets.
No other question has ever moved so profoundly the spirit of man,

no other idea has so fruitfully stimulated his intellect; yet no other
concept stands in greater need of clarification than that of the in-
finite.
- D. Hilbert, quoted in [96]
In the rest of this section we will present some of the results of the research
of infinite sets.
Definition of finite: A set V is finite := (∃n : n ∈ N : V ∼ {x ∈ N | x < n})
Definition of infinite: A set V is infinite := V is not finite
Definition of Dedekind infinite:

A set V is Dedekind infinite := (∃W : W ⊂ V : V ∼ W )
Theorem: V is Dedekind infinite ↔ V is infinite (from [17])
Proof: We use that V is infinite iff N ≤1 V . We prove the two implications
of the theorem separately:
V is Dedekind infinite → V is infinite: V is Dedekind infinite, i.e. there
exists a W ⊂ V such that V ∼ W , i.e. there exists a bijection f : V → W .
Because W is nonempty and W ⊂ V there also exists an a ∈ V such that
a ∉ W . Consider the function g : N → V , defined recursively by g(0) = a
and g(k + 1) = f (g(k)). We now have to show that g is an injection, i.e. for
all i, j ∈ N : i ≠ j → g(i) ≠ g(j). We use induction on i:
i = 0: if 0 ≠ j then g(0) = a ∉ W and g(j) ∈ W , so g(0) ≠ g(j).
i = k + 1 : assume k + 1 ≠ j, then we can prove g(k + 1) ≠ g(j) by
induction on j:
j = 0 : g(0) = a ∉ W and g(k + 1) ∈ W , so g(k + 1) ≠ g(0).
j = l + 1: we know k + 1 ≠ j = l + 1, so k ≠ l. By the induction
hypothesis g(k) ≠ g(l). Since f is a bijection we also have that
f (g(k)) ≠ f (g(l)), i.e. g(k + 1) ≠ g(l + 1) or g(i) ≠ g(j).
Hence g is an injection from N to V , so N ≤1 V and V is infinite.
V is Dedekind infinite ← V is infinite: N ≤1 V , so there exists an injection
f : N → V . We show that W := V − {f (0)}, clearly a proper subset
of V (W ⊂ V ), is equivalent to V (W ∼ V ). The following function g is a
bijection from V to W : g(f (i)) = f (i + 1) for all i ∈ N, and g(x) = x if
x ≠ f (i) for all i ∈ N.
Definition of countable:
A set V is countable, also called denumerable := V is finite or V ∼ N
Definition of uncountable: A set V is uncountable := V is not countable
Definition of denumeration: A denumeration of a set V is a bijection

f :N→V
Cantor then proved that N, Z and Q all have the same cardinality and
also called these sets countably infinite.
Theorem: Q is countable
Proof: We give a bijection from N to Q, by listing all elements of Q. Consider
a table with all fractions a/b (a ∈ N, b ∈ N+ ), with fraction a/b on the ath
row and the bth column. If we list all elements row by row, we would not
obtain a correspondence between N and Q, since the list would never get
to the second row. By listing the elements at the diagonals (south-west to
north-east), starting from the north-west corner, we obtain a correspondence
between N and Q. Because 2/2 = 1/1 , etc, we hereby skip an element when it
would cause a repetition. We can also give a bijection from Q to an infinite
subset of N which is equivalent to N: for each fraction a/b ∈ Q with a and b
relatively prime, let f (< a, b >) := ½ (a + b)(a + b + 1) + b.
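The diagonal listing can be sketched in a few lines of Python. This is only
an illustration under our own assumptions: as in the table of the proof, only
fractions a/b with a ∈ N and b ∈ N+ are listed (the non-negative rationals),
the generator name nonnegative_rationals is hypothetical, and the last term of
the pairing function is taken to be b.

```python
from fractions import Fraction
from itertools import count, islice

def nonnegative_rationals():
    """Yield every fraction a/b (a in N, b in N+) once, diagonal by diagonal."""
    seen = set()
    for s in count(1):                # s = a + b identifies one diagonal
        for b in range(1, s + 1):     # from south-west (b = 1) to north-east (a = 0)
            q = Fraction(s - b, b)
            if q not in seen:         # skip repetitions such as 2/2 = 1/1
                seen.add(q)
                yield q

def pair(a: int, b: int) -> int:
    """Pairing function (a + b)(a + b + 1)/2 + b; injective on N x N."""
    return (a + b) * (a + b + 1) // 2 + b

print([str(q) for q in islice(nonnegative_rationals(), 10)])
```

Every fraction a/b is reached after finitely many steps, because it lies on
the diagonal a + b and each diagonal is finite.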
An example of an uncountable set is the set of real numbers, R. In 1873
Cantor proved that R is uncountable. The proof below uses a technique called
diagonalization (also known as the diagonal method), which Cantor published
in 1891, see [17, page 99].
Theorem: R is uncountable
Proof: Suppose there is a bijection f between N and R. We contradict this
by finding an x in R that is not paired with anything in N. We construct
this x by taking the first fractional digit of x arbitrarily but never 0 or 9 or
the first fractional digit of f (1), the second fractional digit of x also different
from 0, 9, and the second fractional digit of f (2), etc. Continuing this way
down the diagonal of the table of digits, we obtain all digits of x. x is not
f (n) for any n because the nth fractional digit of x differs from the nth
fractional digit of f (n).
Note that we avoid the problem of certain numbers such as 2.3999 . . . and
2.4000 . . . being equal by never selecting a 9 or a 0. Similarly, we can use
this diagonalization method to show that N ≁ {0, 1}^N .
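The diagonal construction can be tried out on a finite table of digits. The
sketch below is an illustration only: rows[n] holds the first fractional
digits of the nth listed number (indices start at 0 here, whereas the proof
starts at f (1)), and the rule ‘take 5, or 4 if the diagonal digit is 5’ is
just one arbitrary choice that never selects 0 or 9.

```python
def diagonal_escape(rows):
    """Return fractional digits of an x whose nth digit differs from rows[n][n]."""
    digits = []
    for n, row in enumerate(rows):
        d = int(row[n])                            # nth digit of the nth number
        digits.append("5" if d != 5 else "4")      # differs from d, never 0 or 9
    return "".join(digits)

rows = ["1415", "7182", "4142", "5772"]
x = diagonal_escape(rows)
print("0." + x)
```

For an infinite list the same rule is applied to every row, so the resulting
x differs from each listed number in at least one digit.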
Theorem: (∀V :: P(V ) ∼ {0, 1}^V ). (see [17, page 98])
Proof: We show that there is a bijection K from P(V ) to {0, 1}^V . For
W ⊆ V , define K(W ) (also denoted K_W ), the characteristic function of W ,
as:
K_W (v) = 1 if v ∈ W
K_W (v) = 0 if v ∉ W.
We now show that K is a bijection from P(V ) to {0, 1}^V :
1) K is injective: let W1 , W2 ⊆ V and suppose W1 ≠ W2 , that means there
is an element w ∈ V , such that (w ∈ W1 ∧ w ∉ W2 ) ∨ (w ∉ W1 ∧ w ∈ W2 ).
Then we have that (K_W1 (w) = 1 ∧ K_W2 (w) = 0) ∨ (K_W1 (w) = 0 ∧
K_W2 (w) = 1), and thus (∃w : w ∈ V : K_W1 (w) ≠ K_W2 (w)), i.e.
K_W1 ≠ K_W2 .
2) K is surjective: suppose g ∈ {0, 1}^V . Let Wg = {v ∈ V | g(v) = 1}.
Then (∀v : v ∈ V : K_Wg (v) = 1 ↔ g(v) = 1), thus (∀v : v ∈ V :
K_Wg (v) = g(v)), and g = K_Wg .
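For a small finite set the bijection K can be enumerated completely. In the
following sketch (our own illustration; the helper powerset and the encoding
of a function V → {0, 1} as a tuple are assumptions) every subset of a
three-element set is mapped to its characteristic function.

```python
from itertools import combinations, product

def powerset(V):
    """All subsets of the finite set V, as frozensets."""
    items = sorted(V)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def K(W, V):
    """Characteristic function of W on V, encoded as a 0/1 tuple over sorted(V)."""
    return tuple(1 if v in W else 0 for v in sorted(V))

V = {"a", "b", "c"}
images = [K(W, V) for W in powerset(V)]
print(len(images), len(set(images)))
```

Distinct subsets give distinct tuples (injectivity), and every 0/1 tuple of
length 3 occurs (surjectivity).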
We can define an ordering relation ≤1 on the cardinalities of sets. We
say that V ≤1 W if there is an injection from V to W .
Then V <1 W of course means that V ≤1 W holds but not V ∼ W . This
relation on the set of cardinals only depends on the cardinals themselves and
not on the choice of the particular sets V and W . The relation ≤1 is reflexive
and transitive. Cantor also conjectured that ≤1 is a partial order. This was
later proven independently by the two mathematicians F. Bernstein and E.
Schröder (see [59, page 39]).
We give two theorems that are based on the relation <1 :
Theorem: (without proof) (∀ V : V is a non-empty set: V <1 P(V ))

Theorem: V is Dedekind infinite ↔ N ≤1 V

Proof: This theorem follows directly from the theorem on page 53 and the
definition of infinite.
Although we have seen that N is countable but R is not, we might still

think that there is some smaller interval of the reals that can be paired to
the naturals.
Theorem: N ≁ [0, 1]
Proof of Poincaré (see [17]): We show there is no bijection f : N → [0, 1],
in particular (∀f : (f : N → [0, 1]) : f is not surjective). We do this
by constructing for every function f : N → [0, 1] a y ∈ [0, 1] such that
(∀n : n ∈ N : f (n) ≠ y). We construct this y by means of a chain of
segments (see paragraph 3.5.2).
Let f : N → [0, 1]. Let Sn be an infinite chain of segments such that
1) (∀i : i ∈ N : f (i) ∉ Si )
2) (∀i : i ∈ N : Si+1 ⊆ Si )
3) (∀i : i ∈ N : | Si | = 3^(−i−1) ),
with | Si | being the length of segment Si .
We can construct such a chain of segments, for if we divide a segment
Sn = [pn , qn ] in three equal parts (i.e. each part has length 3^(−n−2) ), at least
one of these parts does not contain f (n + 1). We take this part for Sn+1 .
The first segment S0 is obtained in the same way from [0, 1] and f (0).
The constructed chain of segments determines (see paragraph 3.5.2) a real
number y, with (∀n : n ∈ N : y ∈ Sn ), and thus certainly y ∈ [0, 1]. We also
have that (∀n : n ∈ N : f (n) ∉ Sn ∧ y ∈ Sn ), so (∀n : n ∈ N : y ≠ f (n)).
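The construction of the segments Sn can be carried out exactly for any
finite list of values f (0), . . . , f (N − 1). The sketch below is an
illustration with hypothetical names (avoid, f_values); it always takes the
first of the three closed thirds that does not contain the value to be
avoided.

```python
from fractions import Fraction

def avoid(f_values):
    """Return nested segments S_0, ..., S_(N-1) with f(i) not in S_i."""
    lo, hi = Fraction(0), Fraction(1)
    segments = []
    for value in f_values:
        third = (hi - lo) / 3
        parts = [(lo + i * third, lo + (i + 1) * third) for i in range(3)]
        # a single point lies in at most two adjacent closed thirds
        lo, hi = next((p, q) for p, q in parts if not (p <= value <= q))
        segments.append((lo, hi))
    return segments

f_values = [Fraction(0), Fraction(1, 2), Fraction(1, 3), Fraction(4, 9), Fraction(1)]
segments = avoid(f_values)
print(segments[-1])
```

Any point of the last segment differs from all values listed so far; in the
proof the real number y plays this role for the whole infinite list.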
The following theorem gives a way to prove the equivalence of sets:
Theorem of Cantor-Bernstein: V ≤1 W ∧ W ≤1 V → V ∼ W
Proof: Assume V ≤1 W and W ≤1 V . Then there are injections f : V → W
and g : W → V . Since g is injective and Dom(g) = W , g is a bijection from
W to Ran(g), so Ran(g) ∼ W . Since Ran(g) ⊆ V and g ◦ f is an injection
from V to Ran(g), we have V ≤1 Ran(g). And since for all W and V ,
W ⊆ V ∧ V ≤1 W → V ∼ W (see the lemma below), we have Ran(g) ∼ V .
By the transitivity of ∼ we conclude that V ∼ W .
Lemma: W ⊆ V ∧ V ≤1 W → V ∼ W
Proof: Suppose W ⊆ V and V ≤1 W . There is an injection h : V → W . Let
A0 := V − W , and (∀n : n ∈ N : An+1 := h(An )). We now give the desired
bijection k : V → W .
• k(a) := a if a ∉ ⋃n An
• k(a) := h(a) if a ∈ ⋃n An
We show that k is a bijection:
• k is injective: Suppose a ≠ b, then k(a) ≠ k(b) by using a case analysis:
a ∉ ⋃n An ∧ b ∉ ⋃n An , a ∉ ⋃n An ∧ b ∈ ⋃n An , a ∈ ⋃n An ∧ b ∉ ⋃n An ,
a ∈ ⋃n An ∧ b ∈ ⋃n An . For all cases, it follows that k(a) ≠ k(b)
by the definition of k and the injectivity of h. (In the mixed cases,
note that h maps ⋃n An into ⋃n An , whereas k leaves elements outside
⋃n An where they are.)
• k is surjective: Suppose w ∈ W , thus w ∉ A0 . Again we use case
analysis:
– if w ∉ ⋃n An then w = k(w).
– if w ∈ ⋃n An , assume w ∈ Ap . Since w ∉ A0 , p ≥ 1. Thus there is
a w′ ∈ Ap−1 such that w = h(w′) = k(w′).
Example: We prove that (a, b) ∼ [0, 1] for all a, b ∈ R with a < b by using
the theorem of Cantor-Bernstein. We first prove that (0, 1) ∼ [0, 1] and
subsequently that (0, 1) ∼ (a, b). Then, by the transitivity of ∼ we can
conclude that (a, b) ∼ [0, 1].
Proof of (0, 1) ∼ [0, 1]: The identity function id(0,1) : (0, 1) → [0, 1]
is an injection from (0, 1) to [0, 1], so (0, 1) ≤1 [0, 1]. The function
f (x) = (x + 1)/3 is an injection from [0, 1] to (0, 1), so [0, 1] ≤1 (0, 1).
By the theorem of Cantor-Bernstein we now know that (0, 1) ∼ [0, 1].
Proof of (0, 1) ∼ (a, b): The function f (x) = (b − a)x + a is a bijection
from (0, 1) to (a, b).
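The bijection k from the proof of the lemma can be made explicit for this
example. The sketch below takes V = [0, 1], W = (0, 1) and h(x) = (x + 1)/3,
so that A0 = V − W = {0, 1}. The membership test in_union_of_A is our own
device and works on rational inputs only: it applies the inverse of h, which
triples the distance to the fixed point 1/2, until the value leaves (0, 1) or
hits 0 or 1.

```python
from fractions import Fraction

def h(x: Fraction) -> Fraction:
    """The injection from [0, 1] into (0, 1) used in the example."""
    return (x + 1) / 3

def in_union_of_A(a: Fraction) -> bool:
    """True iff a lies in some A_n, where A_0 = {0, 1} and A_(n+1) = h(A_n)."""
    half = Fraction(1, 2)
    while 0 < a < 1 and a != half:
        a = 3 * a - 1          # inverse of h; triples the distance to 1/2
    return a in (0, 1)

def k(a: Fraction) -> Fraction:
    """The bijection from [0, 1] to (0, 1) given by the lemma."""
    return h(a) if in_union_of_A(a) else a

for a in (Fraction(0), Fraction(1), Fraction(1, 3), Fraction(1, 2), Fraction(1, 4)):
    print(a, "->", k(a))
```

Only the points 0, 1, 1/3, 2/3, 4/9, 5/9, . . . are moved; every other point
of [0, 1] is mapped to itself.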
Using the Cantor-Bernstein theorem we can also prove that
(a, b) ∼ (0, 1) ∼ R ∼ R^n ∼ {0, 1}^N ∼ P(N) ∼ N^N , for all n ∈ N, n ≥ 1.
Theorem: V is infinite → N ≤1 V
Proof: V is infinite and thus not empty. We take one element x0 ∈ V . Next,
we take an element x1 ∈ V − {x0 }. We can repeat this infinitely (i.e. for all
n we can select an x ∈ V − {x0 , . . . , xn }), if we assume that it is possible
to always select an element from any non-empty set (see the axiom of choice
below). In this way we get a countable subset of V , namely {x0 , x1 , x2 , . . .}.
The only assumption we have made here is the so-called axiom of choice.
Axiom of choice (AC): Given any set W of non-empty sets V , there is a

function f which assigns to each member V of W an element f (V ) of V .
This definition was proposed first in an article by Zermelo in 1908 (trans-

lated in [93, pages 199-215]). Such a function f is called a choice function
for W . The axiom can be restricted by limiting to those families W of a par-
ticular cardinality. Since for any finite W the axiom is provable, the weakest
non-trivial case occurs when W is denumerable (see page 54 for the definition
of denumerable). This case is known as the Denumerable axiom. Zermelo
regarded the AC as already implicitly used by mathematicians. In response
some people asked where in mathematics this assumption originated, when
it is implicitly used, and when exactly it can or cannot be avoided. Zermelo
used AC in his proof of the well-ordering theorem, and the controversy over
this proof of 1904 (see [63,
page 310]) led Zermelo to axiomatize set theory (see section 5.3.1). We can
add AC to set theory based on the axioms of Zermelo and Fraenkel (ZF, see
section 5.3), in which case it is termed ZFC (ZF supplemented by the Axiom
of Choice). For more details on the role of the AC, we refer to section 5.3
and [63]. See http://zax.mine.nu/stage and click on ‘links’ for some quotes
about the AC.
An instance of the following theorem (without proof) of the British
mathematician F.P. Ramsey is often used in graph theory. The notation V^n in
this theorem is defined as the set of all subsets of V with n elements, i.e.
V^n := {X ⊆ V | X has n elements}.
Theorem of Ramsey: If V is a denumerable set and f : V^n → {0, 1, . . . , m − 1}
with n, m ∈ N and n, m ≥ 1 then (∃W : W ⊆ V : W is denumerable and
f is constant on W^n ).
Theorem: R² ∼ R ∼ (0, 1)
Proof: We can say that R ∼ (0, 1) if there is a bijection between (0, 1)
and R. Indeed, there exists a bijection f : (0, 1) → R, defined as f (x) =
tan((π/2)(2x − 1)). Thus: R ∼ (0, 1). Because of this it suffices to consider,
instead of an element of R², a pair of real numbers between 0 and 1. We can
map these numbers to a single real number r by alternately taking the next
digit of each of the two numbers.
For example, we map (0.76584 . . . , 0.13275 . . .) uniquely to 0.7163528745 . . ..
Thus: R² ∼ R. Since ∼ is transitive, we know that R² ∼ R ∼ (0, 1).
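The interleaving of digits is easy to state as a program on finite digit
strings. The sketch below is an illustration only (the names interleave and
split are ours); it works on equally long prefixes of the two fractional
expansions and also shows how the pair is recovered from the result.

```python
def interleave(x_digits: str, y_digits: str) -> str:
    """Alternate the fractional digits of two numbers in (0, 1)."""
    return "".join(a + b for a, b in zip(x_digits, y_digits))

def split(z_digits: str):
    """Inverse of interleave: digits at even and at odd positions."""
    return z_digits[0::2], z_digits[1::2]

z = interleave("76584", "13275")
print("0." + z, split(z))
```

That the pair can be recovered from the interleaved digits is what makes the
mapping injective.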
Theorem: P(N) ∼ (0, 1)
Proof: First we show that P(N) ≤1 R. Suppose V ∈ P(N), map V to
the decimal 0.a0 a1 a2 . . ., with ai = 1 if i ∈ V and ai = 0 otherwise. This
injection proves that P(N) ≤1 R, and since R ∼ (0, 1) also P(N) ≤1 (0, 1).
Now we give an injection from (0, 1) to
P(N): assume r ∈ (0, 1), i.e. r = 0.a1 a2 . . . with 0 ≤ ai ≤ 9. We want
to identify numbers such as 0.3999 . . . and 0.4000 . . .. Therefore we assume
there is not an i ∈ N such that for all n > i, n ∈ N, an = 9. Then we
map r to the set {1a1 , 1a1 a2 , . . .} of natural numbers. Clearly, this
mapping is well-defined. For example, r = 0.17803 . . . is mapped to the set
{11, 117, 1178, 11780, 117803, . . .}. Thus (0, 1) ≤1 P(N), hence by the
theorem of Cantor-Bernstein P(N) ∼ (0, 1).
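The second injection of the proof can be illustrated on a finite prefix of
the digits of r. In the sketch below the function name digits_to_set is
hypothetical, and the result is returned as a list so that the order of the
prefixes stays visible.

```python
def digits_to_set(fraction_digits: str):
    """Map the digits a1 a2 ... of r to the numbers 1a1, 1a1a2, ... (finite prefix)."""
    return [int("1" + fraction_digits[:i])
            for i in range(1, len(fraction_digits) + 1)]

print(digits_to_set("17803"))
```

The leading digit 1 ensures that digit strings starting with zeros, such as
0.0071 . . ., still produce distinct natural numbers.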
Corollary: P(N) ∼ R
Proof: This directly follows from P(N) ∼ (0, 1) and (0, 1) ∼ R, and the
transitivity of ∼.
3.7 The Continuum Hypothesis

We still think that the study of the size of the continuum should
be our guiding light for further research in set theory.
- Judah Haim in [33]
After showing that the real numbers cannot be put into one-to-one
correspondence with the natural numbers (see section 3.6), Cantor hypothesized
in 1877 that each infinite subset of R is either denumerable or equivalent
to the continuum. This hypothesis was first published in 1878 in [13] and
later became known as:
The Continuum Hypothesis (CH): (N ≤1 A ≤1 R) → (A ∼ N ∨ A ∼ R)
This hypothesis (as given in [17, page 128]) is also known in many other
forms, of which we will mention and explain the most important. We can
immediately see that the following version of CH is equivalent to the given
definition: ‘any set of real numbers is either finite, countably infinite or has
the same cardinality as the entire set of reals’. This means that ‘the num-
ber of real numbers is the next level of infinity above the number of natural
numbers’ (see also [30, page 197]).
As we saw in section 3.6, Cantor defined the cardinality of the natural
numbers to be ℵ0 , and the next levels of infinity to be ℵ1 , ℵ2 , ℵ3 , etc. He also
named the cardinality of the reals c, for continuum. Cantor’s original
formulation of CH was: (B) c = ℵ1 . Since Cantor also proved that P(N) ∼ R
(see page 59), we can also state CH as: (C) P(N) ∼ ℵ1 . The cardinality of
the power set of any set X is equal to the cardinality of {0, 1}^X (see page
55), often denoted as 2^X , so another formulation⁷ of CH is: (D) 2^ℵ0 = ℵ1
(see [31]). These formulations, although (B) leads us to think about sizes
of reals, (C) about subsets and (D) about cardinal exponentiations, are all
equivalent in ZFC. We will not go into details of less precise or more
dependent formulations such as ‘what is the cardinality of the set of points on
a geometrical line?’.
7
Actually in this formulation we have identified the cardinalities ℵ0 and ℵ1 with the
sets that have these cardinalities.
Some of the theory that is needed in the remaining part of this section, for
the generalized continuum hypothesis, will be introduced in later chapters.
If you are not familiar with the notations that are used, you might want to
skip the remaining part of this section and get back to it later.
In 1908 the German mathematician Felix Hausdorff proposed the following
generalization of CH (that is also called aleph-hypothesis):
The Generalized Continuum Hypothesis (GCH):
(∀r : r is an ordinal : 2^(ℵr) = ℵr+1 )
For a definition and the notation of ordinal numbers, we refer to section
3.8.2. Obviously, (see section 5.3) we have that ZF + GCH ⊢ CH. Note
that ZF + GCH ⊢ AC (so we don’t need ZFC once we have GCH).
Cantor and many other great mathematicians spent years trying to prove
CH or its negation (Cantor tried to prove his hypothesis by using a
decomposition theorem; for details see [31, page 117]), but did not succeed. This
problem was so important that Hilbert (see section 6.2) put it first in his list
of 23 problems.
In 1938 significant progress was made when the mathematician Gödel
proved that CH is consistent with ZFC (see section 5.3.2) by constructing a
model of ZFC + CH (a result he later discussed in his article ‘What is
Cantor’s continuum problem?’). Since Gödel had also proved his famous
incompleteness theorem (see chapter 8), people suspected that CH was one of the
statements (of ZFC) that can neither be proved nor disproved, i.e. that CH is
undecidable in ZFC. It took until 1963 until this
was proved by Paul Cohen in [15].
To do that he used a new technique called forcing. Forcing is a
combinatorial technique for proving statements consistent with the axioms of set
theory. Cohen used it in order to prove that the negation of AC and the
negation of CH are consistent with the axioms of set theory (AC and CH
were already known to be consistent). Essentially it consists of a method
of performing the following algorithm: start with a model of set theory M.
Construct an object X not in M with certain properties. Consider the
smallest model M′ with X an element of M′ and M a subset of M′ (this is done
in a way such that the construction of M′ is implicit in the construction of
X). For more details on forcing, see [51] and [81].
Thus Cohen constructed a model of ZFC + ¬CH and this, along with
Gödel’s model of ZFC + CH, showed that CH is undecidable in ZFC. So
this means that either CH or ¬CH could be added as an axiom to ZFC.
But since neither of these axioms seems axiomatic or ‘self-evident’ they have,
unlike AC, not been adopted as axioms of set theory. Mathematicians either
accept this incompleteness in set theory or try to find more intuitive axioms
that will help decide it. In other words, the question remains what intuitive
axiom of set theory we need to make it more complete, and whether, with
some axiom system for set theory, the continuum hypothesis is true.
3.8 Cardinal and Ordinal numbers and Paradoxes

Every transfinite consistent multiplicity, that is, every transfinite
set, must have a definite aleph as its cardinal number.
- Georg Cantor
3.8.1 Cardinal numbers and Cantor’s Paradox

In section 3.6 we already encountered cardinal numbers and the notion of
set equivalence. After defining the equivalence of sets (see page 51), Cantor
realized that all sets that are equivalent to a given set V have a common
property. He identified this property with the cardinal number V̄ of a set V ,
a property that abstracts from the nature and order of the elements of a set.
Example: Consider the following sets: A = {1, 2, 3}, B = {3, 2, 1}, C =
{{4}, 7, {a, b}}, D = {1, {4}}. We can say that A ∼ B ∼ C, or (equivalently)
Ā = B̄ = C̄. We also have A ≁ D, or Ā ≠ D̄. Note that in this
example the equality ‘=’ between cardinal numbers is a new type of equality
that is defined as Ā = B̄ ↔ A ∼ B.
We can see that cardinality abstracts from the order and nature of the
elements, and for finite sets the cardinal number can be identified with the
ordinary ‘number of elements’. Therefore we identify the cardinal number of
a finite set of n elements with the natural number n. We denote the smallest
infinite (or transfinite) cardinal number by ℵ0 . As we have already seen
on page 52, this is the cardinal number of N or any denumerable infinite set.
Cantor denoted the ‘next’ levels of infinity by ℵ1 , ℵ2 , . . ..
The next question was how to pass from the abstract notion of cardinal
numbers to actual cardinal numbers, i.e. one wanted to regard cardinal numbers
as objects of the mathematical system. It turned out to be quite a problem
to define the cardinal V̄ of a set V as an object of set theory. In naive set
theory, as well as in Quine’s ‘New Foundations’ (see section 7.3), the
definition of the cardinal V̄ of V poses no problem: V̄ can be defined as the set
of all sets equivalent to V . But this definition of cardinal numbers (first
given by Frege, see section 3.6) can lead to a paradox that
was first found by Cantor.
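Cantor’s paradox: The set of all sets contains its own power set as a subset,
while the power set of any set has a strictly bigger cardinality than the set
itself. Therefore, the cardinality of the set of all sets must be bigger than
itself.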
In axiomatic set theory however (e.g. in ZF, see section 5.3), without the
unrestricted comprehension axiom, there is no set which contains all sets
equivalent to V . With this paradox the need arose to find a new definition of
cardinals in a context without the unrestricted comprehension axiom, such
that traditional paradoxes could no longer be derived.
Several new definitions of cardinal numbers were then proposed, based
on ordinal numbers (for which we refer to the next section⁸). The following
definition that comes from the mathematician von Neumann is now the
standard definition for cardinal numbers.
Definition of Cardinal number (or initial number):

A cardinal number α := an ordinal number α with property (∀γ :: α ∼ γ →
α ≤ γ)
For each set V we can prove (see [17, section 2.10]) that there exists
exactly one cardinal number α satisfying V ∼ α (proof uses AC). We call
this unique α the cardinality or cardinal number of the set V ; it is also
denoted by V̄ .
In other words, with the axiom of choice we can develop the theory of
ordinals in the von Neumann way and define V̄ to be the least ordinal α
equivalent to V . The existence of such an α is guaranteed by the well-ordering
theorem. If we have the axiom of foundation among our axioms, even if the
axiom of choice is absent we can define V̄ as the set of all sets W of least
rank among those equivalent with V (see [1]). In the absence of the axioms
of choice and foundation the operation V̄ is undefinable (see [1]).
For more information on the definition and calculus of cardinal numbers,

we refer to [59, chapter 6], [25] and [34].
8
The rest of this section depends on concepts that are defined in later chapters.
3.8.2 Ordinal numbers and Burali-Forti’s Paradox

We already introduced Cantor’s concept of cardinal number in section 3.6,
and saw in the previous paragraph that it abstracts from the order and nature
of the elements of a set. Cantor also defined a property of sets, the ordinal
number, that only abstracts from the nature of the elements of a set, but
retains the order in which they are given.
Here we consider sets with a total ordering (see page 25). Recall that in
addition for a well-ordered set, each non-empty subset also has a first
member in the given ordering (see also section 3.2). In the case of ordered sets,
the concept of equivalence is now replaced by the sharper concept of similarity.
We consider two ordered sets V and W similar, notation V ≃ W , if there is
a bijection between V and W that retains all order relations. Note that we
have already seen this relation with the concept of isomorphism (‘is
isomorphic to’, see page 31), and note that ≃ is an equivalence relation. Instead
of saying two sets are similar, we also can say they are of the same order type.
Definition of an Order Type: An equivalence class under the (isomor-

phism) relation
The equivalence class to which an ordered set V belongs is called the
order type of V . All well-ordered sets that are similar to a given set
V have a common property. Cantor identified this property with the ordinal
number V̅ of a well-ordered set V , a property that only abstracts from the
nature of the elements of a set. Just as for cardinals (see section 3.8.1),
the question was posed how to define ordinal numbers as part of set theory.
In 1883 Cantor defined in [13] an ordinal number as the order type of a
well-ordered set.
Definition of Ordinal Number (Cantor): A well-ordered set V has or-

dinal number o := o is the order type of V
If a set is finite and simply ordered, it is well-ordered and it has an ordinal

number. The ordinal number of that set is the same, regardless of the order
of the elements. For each finite and simply ordered set, we can therefore
identify the (finite) cardinal number with the ordinal number.
Example: 0 = ∅; 1 = {0}; 2 = {0, 1}; 3 = {0, 1, 2} are ordinal numbers.
The smallest infinite ordinal number is called ω. This is the ordinal num-
ber of the sequence {0, 1, 2, 3, . . .}, which can be seen as N or as the sequence
of finite cardinal numbers in their ‘natural’ order. We introduce some other
transfinite ordinals by example (from [10, page 66]).
Example:
If we call the set ∅ ‘0’, the next set ‘1’, etc., then consider the union
of all the sets 0, 1, 2, . . . , that is {0, 1, 2, . . .}. This is again an ordinal,
called ω, and it is the first non-finite ordinal. It has a successor: ω ∪ {ω},
called ω + 1. More ordinals can be obtained by continuing this succession, and
taking the union of all these ordinals yields an ordinal we call ω ∗ 2, etc.
The natural numbers in reverse order are denoted ∗ω.
V1 = {2, 3, 4, . . . , 1} ; V2 = {3, 4, 5, . . . , 1, 2}
V3 = {1, 3, 5, . . . , 2, 4, 6, . . .} ; V4 = {. . . , 3, 2, 1}
V5 = {1, 3, 5, . . . , 6, 4, 2} ; V6 = {1, 11, 21, . . . , 2, 12, 22, . . .}
N̅ = ω ; V̅1 = ω + 1 ; V̅2 = ω + 2 ; V̅3 = ω + ω = ω ∗ 2
V̅4 = ∗ω ; V̅5 = ω + ∗ω ; V̅6 = ω ∗ 10
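The reorderings above can be imitated on a finite sample of the natural numbers. The following Python sketch is only an illustration under our own assumptions: the names key_v1 and key_v3 are hypothetical, and a finite list can of course only hint at the transfinite order types ω + 1 and ω ∗ 2.

```python
# Sort keys that mimic the orderings V1 and V3 on a finite sample.
# key_v1 and key_v3 are illustrative names, not notation from the text.

def key_v1(n):
    # V1 = {2, 3, 4, ..., 1}: the element 1 comes after all other numbers.
    return (n == 1, n)

def key_v3(n):
    # V3 = {1, 3, 5, ..., 2, 4, 6, ...}: all odd numbers precede all even numbers.
    return (n % 2 == 0, n)

sample = list(range(1, 9))
order_v1 = sorted(sample, key=key_v1)
order_v3 = sorted(sample, key=key_v3)
print(order_v1)  # [2, 3, 4, 5, 6, 7, 8, 1]
print(order_v3)  # [1, 3, 5, 7, 2, 4, 6, 8]
```

In both cases the underlying set is unchanged; only the order relation differs, which is exactly what distinguishes the ordinal number from the cardinal number.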
For ordinal numbers n of N and m of M we say that n < m if the well-ordered
set N is similar to a proper initial segment of M .
Unfortunately, a situation similar to that for cardinal numbers arose
for ordinal numbers. In 1897 Burali-Forti, the Italian assistant of the
mathematician Peano, found that this definition can give rise to a
paradox (see [18, page 259]).
The Burali-Forti Paradox: The set of all ordinal numbers, taken in their
natural order, forms a well-ordered series, and therefore also has an ordinal
number Ω. But the ordinal number of any initial segment of the set of all
ordinals exceeds every number of that segment, and therefore Ω exceeds any
ordinal number whatsoever, including Ω itself.
This led to new proposals for definitions of ordinal numbers. We therefore
present below another definition, given by John von Neumann in [61].
In 1923 he pointed out that among all well-ordered sets having a Cantorian
ordinal as their order type, there is a particular one with some very special
properties. Von Neumann defined this particular set as the ordinal of that
order type.
Definition of ordinal number: A set α is an ordinal number :=
1) α is a well-ordered set with the binary relation ∈ as its ordering
2) (∀β :: β ∈ α → β ⊂ α)
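With this definition of ordinal numbers, the Burali-Forti paradox can
no longer be derived: the collection of all ordinals is well-ordered by ∈ and
2) also holds for it (a proof is given in [59, section 4.2]), so if it were a set
it would itself be an ordinal number and hence a member of itself, which is
impossible. The collection of all ordinals is therefore not a set. According to
this definition, the empty set is an ordinal number. This ordinal number is also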
denoted by 0. Similarly we also denote the ordinal numbers {0} by 1, {0, 1}
by 2, {0, 1, 2} by 3, etc. Otherwise said: 0 = ∅, 1 = {∅}, 2 = {∅, {∅}}, . . ..
These ordinal numbers, which are finite sets, are called finite ordinal num-
bers. The finite ordinal numbers are identified with the natural numbers.
The set ω = {0, 1, 2, . . .} of all natural numbers is also an ordinal number.
An ordinal number that is an infinite set, like ω, is called a transfinite ordi-
nal number . For every well-ordered set V , there exists exactly one ordinal
number isomorphic to V .
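For the finite ordinal numbers the two conditions of the definition can be checked mechanically. The sketch below is a minimal illustration, assuming that sets are modelled by Python frozensets; the function names are our own. Since every linear order on a finite set is a well-order, it suffices here to check that ∈ orders the elements linearly.

```python
# Finite von Neumann ordinals as nested frozensets (illustrative sketch).

def ordinal(n):
    """Return the von Neumann ordinal n = {0, 1, ..., n-1}."""
    alpha = frozenset()                      # 0 is the empty set
    for _ in range(n):
        alpha = alpha | frozenset([alpha])   # next ordinal: alpha ∪ {alpha}
    return alpha

def is_transitive(alpha):
    """Condition 2): every element of alpha is also a subset of alpha."""
    return all(beta <= alpha for beta in alpha)

def is_linearly_ordered_by_membership(alpha):
    """For distinct elements b, c exactly one of b ∈ c and c ∈ b holds."""
    return all((b in c) != (c in b) for b in alpha for c in alpha if b != c)

three = ordinal(3)
print(len(three))                                  # 3
print(ordinal(2) in three, ordinal(2) <= three)    # True True
print(is_transitive(three))                        # True
print(is_linearly_ordered_by_membership(three))    # True
print(is_transitive(frozenset([ordinal(1)])))      # False: {1} is not an ordinal
```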
Definition of ordinal number of a well-ordered set V :

The ordinal number of a well-ordered set V := the ordinal number isomorphic
to V
A detailed treatment of ordinal calculus based on this definition
of ordinal numbers is outside the scope of this report. In the remainder
of this section we will only define the most common concepts.
As we saw in 3.2 we also write α ∈ β (we denote ordinals by lower-case

Greek letters) as α < β, which defines an ordering on the ordinal numbers.
The least ordinal number is of course 0, and the ordering of the finite ordi-
nal numbers coincides with the usual ordering of the natural numbers. The
least transfinite ordinal is ω (see also 5.3.2). The ordering ≤, defined by
α ≤ β := α < β ∨ α = β, is a linear ordering and a well-ordering of the
ordinal numbers. Therefore we can apply transfinite induction (see page 37)
on ordinal numbers.
For any ordinal number α, the set α + 1 = {γ | γ ≤ α} (called a segment)
also is an ordinal number, and α is the unique predecessor of
α + 1. A transfinite ordinal without a predecessor is called a limit ordinal
number, and all the other ordinal numbers are called isolated ordinal numbers.
The first limit ordinal number is ω. For any set V of ordinal numbers,
{γ | (∃η : η ∈ V : γ < η)} is an ordinal number, the supremum of V .
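On finite ordinals the successor and the supremum can be computed directly, which may help to make these set-theoretic descriptions concrete. The following sketch again assumes a frozenset model of sets; successor and supremum are hypothetical helper names. The supremum is taken as the union of the given ordinals, i.e. the set of all γ that are a member of some η in V.

```python
# Successor and supremum on finite von Neumann ordinals (illustrative sketch).

def successor(alpha):
    """alpha + 1 = alpha ∪ {alpha} = {gamma | gamma ≤ alpha}."""
    return alpha | frozenset([alpha])

def ordinal(n):
    """Build the finite ordinal n by applying the successor n times to 0."""
    alpha = frozenset()
    for _ in range(n):
        alpha = successor(alpha)
    return alpha

def supremum(ordinals):
    """Union of a collection of ordinals: all gamma with gamma ∈ eta for some eta."""
    return frozenset().union(*ordinals)

print(successor(ordinal(2)) == ordinal(3))                        # True
print(supremum([ordinal(1), ordinal(3), ordinal(2)]) == ordinal(3))  # True
# The predecessor of a successor ordinal is the union of its elements.
print(supremum(ordinal(3)) == ordinal(2))                         # True
```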
A full treatment of the theory of ordinal numbers is omitted here. Ri-

gorous study has produced a complete calculus of ordinal numbers and pro-
duced significant results. We only mention here the so-called well-ordering
theorem, which Cantor had accepted as true (see [18, page 257]) but that
was first proved rigorously by Zermelo in 1904.
Well-Ordering Theorem: Every set can be well-ordered.
This means that ordinals give us a way of ‘counting’ any set, even if it is
not finite. The particular significance of the well-ordering theorem lies in the
possibility of applying the principle of mathematical induction (which
is well known for denumerable sets, see section 3.4.3) to any arbitrary
well-ordered set. Ordinal numbers form the basis of transfinite induction, which
is a generalization of the principle of induction.
We now have the following properties (given without proof):
• Two finite and ordered sets have the same order type if and only if they
have the same cardinal number
• Cantor’s theorem : the cardinality of any set is lower than the cardi-
nality of the set of all its subsets (i.e. there is no highest aleph)
• If two sets have the same ordinal number, they have the same cardinal
number, but not necessarily vice versa
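Cantor’s theorem can be verified exhaustively for a small finite set. The sketch below is an illustration under our own assumptions (a three-element set V and the helper name powerset): it checks that every function f from V to the set of its subsets misses the ‘diagonal’ set {x | x ∉ f (x)}, so that no such function is onto.

```python
from itertools import chain, combinations, product

def powerset(s):
    """Return all subsets of s as frozensets."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return [frozenset(c) for c in subsets]

V = [0, 1, 2]
subsets = powerset(V)
print(len(V), len(subsets))  # 3 8

# Enumerate every function f : V -> P(V) as a tuple of images (8**3 = 512 cases).
all_missed = True
for images in product(subsets, repeat=len(V)):
    f = dict(zip(V, images))
    diagonal = frozenset(x for x in V if x not in f[x])
    if diagonal in images:
        all_missed = False
print(all_missed)  # True
```

The diagonal set differs from f (x) at the element x itself, which is the same construction that reappears in Russell’s paradox.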
For more information and theory on cardinal numbers, ordinal calculus

and set theory we refer to two classical books on set-theory: [25] and [34].
The first one gives a good introduction to set theory and presupposes little
mathematical knowledge; the latter is more suitable for readers with
experience in set theory.
Chapter 4
Peano and Frege
4.1 Peano’s arithmetic

Questions that pertain to the foundations of mathematics, al-
though treated by many in recent times, still lack a satisfactory
solution. The difficulty has its main source in the ambiguity of
language.
- Peano in the opening of the paper ‘Arithmetices Principia, nova
methodo exposita’, in which he introduces axioms for the integers
The Italian mathematician Giuseppe Peano (1858-1932) devoted his career
successively to the infinitesimal calculus, the foundations of mathematics
and linguistic studies. After his work on calculus (see Peano’s first
publication [65]) and geometry (see [66] [67]), Peano became particularly
interested in number theory, also known as arithmetic. Like Dedekind
(see quote on page 46), Peano became aware of the lack of rigour in
mathematics through his experience in teaching infinitesimal calculus.
What is number theory? The field of mathematics consisting of the study

of the properties of the natural numbers
Since then, Peano strived for rigor, for an abstract mathematics. He came
to the conclusion that mathematics must be constructed, independently of
intuition or common sense, in a way that absolutely guarantees the validity
of its theorems.
In order to satisfy this requirement he devoted himself to the transforma-

tion of mathematics into a self-contained system, and rewrote mathematics in
symbolic form as an axiomatic system (see section 6.1), based exclusively on
postulated primitive notions and primitive propositions. To discard intuition,
he first renounced ordinary language (because it is often insufficient and
imprecise) and desired a new mathematical symbolism, consisting entirely
of neutral symbols. Second, he formalized the logic of the mathematical ar-
gument to replace intuitive inference by application of a limited number of
stated logical rules.
So Peano formalized both the language of mathematics and the logic
of the mathematical argument. To that end he first developed parts of symbolic
logic and formalized the propositional and predicate calculus. This
development was rudimentary and would later be worked out in full detail
by the mathematicians Russell and Whitehead in ‘Principia Mathematica’
(1910, see section 7.1). He introduced letters to denote propositions and
propositional functions (Peano’s logic notation) and the symbol ∈ for the
membership relation of a set.
The work of formalization of mathematics was published in the journal
‘Rivista di Matematica’ (which he had founded himself)
and in ‘Formulario Mathematico’, a series of 5 books that is also known as
‘Formulaire de Mathématique’¹. In 1899 he axiomatized the arithmetic of
cardinal numbers, to be published in the third volume of ‘Formulario Math-
ematico’ in 1901. Peano based the foundations of arithmetic on 5 axioms
(see [31, page 227]), that are formulated with the help of three (undefined)
terms, the acquaintance with the latter being assumed:
a) N (the set of natural numbers)
b) 0 (the particular natural number zero)
c) a+ (the immediate successor of the natural number a)

¹ The original ‘Formulaire de Mathématique’ was called ‘Formulario Mathematico’ when
the final version appeared in 1908, because Peano at that time consistently used
Interlingua, his simplified dialect of Latin, for all his mathematical publications.
Definition of the Peano axioms for the natural numbers:
1) 0 ∈ N
(zero is a natural number)
2) a ∈ N → a+ ∈ N
(the immediate successor of any number is a number)
3) 0 ∈ S ∧ (∀x :: (x ∈ S → x+ ∈ S)) → N ⊂ S
(if a set S contains zero and if it contains any number x it also contains
the immediate successor x+ of that number, then S includes the whole
of N)
4) a, b ∈ N ∧ a+ = b+ → a = b
(no two different numbers have the same immediate successor)
5) a ∈ N → a+ ≠ 0
(zero is not the immediate successor of a number)
Axiom 3 formalizes the principle known as mathematical induction.
We can show that the five axioms of Peano can be derived in ZF (see
section 5.3). For more information on the Peano axioms, I refer
to [31, chapter 5], [49, page 146-147] and [64, appendix A].
After defining the natural numbers, Peano used a recursive definition to

define the arithmetical sum, product and other operators, and he derived
much of the elementary number theory.
Example: Peano defined the sum a + b by recursion with respect to b :

a + 0 = a, a + (b+) = (a + b)+. Similarly we can define the product
a ∗ b : a ∗ 0 = 0, a ∗ (b+) = (a ∗ b) + a.
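The recursive definitions in this example translate almost literally into a program. The following Python sketch is an illustration only; the representation of numbers as nested tuples and the names ZERO, succ, add, mul, from_int and to_int are our own assumptions, not Peano’s notation.

```python
# Natural numbers built from zero and the successor operation only.
ZERO = ()                      # the number 0

def succ(a):
    """The immediate successor a+ of a."""
    return (a,)

def add(a, b):
    # a + 0 = a ;  a + (b+) = (a + b)+
    if b == ZERO:
        return a
    return succ(add(a, b[0]))  # b[0] is the number whose successor is b

def mul(a, b):
    # a * 0 = 0 ;  a * (b+) = (a * b) + a
    if b == ZERO:
        return ZERO
    return add(mul(a, b[0]), a)

def from_int(n):
    """Convert a Python int to the successor representation."""
    a = ZERO
    for _ in range(n):
        a = succ(a)
    return a

def to_int(a):
    """Count how many times the successor was applied."""
    n = 0
    while a != ZERO:
        a, n = a[0], n + 1
    return n

two, three = from_int(2), from_int(3)
print(to_int(add(two, three)))  # 5
print(to_int(mul(two, three)))  # 6
```

Both functions recurse on the second argument b, exactly as in the definitions above, and every recursive call moves one step closer to the base case 0.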
Peano then showed how rationals and reals can be formally obtained from
naturals, and further considered elementary analysis and geometry. In later
years, Peano turned away from the foundations of mathematics and devoted
most of his time to his new international auxiliary language Interlingua. He
invented this language (see [49, page 148-150]) in an attempt to reduce the
grammatical structure of languages and create a universal language. His
mathematical work was to have a profound influence on mathematical
thought, but his language Interlingua received little response.
4.2 Frege’s work

As I think about acts of integrity and grace, I realize that there
is nothing in my knowledge to compare with Frege’s dedication to
truth. His entire life was on the verge of completion, much of his
work had been ignored to the benefit of men infinitely less capa-
ble, his second volume was about to be published, and upon finding
that his fundamental assumption was in error, he responded with
intellectual pleasure clearly submerging any feelings of disappoint-
ment. It was almost superhuman and a telling indication of that
of which men are capable if their dedication is to creative work and
knowledge instead of cruder efforts to dominate and be known.
- B. Russell about Frege, in [93, page 127]
The German mathematician and philosopher Gottlob Frege (1848-1925)
was one of the founders of modern symbolic logic, putting forward the
(logistic) view that mathematics is reducible to logic. He wrote many
important papers on philosophy. Frege once said ‘every good mathematician
is at least half a philosopher, and every good philosopher is at least half a
mathematician’. Famous is his analysis of the ontological argument for the
existence of God, but we will not discuss his philosophical writings here. We
will mention his three most important works on the foundations of mathematics:
Begriffsschrift, Grundlagen der Arithmetik and Grundgesetze der Arithmetik.
Begriffsschrift
Like Peano, Frege invented a logical symbolism, to which he gave the name
‘Begriffsschrift’ (in English known as ‘Concept script’). We will not treat the
symbolism that was used in Begriffsschrift in full detail here (it can be found
in [49, page 175-182] and in [31, page 177-199]), but give a few examples of his
new logic and describe the rest of his work in general terms.
Frege rejected the subject/predicate regimentation on which Aristotelian
logic depends, and recognized (though he was not the first to do so) that the
patterns of Aristotle cannot always be used to evaluate inferences correctly.
Example: Certain obvious inferences, such as:
If Joe doesn’t wear a kilt, then Joe is not Scottish.
Joe doesn’t wear a kilt.
Therefore, Joe is not Scottish.
do not fall under the patterns of traditional logic (also called syllogisms).
This is in fact another kind of inference, which contains a conditional
expression and has the form:
if B then A
B
Therefore, A.
Frege adopted this rule in the system of logic of his Begriffsschrift.
With arbitrary expressions for A and B, the rule later became known as
modus ponens. A logic that evaluates these sorts of expressions is called a
propositional logic.
What is propositional calculus (or sentential calculus)?

A symbolic system for treating compound propositions and their logical
relationships. Compound propositions are formed from basic propositions via a
set of formation rules using the standard symbols ∧, ∨, →, ¬. Basic
propositions are simple, unanalyzed propositions.
Frege based his propositional calculus on 6 axioms: for all x, y and z:
1 x → (y → x)
2 (x → (y → z)) → ((x → y) → (x → z))
3 (x → (y → z)) → (y → (x → z))
4 (x → y) → (¬y → ¬x)
5 ¬¬x → x
6 x → ¬¬x
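That each of these six axioms is a tautology, and that modus ponens preserves truth, can be checked with truth tables. The Python sketch below is a modern illustration and not Frege’s own method; the names implies, AXIOMS and is_tautology are our own.

```python
from itertools import product

def implies(p, q):
    """Truth function of the conditional p → q."""
    return (not p) or q

# Frege's six propositional axioms as Boolean functions of x, y, z.
AXIOMS = {
    1: lambda x, y, z: implies(x, implies(y, x)),
    2: lambda x, y, z: implies(implies(x, implies(y, z)),
                               implies(implies(x, y), implies(x, z))),
    3: lambda x, y, z: implies(implies(x, implies(y, z)),
                               implies(y, implies(x, z))),
    4: lambda x, y, z: implies(implies(x, y), implies(not y, not x)),
    5: lambda x, y, z: implies(not (not x), x),
    6: lambda x, y, z: implies(x, not (not x)),
}

def is_tautology(formula):
    """True if the formula holds under all 8 assignments to x, y, z."""
    return all(formula(*values) for values in product([False, True], repeat=3))

results = {number: is_tautology(f) for number, f in AXIOMS.items()}
print(results)  # {1: True, 2: True, 3: True, 4: True, 5: True, 6: True}

# Modus ponens: whenever B and 'if B then A' are both true, A is true.
modus_ponens_sound = all(
    a for a, b in product([False, True], repeat=2) if b and implies(b, a)
)
print(modus_ponens_sound)  # True
```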
Derivations in the propositional calculus were based on two procedures of

substitution and the rule of modus ponens. For the full calculus of predi-
cates, three additional axioms were needed. For all x, y and (propositional
functions) F :
7 (x = y) → (F (x) → F (y))
8 x = x
9 (∀x :: F (x)) → F (y)

Frege presented this new logic in his ‘Begriffsschrift’ in 1879. It consists
of three parts. In the first part he provides a list of inferences from which,
he believes, all truths of logic can be derived. Then Frege demonstrates in
the second part the completeness of his logic (i.e. all inferences that can be
shown to be valid inferences using the techniques of Aristotelian or proposi-
tional logic can also be shown to be valid using only Frege’s laws and rules
of inference). The third part of Begriffsschrift shows that logic alone suffices
to show the validity of certain inferences (about properties that are heredi-
tary in so-called ‘ancestral sequences’). He also showed that mathematical
induction (see section 3.4.3) can be replaced by a principle about ancestral
sequences that depends only on logical laws.
Grundlagen der Arithmetik
Throughout his work Frege developed (as the first) the main thesis of
logicism, that mathematics is reducible to logic. To that end he had to do more
than develop a new logical symbolism. His next book, ‘Die Grundlagen
der Arithmetik’ (1884), was devoted to the foundations of arithmetic, based on
the concept of (cardinal) numbers. He put forward the logicist philosophy that
arithmetic could be founded upon logic alone, and he discussed the work of
others in detail (see [49, page 184-185]). In [31, page 183] we learn more about
Frege’s philosophy.
In the introduction of his book Frege announced his three guiding principles:
1) Always to separate sharply the psychological from the logical, the sub-
jective from the objective
2) Never to ask for the meaning of a word in isolation, but only in the
context of a proposition
3) Never to lose sight of the distinction between concept and object
In his book he presented his own theory of numbers, and wanted to show
that all the truths of arithmetic are derivable from logical laws and
definitions alone. He did this by sketching the proofs, without giving the
official Begriffsschrift proofs of the truths of arithmetic. Before Frege could
do that he needed a new version of Begriffsschrift, to accommodate the new
requirements of his formalization of the concept of number, but also to fill
in pieces that were simply missing.
Grundgesetze der Arithmetik
In his next three papers, ‘Function and Concept’, ‘On Sense and Meaning’
and ‘On Concept and Object’ (1892), he introduced all the modifications that he
was to make to his language, Begriffsschrift, and his logical system. During
that period he also completed his definitions of the natural numbers and some
of the proofs of simple truths of arithmetic from these definitions and logical
laws. His new logical calculus included a symbolic representation of the truth
value of any given proposition, which provided a shorter notation for many
Begriffsschrift propositions. The calculus also had several other new logical
and arithmetical symbols, one of the most important of them being a notation
for what Frege called the ‘course-of-values’ of a propositional function. The
course-of-values of a propositional function ϕ , denoted by Frege as ε̆ϕ(ε),
denoted the truth value for all possible values of the argument (here ε). We
denote it as cov and define equal course-of-values by cov(f ) = cov(g) ↔ (∀a ::
f (a) = g(a)). In 1893, Frege published the first volume of his ‘Grundgesetze
der Arithmetik’, the ‘Basic Laws of Arithmetic’. It set out the new version of
logic and began the proofs that were to make the project successful. In the
second part Frege wanted to define the natural numbers and some basic laws
governing them and, in the third part, he would define the real numbers and
lay the foundations for expressing analysis in terms of logic. In 1902, when
volume 2 was in press, he received a now famous letter from the English
mathematician and logician Russell (see chapter 5), who pointed out, with
great modesty, that a contradiction could be derived in Frege’s system (see
section 5.1). This contradiction would later be named after Russell and become
known as ‘Russell’s paradox’.
Hardly anything more unwelcome can befall a scientific writer

than one of the foundations of his edifice be shaken after his work
is finished. I have been placed in this position by a letter of Mr.
Bertrand Russell just as printing of the second volume was
nearing completion . . . .
- The first paragraph of the appendix from Frege’s ‘Grundgesetze
der Arithmetik’
After many letters between the two (see for example [93, pages 124-128]),
Frege modified one of his axioms and explained in an appendix to the book
that this was done to restore the consistency of the system. However with
this modified axiom, many of the theorems of volume 1 do not go through
and Frege must have known this. He probably never realized that even with
the modified axiom the system is inconsistent, since this was not shown until
after Frege’s death in 1925, by Leśniewski (see [85]).
The scope of Frege’s Grundgesetze is similar to that of Principia Mathe-

matica (to be discussed in section 7.1), and both aimed at a logistic basis
for mathematics, but with Russell’s theory of types Principia Mathematica
did not contain the paradox. Frege’s contribution to the foundations of ma-
thematics was therefore largely indirect (through Principia Mathematica,
see [49, page 181]). Although Frege attracted only a small audience in his
lifetime, he was a major influence on Peano and Russell, and in the years
thereafter his influence on contemporary philosophy, especially on thought
about language and logic, has become ubiquitous.
In this text I have made extensive use of the excellent books [98] and [97]
about Frege that contain many more references about Frege and his work,
and chapter 4.5 from [31] and chapter 6, section 4 from [49].
Chapter 5
Russell
The fact that all Mathematics is Symbolic Logic is one of the

greatest discoveries of our age; and when this fact has been esta-
blished, the remainder of the principles of mathematics consists
in the analysis of Symbolic Logic itself.
- B. Russell in Principles of Mathematics, 1903
The English logician and philosopher Bertrand Russell (1872-1970) pu-

blished in his long life an incredible number of books on logic, the theory of
knowledge and many other topics. He certainly was one of the most impor-
tant logicians and philosophers of the 20th century.
Russell’s private life, affairs, imprisonment, his social and political cam-
paigns and advocacy of both pacifism and nuclear disarmament are certainly
interesting, but we will not discuss these subjects here (see for more informa-
tion and references on Russell’s life and work [62], [80] and [31, chapter 6, 7,
11 and sections 8.2, 8.3, 8.4, 8.8.3, 8.9.2, 10.1, 10.2.1]). I quote the following
assessment from [73]: “Bertrand Russell had one of the most widely varied
and persistently influential intellects of the 20th century. During most of his
active life, a span of three generations, Russell had at any time more than
40 books in print ranging over philosophy, mathematics, science, ethics, so-
ciology, education, history, religion, politics and polemic. The extent of his
influence resulted partly from his amazing efficiency in applying his intellect
(he normally wrote at the rate of 3,000 largely unaltered words a day) and
partly from the deep humanitarian feeling that was the mainspring of his
actions. This feeling expressed itself consistently at the frontier of social change
through what he himself would have called a liberal anarchistic, left-wing,
and skeptical atheist temperament.”
Here, we will focus on Russell’s mathematical contributions to the foun-

dations of mathematics. His contributions relating to mathematics include
his discovery of Russell’s paradox, his defense of logicism (the view that
mathematics is, in some significant sense, reducible to formal logic), his in-
troduction of the theory of types, and his refining and popularizing of the
first-order predicate calculus. Along with Kurt Gödel (see chapter 8), he is
usually credited with being one of the two most important logicians of the
twentieth century. We will look at each of these contributions in more detail.
Russell discovered the paradox which bears his name in 1901, while
working on his ‘Principles of Mathematics’ (1903). The paradox and the
closely related vicious circle principle are discussed in section 5.1. Russell’s
own response to the paradox came with the introduction of types (see chap-
ter 7). Using the vicious circle principle also adopted by Henri Poincaré,
together with Russell’s so-called ‘no-class’ theory of classes, Russell was then
able to explain why the unrestricted comprehension axiom (see section 2.1)
fails: propositional functions, such as ‘x is a set’, should not be applied to
themselves since self-application would involve a vicious circle. On this view,
it follows that it is possible to refer to a collection of objects for which a
given condition (or predicate) holds only if they are all at the same level or
‘type’.
Although first introduced by Russell in 1903 in the Principles, his theory

of types finds its mature expression in his 1908 article ‘Mathematical Logic as
Based on the Theory of Types’ and in the monumental work he co-authored
with Alfred North Whitehead, ‘Principia Mathematica’ (1910, 1912, 1913).
Principia Mathematica and the theory of types will be treated in detail in
chapter 7. The theory admits of two versions, the ‘simple theory’ and the
‘ramified theory’. Both versions of the theory later came under attack. For
some, they were too weak since they failed to resolve all of the known para-
doxes. For others, they were too strong since they disallowed many ma-
thematical definitions which, although consistent, violated the vicious circle
principle. Russell’s response to the second of these objections was to intro-
duce, within the ramified theory, the axiom of reducibility. Although the
axiom successfully lessened the vicious circle principle’s scope of application,

many claimed that it was simply too ad hoc to be justified philosophically.
Of equal significance during this same period was Russell’s defense of logi-
cism, the theory that mathematics was in some important sense reducible to
logic. First defended in his Principles, and later in more detail in ‘Principia
Mathematica’, Russell’s logicism consisted of two main theses. The first
is that all mathematical truths can be translated into logical truths or, in
other words, that the vocabulary of mathematics constitutes a proper subset
of that of logic. The second is that all mathematical proofs can be recast as
logical proofs or, in other words, that the theorems of mathematics consti-
tute a proper subset of those of logic.
As with Gottlob Frege, Russell’s basic idea for defending logicism was that
numbers may be identified with sets of sets and that number-theoretic
statements may be explained in terms of quantifiers and identity. It followed
that number-theoretic operations could be explained in terms of set-theoretic
operations such as intersection, union, and the like. In ‘Principia Mathema-
tica’ Whitehead and Russell were able to provide detailed derivations of many
major theorems in set theory, finite and transfinite arithmetic, and elemen-
tary measure theory. A fourth volume on geometry was planned but never
completed.
For more information on Russell’s theory of types and about Principia

Mathematica, we refer to chapter 7. In this chapter we used parts of [73]
and [39].
5.1 Russell’s paradox

I hoped sooner or later to arrive at a perfect mathematics which
should leave no room for doubts, and bit by bit to extend the sphere
of certainty from mathematics to other sciences.
- Russell, in [78]
Paradoxes have been known for a long time, but with the
introduction of more formal systems at the end of the 19th century they
became particularly influential on the foundations of mathematics. Before we
describe the most famous paradox, that of Russell, we first define the notion
of a paradox.
What is a paradox? A paradox is a statement which appears self-contradictory

or contrary to expectations, and is also known as an antinomy
In an axiomatic system (see section 6.1) a paradox is a derivation that

leads to a contradictory statement.
A paradox is properly something which is contradictory to ge-

neral opinion; but is frequently used to signify something self-
contradictory [...] Paralogism, by its etymology, is best fitted to
signify an offence against the formal rules of inference.
- De Morgan, in [31, page 310]
In [86], three ‘paradox threats’ are identified: when systems are complex,
formal or designed for computers, there often is not enough intuition to notice
inconsistencies. With the previously described formalizations, the systems
of Cantor (see chapter 2), Peano (see section 4.1) and Frege (see section 4.2),
not to mention that of Russell himself, were at risk. And indeed, in 1902 Russell
discovered a paradox in Frege’s ‘Grundgesetze der Arithmetik’. The paradox
turned out to lie at the very basis of mathematics, since it could be formulated
in all the systems mentioned above. We first formulate the paradox in Cantor’s
set theory:
Russell’s paradox: Let R = {x | x ∉ x}. Then R ∈ R ↔ R ∉ R
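The core of the paradox does not depend on any particular notion of set: for an arbitrary binary ‘membership’ relation there can be no element R with x ∈ R ↔ x ∉ x for all x. The following Python sketch checks this by brute force for every possible membership relation on a universe of three objects. It is a finite illustration under our own assumptions (the universe size and the name has_russell_element are hypothetical), not a proof for set theory as a whole.

```python
from itertools import product

def has_russell_element(universe, member):
    """True if some r satisfies: for all x, (x ∈ r) ↔ (x ∉ x).

    'member' is a set of pairs (x, y) meaning 'x is a member of y'.
    """
    return any(
        all(((x, r) in member) == ((x, x) not in member) for x in universe)
        for r in universe
    )

universe = range(3)
pairs = [(x, y) for x in universe for y in universe]

relations_checked = 0
relations_with_russell_element = 0
# Enumerate all 2**9 = 512 membership relations on the universe.
for bits in product([False, True], repeat=len(pairs)):
    member = {pair for pair, bit in zip(pairs, bits) if bit}
    relations_checked += 1
    if has_russell_element(universe, member):
        relations_with_russell_element += 1

print(relations_checked, relations_with_russell_element)  # 512 0
```

The search fails for the reason given in the paradox itself: taking x = r, the condition would require r ∈ r to hold exactly when it does not.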
Russell in 1901 studied Cantor’s work [31, section 6.6.1] and after noting
that some sets belonged to themselves while the rest did not do so, Russell
showed that the set of all sets which do not belong to themselves belongs to
itself if and only if it does not do so - and, by repetition of the argument,
vice versa also. Russell also expressed this paradox in terms of predicates,
and as such first presented his discovery in a letter to Frege (see [93, page
124] and see also the quote on page 78).
Since Peano’s system was based on the set theory of Cantor, Peano’s
work also contained the paradox. In Frege’s work (Grundgesetze der Arithmetik)
self-application was not possible, so R ∈ R was not allowed, but the paradox
could still be expressed by using Frege’s notion (see page 77) of the
course-of-values of a function. If we define equal course-of-values cov by
cov(f ) = cov(g) ↔ (∀a :: f (a) = g(a)), we can derive the paradox in Frege’s
work as follows (see also [86, page 7] for a slightly different proof):
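Define f (x) := (¬∀ϕ :: (cov(ϕ) = x) → ϕ(x)), and let K := cov(f ).
¬f (K)
≡ {def. f }
¬(¬∀ϕ :: cov(ϕ) = K → ϕ(K))
≡ {elim. ¬¬}
(∀ϕ :: cov(ϕ) = K → ϕ(K))
⇒ {instantiate ϕ with f }
cov(f ) = K → f (K)
≡ {def. K, elim. →}
f (K)
Hence f (K) holds. Conversely, f (K) states that there is a ϕ with cov(ϕ) = K
and ¬ϕ(K); since cov(ϕ) = cov(f ) gives ϕ(K) = f (K), we obtain ¬f (K), a
contradiction.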
The paradox had a big influence, since it could be formulated in all these
systems, and in classical logic every statement is entailed by a contradiction.
In the eyes of many mathematicians (e.g. Hilbert, Brouwer) it therefore

appeared that no proof could be trusted once it was discovered that the
logic underlying all mathematics was inconsistent. Russell’s paradox arises
as a result of naive set-theory’s so-called unrestricted or naive comprehension
axiom (see page 16). Cantor created this axiom with the intuition that any
coherent condition may be used to determine a set. But that means that the
condition ϕ that determines a set V = {x | ϕ(x)} may depend on the whole
set V , i.e. it allows impredicative definitions (see below for the definition of
impredicative). Most attempts at resolving Russell’s paradox have therefore
concentrated on various ways of restricting or abandoning this axiom.
Before we consider the consequences of the discovery of the paradox,

we first take a further look at the nature of the paradox, hereby following
Russell’s own analysis. While writing ‘The Principles’, Russell’s attention
was attracted by what is now known as Cantor’s paradox, and (according to
a letter he wrote to the English mathematician Jourdain) he found that there
was something wrong with his earlier refutation of Cantor’s paradox (see [29,
section 7]). He removed his earlier refutation from ‘The Principles’ and his
revised diagnosis uncovered a true paradox. As we have already seen, he
summarized this discovery and the reasoning that led thereto in a second
letter to Frege.
After discovering his famous paradox, Russell traced the fallacy back to
what he called the ‘vicious circle principle’. The ‘vicious circle’ that his
principle is named after arises from the assumption that a set of objects may
contain members which can only be defined by means of the set as a whole.
Therefore, Russell said that statements are illegitimate and meaningless
if they contain a set of objects such that it will contain members which
presuppose this (total or whole) set of objects. That means a statement is
only legitimate if all propositions it contains refer to already defined sets.
Definition of impredicative: A definition is impredicative if it involves a
set V that has a member v ∈ V whose definition depends on V.1
In a sense those impredicative definitions are thus circular, and were
considered the cause of antinomies. For more information about impredicativity,
see [57, section 15.3].
Definition of Vicious Circle Principle2: Definitions, assumptions or
statements involving all of a set must not be a part or an element of that
set. In other words, impredicative definitions should be avoided.
In terms of set theory we can formulate the principle as: No set V is
allowed to contain members v definable only in terms of V, or members v
involving or presupposing V.
Vicious circle fallacies are arguments that are condemned by the vicious
circle principle. Such arguments may not necessarily lead to contradictions
(since fallacious arguments can lead to true conclusions).
1
Note that a direct implementation of this definition as a new axiom of set
theory is not possible. We might rephrase the definition as ‘whatever set
contains an apparent element, that element must not be dependent on that
set’. This might be implemented by fixing ‘an apparent element’ of a set and
then expressing its independence of the other elements of that set. This
independence means that, regardless of the nature of the elements of the
set, the nature of the apparent element remains the same. The ‘nature’ of the
elements can be seen as all the members of that element (or in case the
element is an individual, the nature of the apparent element can be seen as
that individual). This leads us to the following axiom:
(∀X :: (∀x : x ∈ X : x = a → (∀x′ : x′ ∈ X ∧ x′ ≠ x : x′ = b(x′) → a ∈ X))).
Clearly this does not avoid the paradox of Russell. We consider a set
X := R ≡ {x | x ∉ x} and an element x ∈ R, i.e. we have x ∉ x. Despite the
fact that the set X is ‘too large’, the axiom does not prohibit the existence
of the set X. The axiom tells us
x = a → (∀x′ : x′ ∈ R ∧ x′ ≠ x : x′ = b(x′) → a ∈ R).
In other words, we can change each element in R except x and the nature of x
should not depend on it. The only thing we know about x is that x ∉ x and
x ∈ R. So to obtain a contradiction we have to show that x ∈ x ∨ x ∉ R.
Now we can change all x′ into any value b(x′), but still we will have x ∉ x
and x ∈ R. So unfortunately this most ‘direct’ attempt to solve the paradox
fails.
2
Russell formulated it originally as ‘Whatever involves all of a collection must
not be one of the collection’. Or, as formulated in [49, page 113]: ‘If,
provided a certain collection had a total, it would have members only
definable in terms of that total, then the said collection has no total’.
Another formulation of [87] says ‘No entity can be defined in terms of a
totality of which it is itself a possible member’.
In Principia Mathematica (see [31, section 7.2]), Russell assembled a
collection of seven different paradoxes, all of which were based on the same
circular type of reasoning, and then he resolved them by making their
circularity explicit. We will now mention eight of the most well-known
paradoxes, most of which arise from a violation of the vicious circle
principle.
1 Russell’s paradox (1903), which we have discussed in this section. The
impredicativity is clear in the definition of the set that contains all sets
that are not members of themselves. There are many popularizations of
this paradox; one of them is from Russell himself (1919) and concerns
the plight of the barber of a certain village who has enunciated the
principle that he shaves all those persons of the village, and only those,
who do not shave themselves. The paradox is then formed by the question
‘Does the barber shave himself?’.
2 Burali-Forti’s paradox (1897), which we have discussed in section 3.8.2.
The impredicativity comes from the ordinal number of the naturally
ordered set of all ordinal numbers.
3 Cantor’s paradox, which we have discussed in section 3.8.1. The
impredicativity comes from the cardinal number of the set of all sets.
4 The liar’s paradox: We quote from [49, page 127]: “If a man says ‘I
am lying’, his utterance is self-contradictory, and it cannot be either
true or false. The oldest form of this particular paradox, in the words
of Principia Mathematica, is that of Epimenides the Cretan, ‘who said
that all Cretans were liars, and all other statements made by Cretans
were certainly lies’.”.
5 Richard’s paradox: The French schoolteacher Jules Richard (1862-1956)
published a paradox in [74] in 1905. He considered the set V of
all non-terminating decimals that can be defined in a finite number of
words. By arranging V as a sequence and applying Cantor’s diagonal
argument to the members of V, a non-terminating decimal is produced
that differs from every member of V, yet is itself defined in a finite
number of words.
6 Paradox of definitions. Again we quote from [49]: “The possible
definitions of specific ordinal numbers can be arranged in a sequence, and
there are therefore at most ℵ0 of them. But the totality of ordinal
numbers is not denumerable, and so there exist ordinal numbers which
cannot be individually defined. Among such indefinable ordinals there
is a least, and thus it appears that the description ‘the least indefinable
ordinal’ yields a definition of an entity that cannot be defined.”.
7 Berry’s paradox: “The least integer not nameable in fewer than nine-
teen syllables” is itself a name that contains only eighteen syllables.
8 The Grelling-Nelson paradox: The German philosopher Kurt Grelling
(1886-1942) published a paradox in 1908 together with his friend Leonard
Nelson (1882-1927). As described in [31, page 336]: “Some words can be
predicated of themselves: in English, ‘word’ is a word, ‘noun’ is a noun,
and so on. This property is called ‘autological’, and is obviously itself
autological. Other English words are not autological; ‘German’, say, or
‘verb’. They are called ‘heterological’ - but this word is heterological if
and only if it is not so.”.
The first three paradoxes are logical paradoxes that can be formulated
within Cantor’s set theory. The remaining five are mainly paradoxes of
naming; they are of a semantic kind. All these paradoxes have stimulated
fundamental research, especially Russell’s paradox, which brought the vicious
circle principle to light and first showed the need for a theory of types or
another restriction of the power of the comprehension axiom.
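The barber popularization of Russell’s paradox (item 1 in the list) can be
checked mechanically for a small village. The following is a minimal sketch;
the function name barber_is_possible and the encoding of ‘who shaves whom’ as
a table of ordered pairs are assumptions made for illustration and do not
come from the sources cited above. It enumerates every possible shaving
relation on n villagers and looks for one in which villager 0 (the barber)
shaves exactly those who do not shave themselves.

```python
from itertools import product

def barber_is_possible(n):
    """Return True if some 'shaves' relation on n villagers obeys the barber's rule.

    Villager 0 plays the barber. The rule: the barber shaves x
    if and only if x does not shave x.
    """
    people = range(n)
    pairs = [(a, b) for a in people for b in people]
    # Enumerate every relation as one True/False choice per ordered pair.
    for bits in product([False, True], repeat=len(pairs)):
        shaves = dict(zip(pairs, bits))
        if all(shaves[(0, x)] == (not shaves[(x, x)]) for x in people):
            return True
    return False

# No village of 1, 2 or 3 persons admits such a barber.
results = [barber_is_possible(n) for n in (1, 2, 3)]  # [False, False, False]
```

The search fails for the same reason as in the paradox itself: taking x to be
the barber, the rule would require shaves[(0, 0)] to equal its own negation.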
5.2 Consequences and philosophies

Perhaps the greatest paradox of all is that there are paradoxes in
mathematics.
- E. Kasner and J. Newman quoted in [46]
The various proposals to overcome this paradox led to various theories.
One proposal was to reconstruct set theory on an axiomatic basis (this
axiomatic method was first suggested by Hilbert, see section 6.1) sufficiently
restrictive to exclude the paradoxes. Hilbert and other formalists had the
basic idea to allow the use of only well-defined and finitely constructible
objects, together with rules of inference that were deemed to be absolutely
certain.
In 1908 the mathematician Zermelo was the first to attempt to formulate
proper axioms for set-theory such that the paradox is not deducible, but
most other parts of set-theory are. This attempt was successful and, after
a refinement by the mathematician Fraenkel, led to the ZF axiom system
(see section 5.3), which is still the most accepted basis today. Subsequent
refinements to ZF have been made by Skolem, and later by the three
mathematicians von Neumann, Bernays and Gödel (see section 8.5).
Russell’s own response to the paradox came with the introduction of his
theory of types in his Principia Mathematica (see section 5.4). Russell had
already laid out a first version of his theory to eliminate the paradoxes in
1908. Since self-application (R ∈ R) caused a contradiction, he decided to
suppress this. In this approach he assigned types to variables (as types
he took sets) and allowed expressions such as x ∈ y only if the type of x
is one less (in some order) than the type of y. The outlawing of
impredicative definitions seemed a solution to the known paradoxes in set
theory.
But it turned out there are essential and accepted parts of mathematics that
contain impredicative definitions. This was a serious problem to Russell’s
solution, despite the fact that many instances of impredicative definitions in
mathematics could be circumvented. We quote from [22, page 265]: “In 1918,
the German mathematician Hermann Weyl (1885-1955) tried to construct as
much parts of analysis as possible from the natural number system without
the use of impredicative definitions. Although he succeeded in obtaining a
considerable part of analysis, he was unable to derive the important theorem
that every nonempty set of real numbers having an upperbound has a least
upperbound”.
Other attempts towards a solution for the paradoxes of set theory focused on
the foundations of logic. Luitzen Brouwer and the intuitionists took this
approach and tried to prevent the paradoxes by denying the principle of the
excluded middle (which states that any mathematical statement is either
true or false). Brouwer first attacked the logical foundations of mathematics
in his doctoral thesis in 1907; this formed the beginning of the Intuitionist
School. The intuitionists had the basic idea that one cannot assert the
existence of a mathematical object unless one can also indicate how to go
about constructing it.
In the period after the discovery of the paradoxes, we distinguish three
main philosophies of mathematics: logicism, intuitionism and formalism.
What is Logicism? A school of mathematical thought which holds the
thesis that mathematics is a part of (or a branch of) logic.
Logicists contend that all of mathematics can be deduced from pure logic,
without the use of any specifically mathematical concepts, such as number or
set. The first ideas date back to Leibniz (1666) and the actual reduction of
mathematics to logic was started by Dedekind (1888) and Frege (1884-1903)
and later by Peano, and Whitehead and Russell (in Principia Mathematica
1910-1913).
What is Intuitionism? A school of mathematical thought founded by the 20th
century Dutch mathematician L.E.J. Brouwer (1881-1966) that contends that
the primary objects of mathematical discourse are mental constructions
governed by self-evident laws.
Intuitionists have challenged many of the oldest principles of mathematics
as being non-constructive (and hence meaningless). They proposed that
a proof in mathematics should be accepted only if it constructed the
mathematical entity it talked about, and not if it merely showed that the
entity ‘could’ be constructed or that supposing its non-existence would
result in contradiction.
Brouwer had the fundamental insight that such nonconstructive arguments
will be avoided if one abandons a principle of classical logic (which
lies for example behind De Morgan’s laws). This is the principle of the
excluded third (or excluded middle), which asserts that for every proposition
ϕ, either ϕ or ¬ϕ; or equivalently that, for every ϕ, ¬¬ϕ implies ϕ. This
principle is basic to classical logic and had already been enunciated by
Aristotle, though with some reservations, as he pointed out that the statement
“there will be a sea battle tomorrow” is neither true nor false.
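That the two formulations of the principle (ϕ ∨ ¬ϕ, and ¬¬ϕ implies ϕ) hold
with two truth values but can fail as soon as a third value is admitted can
be illustrated with a small truth-table computation. The following is a
sketch only: it assumes the three-valued Gödel (Heyting) semantics with
values 0, 1/2 and 1, a standard countermodel that is not taken from the text
above, and the names neg, disj, imp and is_valid are chosen for illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
CLASSICAL = [0, 1]
THREE_VALUED = [0, HALF, 1]

def neg(a):
    # Negation: only the value 0 is mapped to 1.
    return 1 if a == 0 else 0

def disj(a, b):
    # Disjunction is the maximum of the two values.
    return max(a, b)

def imp(a, b):
    # Implication: fully true when a <= b, otherwise the value of b.
    return 1 if a <= b else b

def excluded_middle(a):
    return disj(a, neg(a))

def double_negation_elim(a):
    return imp(neg(neg(a)), a)

def is_valid(formula, values):
    # A formula is valid if it evaluates to 1 for every value of its variable.
    return all(formula(v) == 1 for v in values)

classical_ok = is_valid(excluded_middle, CLASSICAL)         # True
three_valued_ok = is_valid(excluded_middle, THREE_VALUED)   # False
```

With the value 1/2 both excluded_middle and double_negation_elim evaluate to
1/2 rather than 1, so neither formula is valid in the three-valued setting.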
Because of the weight it places on mental apprehension through construction
of purported mathematical entities, intuitionism is sometimes also called
constructivism. A still more severe form of constructivism which we will not
further discuss is strict finitism, in which one rejects infinite sets. More
information on intuitionism can be found in [60].
What is Formalism? A school of mathematical thought introduced by the
20th century mathematician David Hilbert, which holds that all mathematics
can be reduced to rules for manipulating formulas without any reference to
the meanings of the formulas.
Formalists contend that it is the mathematical symbols themselves, and
not any meaning that might be ascribed to them, that are the basic objects
of mathematical thought. Hilbert’s program, called formalism, was to
concentrate on the formal language of mathematics and to study its syntax. A
statement should be a metatheorem, that is a theorem provable within the
syntax of mathematics.
These three philosophies do not necessarily contradict each other, and
all of them are still advocated today. Whether the logicist thesis has
been established seems to be a matter of opinion. Though successful, it can
be questioned on the ground that the systematic development of logic pre-
supposes mathematical ideas in its formulation. The intuitionists succeeded
in rebuilding large parts of present-day mathematics, but a large part is still
wanting, making intuitionist mathematics less powerful and in many respects
much more complicated than classical mathematics. These are serious ob-
jections to the intuitionistic approach, but it is generally conceded that its
methods do not lead to contradictions, and some hope for a new intuitionist
reconstruction of mathematics carried out in a different and more successful
way. Unfortunately for the formalists, a consequence of Gödel’s
incompleteness theorem (see chapter 8) is that the consistency of
mathematics can be
proved only in a language which is stronger than the language of mathema-
tics itself. Yet, formalism is not dead - most pure mathematicians are tacit
formalists, but the naive attempt to prove the consistency of mathematics in
a weaker system had to be abandoned. From [11, item from Paul Bernays]
we learn that most mathematicians of all three philosophies are also philo-
sophical realists: “While no one, except an extremist intuitionist, will deny
the importance of the language of mathematics, most mathematicians are
also philosophical realists who believe that the words of this language denote
entities in the real world. Following the Swiss mathematician Paul Bernays
(1888-1977), this position is also called Platonism, since Plato believed that
mathematical entities really exist.”. For more information about realism, see
[57].
5.3 Zermelo Fraenkel

5.3.1 Axiomatic set theory
After the discovery of Russell’s paradox, it became clear that set theory
needed a new and more rigorous basis. Hilbert’s proof theory, which will be
treated in more detail in section 6.1, offered a way to put set theory on firm
and hopefully consistent grounds. The so-called ideal calculus was a first
formalization of Cantor’s set theory, but it lacked the precision of Hilbert’s
later theories and was inconsistent because it still contained in some form the
(naive) comprehension principle (see page 16). The first real axiomatization
of set theory was given in 1908 by the German mathematician Ernst Zermelo
in [101]. The attitude adopted in his axiomatic development of set theory
is that it is not necessary to know what ‘sets’ and the ‘things’ that are
their elements are, nor what the ‘membership relation’ means [49, see page
288, paragraph 1]. Zermelo instead postulated a domain B of abstract objects
and represented the elements or ‘things’ of this domain by the letters
a, b, c, . . .. He then introduced the primitive notions of equality and
membership: a = b states that ‘a’ and ‘b’ designate the same ‘thing’. a ∈ b
is defined on the domain B and if a ∈ b holds, we call b a set and a an
element of this set. Thus some, but not necessarily all, objects of B are
sets. The assumptions adopted about these notions are called the axioms of
the theory. Its theorems are the axioms together with the statements that can
be deduced from the axioms using the rules of inference (see also section 6),
for example by a system of logic. Criteria for the choice of axioms have been
identified by several people (see Hilbert’s theory in section 6, or [49, last
sentence of page 287]). The most accepted criteria (more formally defined in
chapter 6) include:
1. Consistency of the system (it should be impossible to derive both a
statement and its negation, in other words the paradoxes should be
avoided).
2. Plausibility (the axioms should be in accord with intuitive beliefs about
sets, see [60]).
3. Completeness (richness of the theory: the desirable results of Cantorian
set theory ought to be derived as theorems).
In the next paragraph we will present the set of axioms that Zermelo
chose and that formed the basis for all future axiomatizations of set theory
(see also section 8.5).
5.3.2 Zermelo Fraenkel (ZF) Axioms

Zermelo formulated his axiomatic system in 1908; the extensions by Fraenkel
date from 1922. In the same year (1922) the Norwegian mathematician Skolem
(1887-1963) proposed a formal language for formulating the theory.
Zermelo noted that the sets involved in a derivation of the paradoxes are
very large3 (for Cantor’s paradox it is the set of all sets (see section 3.8.1),
for Russell’s paradox it is the set of all sets which are not members of
themselves (see section 5.1), and for the Burali-Forti paradox (see section
3.8.2) it is the set of all well-orderings). Therefore he wanted to restrict
the size of sets, and he changed the (naive) comprehension principle into his
separation axiom, such that the paradox could no longer be derived:
Separation Axiom: (∀z∃y∀x :: (x ∈ y ↔ x ∈ z ∧ ϕ(x)))

For every set z and definite4 property ϕ of sets there exists a set whose ele-
ments are exactly those of z having the property ϕ.
There are also certain limitations on the property ϕ (i.e. it should be de-
finite) that we will mention later in section 8.5. We show that the standard
derivation of Russell’s paradox cannot be applied when the naive compre-
hension axiom is replaced by the separation axiom.
Let R = {x | x ∈ Z ∧ x ∉ x}
R ∈ R ↔ R ∈ Z ∧ R ∉ R
      → R ∉ R, contradiction.
R ∉ R ↔ R ∉ Z ∨ R ∈ R
      ← R ∉ Z
In both equivalences above we can only conclude that R ∈ R ↔ R ∉ R if
we know that R ∈ Z. Since we cannot conclude R ∈ Z, Russell’s derivation
of his paradox does not apply; the first line merely shows that R ∈ Z is
impossible.
3
The term proper class is sometimes used to refer to these ‘excessively large’
sets; all other sets are then referred to as improper classes. This means all
sets are classes but not every class is a set. A class that is not a set is
called a proper class.
4
See section 8.5 for the definition of the concept of definiteness.
However, this fact alone does not guarantee that there does not exist a
paradox, as claimed in some articles, but merely that the separation axiom
does not permit the construction of paradoxical sets with elements defined
in terms of the sets themselves. But until consistency is proved, there might
be other less obvious ways to construct a paradox.
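The argument above can be tested exhaustively on very small ‘universes’.
The following is a sketch under stated assumptions: the objects are the
numbers 0 to n-1, the membership relation is an arbitrary table of ordered
pairs, and the function name separation_never_contradicts is hypothetical.
For every membership relation and every pair (Z, R) in which R behaves as
{x | x ∈ Z ∧ x ∉ x}, it checks that R is not a member of Z, which is exactly
the situation in which Russell’s contradiction cannot be derived. Such a
finite check is of course no consistency proof.

```python
from itertools import product

def separation_never_contradicts(n):
    """Check all membership relations on n objects.

    Returns True if, in every relation, a set r that satisfies
    x ∈ r  <->  (x ∈ z and x ∉ x)  for all x is never itself a member of z.
    """
    objs = range(n)
    pairs = [(a, b) for a in objs for b in objs]
    for bits in product([False, True], repeat=len(pairs)):
        member = dict(zip(pairs, bits))  # member[(a, b)] means a ∈ b
        for z in objs:
            for r in objs:
                # Does r behave as {x ∈ z | x ∉ x} in this model?
                is_separated = all(
                    member[(x, r)] == (member[(x, z)] and not member[(x, x)])
                    for x in objs
                )
                if is_separated and member[(r, z)]:
                    return False  # this would reproduce Russell's paradox
    return True

checked = [separation_never_contradicts(n) for n in (1, 2, 3)]  # all True
```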
We now give all of the ZF axioms that constitute set theory. The first
seven axioms are those that were originally formulated by Zermelo. Axioms
8 and 9 were later added by Fraenkel and von Neumann respectively. Axioms
1 through 8 form the original set of Zermelo-Fraenkel axioms.
In the definitions below we use several shorthand notations. If we wish,
however, we can express these definitions in full detail, such that the
notation of each expression does not depend on previous axioms. For example,
in axiom 8 we use ∃! to denote that there is exactly one y, in axiom 9 we
use the symbols ∩ and ∅, and in axiom 6 we use x ⊆ z as a shorthand for
(∀y :: y ∈ x → y ∈ z). The separation and substitution axioms are actually
axiom schemes.
The Zermelo-Fraenkel axioms:
1. Extensionality axiom (or axiom of determination):
(∀x, y :: (∀z :: z ∈ x ↔ z ∈ y) → x = y)
Sets are uniquely determined by their members, or to be exact: if every
element of a set x is at the same time an element of y, and conversely,
then x = y.
2. Axiom of the empty set:
(∃x∀y :: y ∉ x)
There is an (improper, see also footnote on page 93) set, the ‘null’ or
‘empty’ set, which contains no elements at all.
3. Separation axiom:
(∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)), ϕ is definite and does not contain y.
For every set z there exists a set y whose elements are exactly those of
z having the property ϕ.
4. Pairing axiom:
(∀a, b :: (∃y∀x :: x ∈ y ↔ x = a ∨ x = b))
Given two sets a and b there exists a set whose elements are exactly a
and b.
5. Sum-set axiom or Union axiom:
(∀z∃y∀x :: x ∈ y ↔ (∃w :: w ∈ z ∧ x ∈ w))
For every set z there exists a set y whose elements are exactly those
objects occurring in at least one element of z.
6. Power set axiom:
(∀z∃y∀x :: x ∈ y ↔ x ⊆ z)
For every set z there is a set y whose elements are exactly the subsets
of z.
7. Axiom of infinity:
(∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : {a} ∈ z))
There exists a successor set.
8. Axiom of replacement or axiom of substitution (by Fraenkel):
(∀x∃!y :: ϕ(x, y)) → (∀a :: (∃b∀y :: y ∈ b ↔ (∃x : x ∈ a : ϕ(x, y))))
The image of a set under an operation ϕ (functional property) is again
a set.
9. Axiom of foundation or axiom of regularity (by von Neumann):
(∀a :: a ≠ ∅ → (∃b :: b ∈ a ∧ b ∩ a = ∅))
Every non-empty set is disjoint from at least one of its elements.
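For finite sets, the constructions licensed by axioms 2 to 6 and the
successor step of axiom 7 can be tried out directly. The following is a
minimal sketch that models hereditarily finite sets as Python frozensets;
this modelling choice and the function names (pair, union, power_set,
separate, successor) are assumptions made for illustration. Extensionality
corresponds to the fact that two frozensets are equal exactly when they have
the same elements. Infinite sets, and hence the axiom of infinity itself, are
outside the reach of such a model.

```python
from itertools import combinations

EMPTY = frozenset()  # axiom 2: the empty set

def pair(a, b):
    # Pairing axiom: the set whose elements are exactly a and b.
    return frozenset({a, b})

def union(z):
    # Union axiom: the objects occurring in at least one element of z.
    return frozenset(x for w in z for x in w)

def power_set(z):
    # Power set axiom: the set of all subsets of z.
    elems = list(z)
    return frozenset(
        frozenset(c)
        for k in range(len(elems) + 1)
        for c in combinations(elems, k)
    )

def separate(z, phi):
    # Separation axiom: the elements of z that have the property phi.
    return frozenset(x for x in z if phi(x))

def successor(a):
    # Successor step as written in axiom 7 above: a -> {a}.
    return frozenset({a})

one = successor(EMPTY)                    # {∅}
p = pair(EMPTY, one)                      # {∅, {∅}}
subsets_of_p = power_set(p)               # four subsets
singletons = separate(subsets_of_p, lambda s: len(s) == 1)
```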
Theorem: (from [49, chapter 11]) The domain B itself (see page 92) is not
a set.
Proof: Suppose V is any given set. Then5, V has a subset W that consists of
those elements of V that are not members of themselves. But then W is not
an element of itself (because in that case we would have W ∈ W, while W
consists of elements that are not members of themselves). But if W were
an element of V − W, we would also have W ∈ W. This means that W
is not a member of V. But W is certainly in B, and therefore B is not the
same as V. Thus B cannot coincide with any set at all.
5
Since the property x ∉ x is definite. See section 8.5 for the definition of
the concept of definiteness.
The theory is not complete, since many statements are independent of
ZF. Independent of the previous axioms, the following two statements have
a more dubious status (and are not part of standard ZF):
10. Axiom of choice (AC):
(∀x :: (∃f : f is a function : Dom(f) = x − {∅} ∧ Ran(f) ⊆ ⋃x ∧ (∀a :
a ∈ Dom(f) : f(a) ∈ a)))
Every set x has a choice function.
Definition of choice function: A function f is called a choice function for
the set V := Dom(f) = V − {∅} ∧ (∀v : v ∈ Dom(f) : f(v) ∈ v)
11. Generalized Continuum Hypothesis (GCH):
For any cardinal ℵ_r, {0, 1}^(ℵ_r) = ℵ_(r+1)
In 1908 Felix Hausdorff proposed this generalization of CH. Another
formulation of this axiom and more information are given in section 3.7. In
the remainder of this section, we will give a short explanation of the nature
of the other axioms. For more detailed information, we refer to section 8.5
and to the rich literature on set theory that is available (for example [17],
[24], [49, chapter 11], [28]).
The axioms are not minimal. For example, as we have already seen in
section 2.2 (see footnote 6), the axiom of the empty set can be deduced from
the separation axiom. We also have empty set axiom + substitution axiom ⇒
separation axiom. We have also seen in section 2.2 how we can define basic
operations with the extensionality and separation axioms. The pairing, sum
and powerset axioms, together with the extensionality axiom, ensure
uniqueness of the pairs, sums and powersets of sets. With these axioms alone
we can already create an infinite number of sets. However, each set
constructed with axioms 1 to 6 only has a finite number of elements. It is
the infinity axiom that we need to create infinite sets. These sets are not
unique, but the smallest successor set, denoted ω, is unique. We call its
elements the natural numbers. With this axiom we can now also prove the
principle of induction for ω (see section 3.4.3). The substitution axiom says
that whenever ϕ is a property of sets, such that to every x there is exactly
one y for which ϕ(x, y), and a is a set, then there exists a set, the elements
of which are exactly those y for which an x ∈ a exists such that ϕ(x, y). The
foundation axiom says that each non-empty set has epsilon-minimal elements
(see below). An implication of this axiom is that there is no function f
defined on ω such that (∀i : i ∈ ω : f(i + 1) ∈ f(i)). For a motivation and
analysis of the role of the foundation axiom we refer to [17, section 2.1].
6
The existence of the empty set in section 2.2 was actually derived from the
comprehension principle but the result can similarly be obtained from the
separation axiom.
Definition of epsilon-minimal:
An element b ∈ a is epsilon-minimal in a := b ∩ a = ∅
Another corollary of the foundation axiom is that there is no set which
has itself as its only element. Note that to prevent the paradoxes we need
the separation axiom, not the foundation axiom.
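The definition of epsilon-minimal can be made concrete for finite sets. The
sketch below again assumes that sets are modelled as Python frozensets; the
function name epsilon_minimal is chosen for illustration. Because a frozenset
is built from elements that already exist, it cannot contain itself, so this
model only contains well-founded sets and every non-empty set in it has at
least one epsilon-minimal element.

```python
def epsilon_minimal(a):
    """Return the elements b of a with b ∩ a = ∅ (a is a frozenset of frozensets)."""
    return frozenset(b for b in a if not (b & a))

EMPTY = frozenset()
ONE = frozenset({EMPTY})        # {∅}
TWO = frozenset({EMPTY, ONE})   # {∅, {∅}}

# In {ONE, TWO} only ONE is epsilon-minimal, since TWO ∩ {ONE, TWO} = {ONE}.
minimal = epsilon_minimal(frozenset({ONE, TWO}))
```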
The origin of the axiom of choice was Cantor’s recognition of the importance
of being able to well-order arbitrary sets; i.e., to define an ordering
relation for a given set such that each nonempty subset has a least element.
The virtue of a well-ordering for a set is that it offers a means of proving
that a property holds for each of its elements by a process (transfinite
induction) similar to mathematical induction. Zermelo (1904) gave the first
proof that any set can be well-ordered. His proof employed a set-theoretic
principle that he called the axiom of choice, which, shortly thereafter, was
shown to be equivalent to the so-called well-ordering theorem. One form of
this principle can be expressed in terms of choice functions. A choice
function for a set A ‘chooses’ an element from each non-empty member of A.
The axiom then states: if x is a nonempty set the elements of which are
nonempty sets, then there exists a function f with domain x such that for
each member a of x, f(a) ∈ a. For a more detailed discussion of the axiom of
choice we refer to [17, section 2.9].
Intuitively, the axiom asserts the possibility of making a simultaneous choice
of an element in every nonempty member of any set; this guarantee accounts
for its name. The assumption is significant only when the set has infinitely
many members. Zermelo was the first to state the axiom explicitly, although
it had been used earlier, essentially unnoticed. It soon became the subject
of vigorous controversy because of its nonconstructive nature. There are a few
mathematicians who feel that the use of the axiom of choice is improper, but
to the vast majority it, or an equivalent assertion, has become an
indispensable and commonplace tool. For this discussion of the axiom of
choice we have used [63], [77] and [11].
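The remark that the assumption only matters for infinitely many members can
be illustrated as follows: for a finite family a choice function can simply
be written down. The sketch below assumes a finite family of finite sets of
integers, so that ‘take the minimum’ is an explicit rule; the name
choice_function and the representation of f as a dictionary are illustrative
choices.

```python
def choice_function(family):
    """Return a dict f with f[a] in a for every non-empty member a of family.

    Assumes the members are finite sets of mutually comparable values
    (for example integers), so that min(a) is a definite rule.
    """
    return {a: min(a) for a in family if a}

family = {frozenset({3, 1}), frozenset({2}), frozenset()}
f = choice_function(family)
# f maps {1, 3} to 1 and {2} to 2; the empty set is not in the domain.
```

No appeal to the axiom of choice is needed here, because the rule min singles
out one element of each member. The axiom is needed precisely when no such
uniform rule is available for an infinite family.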
A discussion of the Generalized Continuum Hypothesis can be found in
section 3.7.
Chapter 6
Hilbert
The further a mathematical theory is developed, the more harmoniously
and uniformly does its construction proceed, and unsuspected
relations are disclosed between hitherto separated branches
of science.
- Hilbert, quoted in [76]
David Hilbert (1862-1943) was a German mathematician who reduced
geometry to a series of axioms and contributed substantially to the
establishment of the formalistic foundations of mathematics. His first work
was on invariant theory and in 1888 he proved his famous Basis theorem (see
[5]). After that he did significant work in the area of algebraic number
theory, and published his ‘Zahlbericht’, or ‘Report on the theory of numbers’,
in 1897. In 1899 he published the ‘Grundlagen der Geometrie’ (which appeared
in English as ‘The foundations of Geometry’ in 1902), which contained (see
[31, section 4.7.2]) what would become a widely accepted set of 21 axioms
for Euclidean geometry and an analysis of their significance. This axiomatic
method that Hilbert used (for geometry, but its concept is more general and
it can be applied far beyond the domain of geometry, see also [57, section
14.7]) will be treated in section 6.1. A substantial part of Hilbert’s fame
rests on a list of 23 mathematical problems he outlined in 1900, and posed as
a challenge for the next century. Some of these problems were related to the
foundations of mathematics (see section 6.2). In 1905 Hilbert attempted to
lay a firm foundation of mathematics by proving its consistency, resulting in
two volumes of ‘Grundlagen der Mathematik’ that
were intended to lead to a proof theory. Although in 1931 Kurt Gödel
showed this goal to be unattainable (see chapter 8), the work Hilbert had
done on the foundations of mathematics nevertheless remained influential in
the development of logic. Hilbert’s work on integral equations in about 1909
(see [45]) led to research in functional analysis and established the basis for
his work on infinite-dimensional space, later called Hilbert space (see [22,
page 232]). When Hilbert was made an honorary citizen of Königsberg he
gave an address which ended with six famous words, showing his enthusiasm
for mathematics and optimism for solving mathematical problems: “There
are absolutely no unsolvable problems. Instead of the foolish ignorabimus
[Latin for ‘we shall not know’], our answer is on the contrary: Wir müssen
wissen, Wir werden wissen” [We must know, We shall know].
6.1 Hilbert’s proof theory

Hilbert formalized mathematical theories in order to turn them into
well-defined objects of discussion, thus making possible the new kind of
investigation to which he gave the new name metamathematics. Hilbert was the
first to emphasize that strict formalization of a theory involves the total
abstraction from the meaning, the result being called a formal system or
formalism. In its structure, a formalized theory is no longer a system of
meaningful propositions but one of sentences as sequences of words, which
in turn are sequences of letters (a symbolic language). Hilbert’s method of
making the formal system as a whole the object of mathematical study is
called metamathematics or proof theory.
What is metamathematics? The study of mathematics itself (with
respect to formalized mathematical systems, metamathematics thus consists
of statements about the signs and formulas occurring within axiomatic
systems). One of the primary goals of metamathematics is to determine the
nature of mathematical reasoning.
After Hilbert presented an axiomatic development of geometry in ‘Grundlagen
der Geometrie’ (1899), he devoted himself to the much greater task of
applying his new metamathematical method to pure mathematics as a whole.
Or, as Hilbert wrote in 1917: “Since the examination of the consistency is a
task that cannot be avoided, it appears necessary to axiomatize logic itself
and to prove that number theory and set theory are only parts of logic”.
Hilbert took a formal(istic) approach to achieve this logicist goal (logicism
uses logic as the basis of mathematics, whereas the formalists attempted to
axiomatize mathematics successfully; see also the philosophies in
section 5.2). To this end Hilbert identified three properties that an
axiomatic system should have: it should be decidable, complete and
consistent. In order to define these notions, we first have to make some
other concepts precise.
Definition of an axiom:
A proposition that is regarded as true without proof.
Definition of free variable:
A variable that is not bound within the scope of a quantifier.
An axiom that does not contain any variables is also called an axiom
statement; an axiom with free variables is called an axiom scheme, and each
free variable is to be quantified over all well-formed formulas.
Definition of statement (or sentence): A well-formed formula with no
free variables.
Of the systems that Hilbert’s proof theory applies to, we here consider
those susceptible to Gödel’s incompleteness theorem (that will be presented
in chapter 8).
Definition of an STGA language: A language¹ L is Susceptible to
Gödel’s argument (STGA) if it consists of:
1 E, a denumerable set of (well-formed) expressions (also called formulas)
of L
2 S ⊆ E, sentences of L (i.e. with no free variables)
3 P ⊆ S, provable sentences of L
4 R ⊆ S, refutable sentences of L
5 H ⊆ E, predicates of L (i.e. with free variables, H ∩ S = ∅). For
convenience, we here assume predicates to have exactly one variable.
6 A function ϕ : E × N → E that assigns to every expression E ∈ E and
every n ∈ N an expression E(n), such that for every predicate H ∈ H and
every n ∈ N, H(n) is a sentence (H(n) ∈ E and, in particular, H(n) ∈ S).
We can think of such a function ϕ as a substitution function. Informally,
the sentence H(n) expresses the proposition that the number n
belongs to the set named by H.
The following set is the only one that depends on a semantic
interpretation of the expressions, and is normally determined by a
model that we accept as representing the truth. The model should
be distinguished from the set of derivation rules that (syntactically
or mechanically) determines whether sentences are provable or
refutable. It is important to realize that the truth of a sentence
is not the same as the provability of that sentence.
7 T ⊆ S, true sentences of L. This set can be determined by a model
(see page 107)

¹ Sometimes also called system, since it not only defines a language but
also includes the (dis)provability and truth of expressions.
First, we give an intuitive explanation of this definition. In most parts of
mathematics, not every sequence of symbols is meaningful or useful.
Therefore we only consider the so-called well-formed formulas E. Some of these
formulas (also called propositions) do not contain free variables; we name
them sentences (S). Some of them are provable from the axiomatic system
(i.e. they can be derived from the axioms and derivation rules of the axiomatic
system), and are elements of P. Others are refutable, also called disprovable
(i.e. their negation can be derived from the axioms and derivation rules of
the axiomatic system) and are elements of R. These notions only depend on
whether the sentence is derivable from the axiomatic system and are
independent of the truth of the sentence. We call the set of true sentences T
(the other sentences are false). Other formulas have free variables, i.e. they
are functions. We call them predicates (H). We also assume there exists a
function ϕ that assigns to every expression H ∈ H and natural number n a
sentence H(n).
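The components of this definition can be made concrete on a deliberately
tiny example. The sketch below is only an illustration under strong
assumptions: a real E is denumerably infinite, whereas here a toy language
of three invented expressions is used, and substitution is restricted to
the numbers 1 and 2. None of the names come from the definition itself.

```python
# Toy, finite stand-in for an STGA language. All expressions are invented
# for illustration; a real E is denumerably infinite.
NUMBERS = (1, 2)

H = {"even(x)"}                 # predicates: exactly one free variable x
S = {"even(1)", "even(2)"}      # sentences: no free variables
E = H | S                       # all well-formed expressions
P = {"even(2)"}                 # provable sentences
R = {"even(1)"}                 # refutable sentences
T = {"even(2)"}                 # true sentences (given by the intended model)


def substitute(expression: str, n: int) -> str:
    """Substitution function: replace the free variable x by the numeral n."""
    return expression.replace("(x)", f"({n})")


def is_stga_shaped() -> bool:
    """Check the inclusions required by items 2 to 7 of the definition."""
    return (
        S <= E and P <= S and R <= S and T <= S
        and H <= E and not (H & S)
        and all(substitute(h, n) in S for h in H for n in NUMBERS)
    )
```

Note that P, R and T are given here as independent sets: nothing in the
definition forces P to equal T, which is exactly the gap the later notions
of soundness and completeness describe.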
What is an Axiomatic System? An axiomatic system (sometimes also
called formal axiomatic system) is a logical system that gives rise to an STGA
language and has an explicitly stated finite set of axioms from which provable
sentences can be derived (using a finite set of derivation rules).
The set of axioms and derivation rules determines which sentences of L
are provable. The axiomatic system also contains a syntax definition
that determines the well-formedness of expressions of L. Normally, the
syntax definition of an axiomatic system consists of an alphabet of symbols and
a set of rules. We show that this notion of an axiomatic system gives rise
to a language that falls under the category of STGA languages. Such an
axiomatic system A is often defined as follows:
Definition of axiomatic system: An axiomatic system A consists of:
• An alphabet Σ, consisting of a finite number of constants (with their
arities) and variables.
• A recursive definition of a syntax, determining which formulas are
well-formed formulas.
• An initially determined and fixed set of axioms and derivation rules
(also called transformation rules or rules of inference).
The recursive definition over the given alphabet gives us the set of
expressions. The variables enable us to form predicates. The set of axioms and
derivation rules lets us prove or refute sentences. Ideally, we want the
sentences that are provable to coincide with the sentences we intuitively
consider true (P = T) and the refutable sentences to coincide with those we
consider false. We call a system with this property correct. We now give an
example of a definition of a simple axiomatic system.
Example: axiomatic system A1
• Σ = {∨², ¬¹, (⁰, )⁰, ∀², x⁰, y⁰, R0², true⁰, false⁰}
The numbers that are written in superscript denote the arity of the
relations; a constant or variable is a 0-ary relation.
• ϕ is a well-formed formula if it
0. is one of the constants true and false.
1. is an atomic formula Ri(x1, . . . , xj), with Ri a relation with arity
j, and x1, . . . , xj variables or constants.
2. has the form of ϕ1 ∨ ϕ2, ϕ1 ∧ ϕ2, (ϕ1), ¬ϕ1, ∀xi(ϕ1), where ϕ1 and
ϕ2 are smaller formulas and xi is some variable from Σ.
• For all variables x, variables or constants c and d and well-formed
formula ϕ (premise above the line, conclusion below it):

    R0(c, d)    ∀x(ϕ)      ¬false     ¬true
    --------    -----      ------     -----
      true      false       true      false

    true ∧ ϕ    false ∧ ϕ   true ∨ ϕ   ϕ ∨ true
    --------    ---------   --------   --------
       ϕ          false       true        ϕ
The STGA language L that can be constructed² on the basis of A1,
denoted by L_A1, consists of the following parts:
1. E is the set of usual mathematical predicates formed by the symbols of
the given alphabet (so E includes the binary relation R0).
2. S is the set of those expressions without free variables (i.e.
propositions).
3. The provable sentences P are those that reduce to true under the
derivation rules. For example, ¬ false ∧ R0 (false, true) → true ∧ R0 (false, true)
→ true ∧ true → true.
4. The refutable sentences R are those that reduce to false under the
derivation rules. For example, ∀y (false ∨ y) ∧ true → false ∧ true → false.
5. The predicates are those expressions with one free variable.
6. For each such predicate we can replace the free variable by a formula
that is represented³ by a natural number, and obtain a proposition.
7. The definition of an axiomatic system does not include a model. If we
think of the standard logic that is used in practice, we can see that for
all formulas except those with an ∀-symbol, the formulas are derivable
if and only if they are true.
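The two reduction chains in items 3 and 4 can be replayed mechanically. The
following is a minimal sketch, assuming formulas are encoded as nested
Python tuples (an encoding chosen here, not part of A1) and covering only
the rules for R0, ∀, ¬ and ∧ that those two examples use.

```python
TRUE = ("true",)
FALSE = ("false",)


def reduce_formula(formula):
    """Rewrite a formula with the A1 rules for R0, forall, not and and."""
    tag = formula[0]
    if tag in ("true", "false"):
        return formula
    if tag == "R0":                      # R0(c, d) reduces to true
        return TRUE
    if tag == "forall":                  # forall x (phi) reduces to false
        return FALSE
    if tag == "not":
        inner = reduce_formula(formula[1])
        if inner == FALSE:               # not false reduces to true
            return TRUE
        if inner == TRUE:                # not true reduces to false
            return FALSE
        return ("not", inner)
    if tag == "and":
        left = reduce_formula(formula[1])
        if left == TRUE:                 # true and phi reduces to phi
            return reduce_formula(formula[2])
        if left == FALSE:                # false and phi reduces to false
            return FALSE
        return ("and", left, formula[2])
    raise ValueError(f"unsupported connective: {tag}")


# Item 3: not false and R0(false, true)
provable_example = ("and", ("not", FALSE), ("R0", FALSE, TRUE))
# Item 4: forall y (false or y) and true
refutable_example = ("and", ("forall", "y", ("or", FALSE, ("var", "y"))), TRUE)
```

Under these assumptions reduce_formula(provable_example) yields ("true",),
so the sentence belongs to P, and reduce_formula(refutable_example) yields
("false",), so the sentence belongs to R.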
We now introduce some concepts related to STGA languages and axiomatic
systems. We assume that A is an axiomatic system that gives rise to an
STGA language L.
Definition of derivable: A formula ϕ is derivable in L := ϕ ∈ P.
A formula ϕ is derivable from an axiomatic system A, notation A ⊢ ϕ :=
there is an axiom ai of A and a sequence of formulas ϕ1, . . . , ϕl such that
ϕ1 = ai and ϕl = ϕ and each ϕi follows from the preceding formulas and the
axioms of A by the derivation rules of A.
² Sometimes it is also said that an axiomatic system A gives rise to a
language L_A.
³ An example of such a bijective function between a predicate and a set of
natural numbers will be given in section 8.2.
We call the sequence of formulas ϕ1, . . . , ϕl in a derivation of the
statement ϕ a formal proof π of the statement ϕ. When A ⊢ ϕ, we also write
ϕ ∈ A.
Example:
A1 ⊢ ¬ false ∧ R0 (false, true)
A1 ⊬ ∀x)x¬ (since the formula is not well-formed, i.e. it cannot be formed
according to the syntax definition)
A1 ⊬ ∀y (false ∨ y) ∧ true (since it does not follow from the derivation
rules, i.e. it is a refutable sentence)
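The definition of a formal proof is, in effect, a checking procedure. The
sketch below is one way to phrase it in Python. The representation of
derivation rules as functions, and the toy system with a single modus ponens
rule, are assumptions made for illustration; they are not the system A1.

```python
def is_formal_proof(sequence, axioms, rules):
    """Return True if `sequence` is a derivation in the sense defined above.

    The first formula must be an axiom; every later formula must be an axiom
    or follow from the preceding formulas by at least one derivation rule.
    """
    if not sequence or sequence[0] not in axioms:
        return False
    for index in range(1, len(sequence)):
        formula = sequence[index]
        earlier = sequence[:index]
        if formula in axioms:
            continue
        if not any(rule(earlier, formula) for rule in rules):
            return False
    return True


def modus_ponens(earlier, formula):
    """Hypothetical rule: from a and ("->", a, formula) infer formula."""
    return any(("->", a, formula) in earlier for a in earlier)


# Hypothetical toy system: axioms p and p -> q, one derivation rule.
toy_axioms = {"p", ("->", "p", "q")}
good_proof = ["p", ("->", "p", "q"), "q"]
bad_proof = ["p", "q"]
```

Checking a given sequence in this way always terminates; finding a proof
for a given formula is a different and generally much harder task.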
Hilbert proposed a program to reformulate all mathematics as a formal
axiomatic theory, and this theory had to be proved consistent, i.e. free
from contradiction. The standard method that was used to prove the
consistency of axiomatic systems was to give a ‘model’. A model for an
axiomatic theory is simply a system of objects, chosen from some other
theory and satisfying the axioms.
This means we can relate axiomatic systems to existing systems by means
of a model, also called interpretation or structure. A model of a formal
axiomatic theory is a well-defined mathematical system with the particular
structure that is characterized by the theory.
Definition of universe: Set of values that variables of an axiomatic system
may take.
Definition of a model: A universe together with an assignment of n-ary
relations to n-ary constants, and a corresponding assignment of the variables.
We define a model M for an axiomatic system A by: M = (U, P1, . . . , Pk)
with U a universe for A and P1, . . . , Pk the relations corresponding to symbols
R1, . . . , Rk of A. If a formula ϕ is true in the model M (i.e. by interpretation
of the relation symbols by the corresponding relations), notation M |= ϕ, we
say that M is a model of ϕ.
Example: Let M1 = (N, ≤) be a model for axiomatic system A1
M1 |= ∀x∀y(x ≤ y ∨ y ≤ x)
M1 ⊭ ∀x∀y(x ≤ y ∧ y ≤ x)
Note that instead of using R1 for the relation symbol, we immediately took
the interpretation ≤.
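The two claims about M1 can be explored numerically. The sketch below
checks both formulas on a finite initial segment of N. Restricting to a
finite segment is an assumption made so that the check terminates: a
positive answer is therefore only evidence for the universally quantified
claim, whereas a single counterexample does settle the negative one.

```python
from itertools import product


def holds_for_all_pairs(formula, bound):
    """Check formula(x, y) for all x, y in {0, ..., bound - 1}."""
    return all(formula(x, y) for x, y in product(range(bound), repeat=2))


def first_counterexample(formula, bound):
    """Return the first pair (x, y) that falsifies the formula, or None."""
    for x, y in product(range(bound), repeat=2):
        if not formula(x, y):
            return (x, y)
    return None


def total(x, y):
    """x <= y or y <= x, the body of the first formula."""
    return x <= y or y <= x


def both(x, y):
    """x <= y and y <= x, the body of the second formula."""
    return x <= y and y <= x
```

For example, first_counterexample(both, 20) returns (0, 1), since 0 ≤ 1
holds but 1 ≤ 0 does not; this one pair is enough to show that M1 does not
satisfy the second formula.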
A theory Th of a model M, notation Th(M), is the set of true statements
in the language of that model.
Definition of a theory: Th(M) := {ϕ | ϕ is a statement and M |= ϕ}
So now we can say that Hilbert was looking for an axiomatic system for
which logic can be a model. Hilbert proposed such an axiomatic system to
have the properties of consistency, completeness and decidability. We will
now introduce these concepts, along with some other properties of axiomatic
systems. Since the properties of an axiomatic system A give rise to
corresponding properties in the language L_A, we here distinguish in each
definition between the property of a language and of an axiomatic system.
Definition of decidability:
A language L is decidable := (∀ϕ :: (ϕ ∈ P ∨ ϕ ∈ R)).
An axiomatic system A is decidable := (∀ϕ :: there is an algorithm that
decides in a finite number of steps whether (or not) A ⊢ ϕ) (see also [49, page
270]).
Definition of consistency:
A language L is consistent := ¬(∃s : s ∈ S : s ∈ P ∧ s ∈ R), i.e. P ∩ R = ∅
or no sentence is both provable and refutable in L.
An axiomatic system A is consistent := ¬(∃ϕ :: A ⊢ ϕ ∧ A ⊢ ¬ϕ) (i.e. it is
not possible, for any formula ϕ, to derive both ϕ and ¬ϕ) (see also [49, page
240]).
A language L is inconsistent if it is not consistent. Clearly, L is inconsistent
if P and R are not disjoint. Note that consistency and decidability do not
refer to T, but only concern P and R. The following definitions of
completeness, soundness and correctness also depend on the truth set T (and
therefore on the model that determines that truth set).
Definition of completeness:
A language L is complete for a model M := (∀ϕ :: M |= ϕ → ϕ ∈ P).
An axiomatic system A is complete for model M :=
(∀ϕ :: M |= ϕ → A ⊢ ϕ) (i.e. all true statements in the model are
derivable/provable).
A language L is incomplete if it is not complete. Note that the statement
(∀ϕ :: M |= ϕ → A ⊢ ϕ) is equivalent to (∀ϕ :: A ⊬ ϕ → M ⊭ ϕ), i.e. all
statements ϕ that are not derivable/provable are also not true in the model.
Definition of soundness:
A language L is sound for a model M := (∀ϕ :: ϕ ∈ P → M |= ϕ).
An axiomatic system A is a sound axiomatization for a model M :=
(∀ϕ :: A ⊢ ϕ → M |= ϕ) (i.e. if a statement ϕ is derivable/provable, it is
true in the model).
Definition of correctness:
A language L is correct for a model M := P ⊆ T ∧ R ∩ T = ∅ (i.e. every
provable sentence is true and every refutable sentence is false (not true)).
An axiomatic system A is correct for a model M := A is sound for M and
A is complete for M
Theorem: If L is correct, it is consistent.

Proof: This follows directly from the definitions of correctness and consis-
tency because if P is a subset of T and T is disjoint from R, then P must
be disjoint from R.
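For finite sets P, R and T the language-level definitions above reduce to
simple set operations. The sketch below states them in Python and then
checks the theorem exhaustively on every choice of P, R and T over a small
set of invented sentence names. This is an illustration under the
assumption of a finite toy language, and such a check is of course not a
substitute for the proof just given.

```python
from itertools import chain, combinations


def is_consistent(P, R):
    """No sentence is both provable and refutable."""
    return not (P & R)


def is_sound(P, T):
    """Every provable sentence is true."""
    return P <= T


def is_complete(P, T):
    """Every true sentence is provable."""
    return T <= P


def is_correct(P, R, T):
    """Every provable sentence is true and no refutable sentence is true."""
    return P <= T and not (R & T)


def subsets(items):
    """All subsets of a finite collection, as a list of sets."""
    items = list(items)
    picks = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [set(pick) for pick in picks]


def theorem_holds_on(sentences):
    """Check 'correct implies consistent' for all P, R, T over `sentences`."""
    candidates = subsets(sentences)
    return all(
        is_consistent(P, R)
        for P in candidates
        for R in candidates
        for T in candidates
        if is_correct(P, R, T)
    )
```

The converse fails even in this toy setting: with P = {"a"}, R = {"b"} and
T = {"b"} the language is consistent but not correct.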
6.2 Hilbert’s 23 problems

Who of us would not be glad to lift the veil behind which the future
lies hidden: to cast a glance at the next level of our science and
at the secrets of its development during future centuries? What
particular goals will there be toward which the leading mathema-
tical spirits of coming generations will strive? What new methods
and new facts in the wide and rich field of mathematical thought
will the next centuries disclose?
- D. Hilbert, in the opening of his speech to the 1900 Congress
in Paris
In 1900 Hilbert presented his list of 23 mathematical problems, which he
urged upon the attention of his contemporaries, to the International
Congress of Mathematicians in Paris. His famous address was important and
still today influences and stimulates mathematical research all over the
world. It was not only a collection of problems, but it was also his
philosophy of mathematics (see also the formalist viewpoint in section 5.2)
and a collection of problems important to that philosophy. Many of the
problems have since been solved, and each solution was a noted event (or
even a mathematical breakthrough). Some of these problems however remain
unsolved to this day. In 2000, in the footsteps of Hilbert, the Clay
Mathematics Institute (see http://zax.mine.nu/interests/questions/clay.htm)
has made a new list of 7 (for a large part mathematical) problems to be
solved in this century.
Among those problems is one of the original problems (number 8) of
Hilbert. It requires a solution to the Riemann hypothesis, which is usually
considered to be the most important unsolved problem in mathematics. We
mention some of the original problems that are related to the foundations
of mathematics. For a complete source of information on the 23 (or 25?,
see [32]) original publications of Hilbert, see the articles [41] and [40], also
available online [42].
• Problem 1: Cantor’s problem of the cardinal number of the continuum.
This problem is also known as the Continuum Hypothesis and is
extensively covered in section 3.7.
• Problem 2: The consistency of the axioms of arithmetic. The question
is whether it can be shown that the axioms on which arithmetic is based
are consistent. Gödel later showed that no consistent formal system that
contains arithmetic (see chapter 8) can prove its own consistency. Another
metamathematical argument might exist that cannot be expressed in
the system but can prove its consistency.
• Problem 6: Mathematical treatment of the axioms of physics. It asks to
treat in the same manner, by means of axioms, those physical sciences
in which mathematics plays an important part; in the first rank are the
theory of probabilities and mechanics. So far no complete
axiomatization of physics has been found.
• Problem 9: Proof of the most general law of reciprocity in algebraic
number theory. For any field of numbers, the law of reciprocity (for
more references see
http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob9.html)
is to be proved for the residues of the lth power,
when l denotes a prime, and further when l is a power of 2 or a power
of an odd prime. This problem is still unsolved.
• Problem 10: Decidability of solvability of diophantine equations. This
question asks, ‘given a diophantine equation with any number of
unknown quantities and with rational integral numerical coefficients, to
devise a process according to which it can be determined by a finite
number of operations whether the equation is solvable in rational
integers’. In modern terminology the problem asks to devise an algorithm
that tests whether a polynomial has an integral root. A root of a
polynomial is an assignment of values to its variables so that the value of the
polynomial is 0. A root is an integral root if all variables are assigned
integer values. Some polynomials have an integral root (for example
6x³yz² + 3xy² − x³ − 10 has an integral root at x = 5, y = 3 and z = 0)
and some do not.
Hilbert did not use the term algorithm but rather ‘a process according
to which it can be determined by a finite number of operations’. In
order to solve this problem this notion had to be made more precise
(this was done by Turing, see section 9.1). Also, Hilbert asked that an
algorithm be devised. Thus he apparently assumed such an algorithm
exists, but now we know that this problem is algorithmically
unsolvable. In 1970, the young Russian Yuri Matijasevic, building on the
work of Martin Davis, Hilary Putnam and Julia Robinson, showed that
no algorithm exists for testing whether a polynomial has integral roots.
• Problem 23: Further development of the methods of the calculus of
variations. Of the 23 problems Hilbert posed, this one is the least
definite, since it involves the general question of extending the calculus of
variations, which basically is the theory of the variation of functions.
With some examples that we will not treat here, Hilbert gave a
justification of the necessity for an extension of the differential and
integral calculus (for more references see
http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob23.html).
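The example polynomial of Problem 10 can be checked directly, and it also
shows why the problem is hard. The sketch below, in which the function
names and the search bound are illustrative assumptions, evaluates the
polynomial and performs a bounded brute-force search. Such a search can
confirm that an integral root exists, but a negative result only says that
no root lies within the bound, which is consistent with the fact that no
general algorithm exists.

```python
from itertools import product


def polynomial(x, y, z):
    """The example 6x^3*y*z^2 + 3x*y^2 - x^3 - 10 from Problem 10."""
    return 6 * x**3 * y * z**2 + 3 * x * y**2 - x**3 - 10


def search_integral_root(poly, arity, bound):
    """Try every integer assignment with |value| <= bound.

    Returns the first root found as a tuple, or None if there is no root
    within the bound (which says nothing about larger values).
    """
    values = range(-bound, bound + 1)
    for candidate in product(values, repeat=arity):
        if poly(*candidate) == 0:
            return candidate
    return None
```

Evaluating polynomial(5, 3, 0) gives 0 + 135 − 125 − 10 = 0, confirming the
root stated in the text. For a polynomial without integral roots, such as
x² − 2, the search returns None for every bound, but no finite number of
such runs proves that a root can never be found.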
At the end of his article, Hilbert says that he does not believe mathema-
tics will, like other sciences, split into separate branches whose connection
becomes ever more loose, but that the organic unity of mathematics is in-
herent in the nature of this science, for mathematics is the foundation of all
exact knowledge of natural phenomena. For a more detailed assessment of
Hilbert’s view, see [49, section 12.4] and [31, section 4.7].
Chapter 7
Types
7.1 Russell and Whitehead’s Principia Mathematica
Logic has become more mathematical and mathematics has be-
come more logical. The consequence is that it has now become
wholly impossible to draw a line between the two; in fact, the two
are one. They differ as boy and man; logic is the youth of ma-
thematics and mathematics is the manhood of logic.
- B. Russell in [79, page 194]
In section 4.1 we saw that with the postulates he presented, Peano stated
and organized the fundamental laws of number theory, the core of
mathematics. If statements satisfying these conditions could be derived in
logic, it would show that (at least part of) mathematics was founded in
pure logic. As we have seen in section 4.2, Frege adhered to the goal of
logicism that all of mathematics could be derived from logic alone. But
unfortunately the language that he created was inconsistent, as we have
learned from Russell’s paradox in section 5.1. In his 1908 paper,
‘Mathematical Logic as Based on the Theory of Types’, Russell laid out a
theory to eliminate the paradoxes. With Principia Mathematica, Bertrand
Russell and his teacher, the mathematician Alfred Whitehead, presented this
theory to prevent the paradoxes while at the same time allowing many of the
operations Frege considered desirable. The theory of types basically says
that all sets and other entities have a logical ‘type’, these types can be
ordered and sets are always constructed from specified members with lower
types. We will look at the theory of types in more detail in section 7.2.
Principia Mathematica (sometimes also called ‘the Principia’) consisted of
three volumes and was named after the ‘Philosophiae naturalis principia
mathematica’ of the English physicist Isaac Newton. But unlike Newton’s
book it dealt not with the application of mathematical techniques to
physics, but to logic and mathematics itself. With their mathematical
treatment of the principles of the mathematicians, Russell and Whitehead
intended to summarize the recent work in logic as well as to give a
revolutionary and systematic development of mathematical logic and derive
basic mathematical principles from the principles of logic alone.
Their collaboration began in 1903 when Whitehead and Russell were both
in the initial stages of preparing second volumes to earlier books on related
topics: Whitehead’s 1898 ‘A Treatise on Universal Algebra’ and Russell’s
1903 ‘The Principles of Mathematics’. Their work overlapped considerably
and they began collaborating on what would become ‘Principia
Mathematica’. The approach of Russell and Whitehead was essentially that of
Frege, to define mathematical entities (like numbers) in pure logic and then
derive their fundamental properties. Indeed, their definition of natural
numbers was basically the same as the one of Frege, but unlike him, they
opted to avoid the philosophical aspects and justifications. Although
‘Principia’ was largely successful, there still was criticism of the axiom
of infinity and the axiom of reducibility; they were considered too ad hoc
to be justified philosophically. In 1919 Russell published the philosophy
behind his work in ‘Introduction to Mathematical Philosophy’, which was
accessible to a broad audience and therefore has been the main source
through which Russell’s logicist view of mathematics has become known.
I quote the following assessment about Principia Mathematica from [91]:

“In addition to its notation (much of it borrowed from Peano), its mas-
terful development of logical systems for propositional and predicate logic,
and its overcoming of difficulties that had beset earlier logical theories and
logistic conceptions, the Principia offered discussions of functions, definite
descriptions, truth, and logical laws that had a deep influence on discus-
sions in analytical philosophy and logic throughout the 20th century. What
is perhaps missing is any hesitation or perplexity about the limits of logic:
whether this logic is, for example, provably consistent, complete, or
decidable, or whether there are concepts expressible in natural languages but not
in this logical notation. This is somewhat odd, given the well-known list of
problems posed by Hilbert in 1900 that came to animate 20th-century logic,
especially German logic. The Principia is a work of confidence and mastery
and not of open problems and possible difficulties and shortcomings; it is a
work closer to the naive progressive elements of the Jahrhundertwende than
to the agonizing fin de siecle.”. We would like to add that with the very for-
mal and accurate build-up of mathematics, Russell and Whitehead not only
managed to avoid the paradoxes but also created one of the most impressive
and complicated works of all times and that is, next to Aristotle’s Organon,
considered to be the most influential book on logic that was ever written.
In the next section we will further investigate Russell’s theory of types.

The English mathematician Frank Plumpton Ramsey (1903-1930) offered
criticism to the theory of types that was accommodated in later editions of
Principia Mathematica. The result of this is the ‘deramified theory of types’
that will be treated in subsequent sections, together with a later simplification
to this theory by the mathematicians Hilbert and Wilhelm Ackermann (1896-
1962) from Germany.
The mathematician Alonzo Church also published articles on type systems,
but did not develop his typed version of lambda calculus before the 1940’s,
and his typed lambda calculus thereby falls outside the scope of this article
(1870-1940). We will only summarize his work in this paragraph. The main
difference between the type structure of Russell and that of Church is that
the former is set-based with linear ordering of types and the latter is function
based with a non-linear order of types. The type theory that emerged from
Church’s lambda calculus (see section 9.2) was extended with simple types
in 1940 to prevent paradoxes, similar to the extension of logical set theory
with simple types by Russell in 1910 to avoid the paradoxes. Church also
proposed another logical set theory in 1974.
[..] in the simple theory of types it is well known that the indi-
viduals may be dispensed with if classes and relations of all types
are retained; or one may abandon also classes and relations of the
lowest type, retaining only those of higher type. In fact any finite
number of levels at the bottom of the hierarchy of types may be
deleted. But this is no reduction in the variety of entities, because
the truncated theory of types, by appropriate deletions of entities
in each type, can be made isomorphic to the original hierarchy -

and indeed the continued adequacy of the truncated hierarchy to
the original purposes depends on this isomorphism.
- A. Church in ‘The need for abstract entities’.
Organization of Principia Mathematica
The nearly 2,000-page Principia Mathematica starts with a short preface
that explains what it wants to demonstrate, namely that pure mathematics
can be based on logic alone and requires no other primitive notions. Russell
classifies statements that involve logical constants only (such as the laws of
reciprocity, see page 18 of Principia Mathematica) as pure mathematics, and
other mathematical assertions that also refer to non-logical contents (such as
the statement that (perceptual) space is three-dimensional) as part of applied
mathematics. The belief was then expressed that pure mathematics was suf-
ficient to include all traditional mathematics. Then, after an introduction,
the first volume introduces a symbolic logic that is based on a small set
of axioms, and then lays out the propositional and predicate calculi. Built
upon these, Whitehead and Russell define types, sets, relations and their
properties, and basic operations on sets. The second volume continues with
a purely logical theory of cardinal and ordinal arithmetic. This allowed them
to introduce basic arithmetic, including addition, multiplication and expo-
nentiation of both finite cardinals and of relations.
The volume ends with a general theory of simply ordered sets (series) which
is followed by a logical base of fundamental mathematical analysis, including
subjects such as convergent sequences, continuity, limits and derivatives.
The third volume was meant to prepare the ground for the fourth and con-
cluding volume on geometry (which was never completed), and contained a
theory of numbers that was called ‘measurement’. It starts with a theory of
well-ordered sets, finite, infinite and continuous series, the negative integers,
ratios and the real numbers, and finally vectors, coordinates and basic geo-
metric notions such as angles.
More details about the organization of Principia Mathematica and a critical
assessment of its work can be found in [31, chapter 7, and specifically section
7.8].
The symbolic logic and notation of Principia Mathematica
Russell and Whitehead opted for the more modern notation of Peano instead
of Frege’s Begriffsschrift. Unlike Frege, Russell and Whitehead treated
functions as first-class citizens. A good introduction to the logical calculus
and the specific notation that was used in Principia Mathematica can be
found in [49, section 3.2 and 3.3] and [31, sections 7.2, 7.3, 7.7 and 7.8].
Russell’s theory of types
Russell’s 1908 paper included a categorization of most of the important
contradictions of that time, and an analysis of their common characteristics.
To prevent the paradoxes he catalogued, Russell formulated the vicious circle
principle (see page 85) and implemented it using types in Principia
Mathematica (see for details [31, section 7.9] and [49, section 3.2 and 3.3]).
What is a type?
A type is the range of significance of a propositional function, that is, the
collection of arguments for which the said function is significant and has
values.
The type of a variable in a proposition is fixed by all the values the
function is concerned with, i.e. by the totality over which the variable ranges.
This division of objects into types (the type of an object can be seen as a
property of that object) is necessary to conform to the vicious circle principle,
i.e. to make sure that ‘whatever contains an apparent variable must not be
a possible value of that variable’. This can be established by making sure
that whatever contains an apparent variable is of a different and higher
type than the possible values of that variable. This linear order of types
prevents vicious circles, since the variables contained in an object
determine the type of that object.
Russell then defined an individual as being not a proposition but a
constant, destitute of complexity. We can now categorize propositions by their
types. First order propositions are elementary propositions that only
contain individuals, second order propositions are propositions with first-order
propositions as variables and possibly propositions of lower than first order
types. This can be continued, such that the (n + 1)th order propositions
contain propositions of order n and possibly others of order smaller than n.
We now also restrict relations like ∈ so that x ∈ y is only significant when
y is of a type one level higher than x, and we confine quantifiers always
to a single level. As can be proved, however, this way of restricting
propositions prevents the paradoxes but can in some cases be needlessly
restrictive.
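The restriction on ∈ can be pictured as a check on type levels. The
following minimal sketch assumes that every object carries an explicit
integer level (0 for individuals, 1 for sets of individuals, and so on).
This level-tagging, the class name and the example objects are illustrative
assumptions rather than Russell’s own formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Typed:
    """A hypothetical object tagged with its type level."""
    name: str
    level: int


def membership_is_significant(x: Typed, y: Typed) -> bool:
    """x in y is significant only if y is exactly one level above x."""
    return y.level == x.level + 1


socrates = Typed("socrates", 0)   # an individual
humans = Typed("humans", 1)       # a set of individuals
```

In particular membership_is_significant(y, y) is false for every y, so the
question of whether a set is a member of itself, on which Russell’s paradox
rests, cannot even be posed. The same check also rejects harmless
statements that mix levels, which illustrates the remark that the
restriction can be needlessly strict.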
For more information about types in Principia Mathematica, see [31,
section 7.9] and [49, section 3.3]. For a formalization (in modern notation) of
Russell’s Ramified Theory of Types (RTT), we refer to [86, chapter 3]. In
turn, this reference is partly based on [52], [53], [54] and [43], all of
which in a certain context discuss RTT.
A detailed introduction to the (symbolic) logic and notation of Principia
Mathematica, as well as a formal introduction to RTT, STT and NF and
MP (see section 7.3), is to be included in a later version of this report.
7.2 Ramsey, Hilbert and Ackermann

Suppose a contradiction were to be found in the axioms of set
theory. Do you seriously believe that a bridge would fall down?
- F.P. Ramsey, quoted in [58]
Ramsey published his first major work ‘The Foundations of Mathematics’
(see [69, page 105-142]) in 1925. In this publication he attempted to improve
Principia Mathematica in two ways. First he proposed dropping the axiom
of reducibility which, he writes, is “[...] certainly not self-evident and there
is no reason to suppose it true; and if it were true, this would be a happy
accident and not a logical necessity, for it is not a tautology.”. His second
suggestion was to simplify Russell’s theory of types by regarding
certain semantic paradoxes as linguistic. He accepted Russell’s solution to
remove the logical paradoxes of set theory arising from, for example, ‘the
set of all sets which are not members of themselves’. However, the
semantic paradoxes such as ‘this is a lie’ are, Ramsey claims, quite different and
depend on the meaning of the word ‘lie’. These he removed with his
reinterpretation of the axiom of reducibility.
After his suggestions, Russell’s theory became known as the ramified theory
of types (RTT), and Ramsey’s modification of the theory as the deramified
theory of types.
For more detailed information about the history of deramification, we refer
to [86, chapter 4].
Hilbert, together with Ackermann (see [2]), simplified Russell’s theory of

types by removing the orders into what has become known as the ‘simple
theory of types’ (STT). We quote from page 115 of [49]: “[In the simple
theory of types,] every individual or individual variable is said to be of type
i; and if a predicate or predicate variable ϕ(x1, ..., xn) has arguments x1,
..., xn, of types τ1, ..., τn respectively, then ϕ(x1, ..., xn) is said to be of
type (τ1, ..., τn). Thus, for example, any predicate with two individual
arguments is of type (i, i), while a predicate with a single argument that is
itself a predicate with two individual arguments is of type ((i, i)). Having
introduced the hierarchy of types in this way, we shall now require bound
variables to be of some definite type. Every quantifier will then range over
the totality of all entities of the same type as the bound variable. When
this is done, we have a very comprehensive logical calculus which is secure
against vicious circularity”.
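The typing rule of the simple theory of types can be made concrete with a small sketch. We assume here (this encoding is ours, not taken from [49]) that the type of individuals is the string "i" and that a predicate type is a Python tuple of its argument types; the names predicate_type, show and well_typed are hypothetical helpers.

```python
I = "i"  # assumed encoding: the type of individuals


def predicate_type(*arg_types):
    """Type of a predicate whose arguments have the given types."""
    return tuple(arg_types)


def show(t):
    """Render a type in bracket notation, e.g. (i, i)."""
    if t == I:
        return "i"
    return "(" + ", ".join(show(a) for a in t) + ")"


def well_typed(pred_type, arg_types):
    """An application is admissible only if the argument types match exactly."""
    return pred_type != I and tuple(arg_types) == pred_type


binary = predicate_type(I, I)          # predicate with two individual arguments
second_order = predicate_type(binary)  # predicate with one binary predicate as argument

print(show(binary))
print(show(second_order))
# A predicate can never be applied to itself: its type differs from its argument types.
print(well_typed(binary, [binary]))
```

Because a predicate type is always strictly larger than each of its argument types, self-application is rejected by construction, which is the sense in which the calculus is protected against vicious circularity.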
A further discussion and formalization (in the form of Church’s simply

typed lambda calculus λ → c) of the simple theory of types can be found in
[86].
7.3 Quine
Just as the introduction of the irrational numbers . . . is a conve-
nient myth [which] simplifies the laws of arithmetic . . . so physical
objects are postulated entities which round out and simplify our
account of the flux of existence . . . The conceptional scheme of
physical objects is [likewise] a convenient myth, simpler than the
literal truth and yet containing that literal truth as a scattered part
- Quine, quoted in [50]
Willard Van Orman Quine (1908-2000) was an American mathematician

who became interested in the work of Russell. An alternative to Russell’s sys-
tem is one that allows a single universe of all types (or all sets). In Russell’s
theory such an object is too big but according to others, including Quine,
having a set of all sets or a type of all types is legitimate as long as we do not
permit forming all subsets. If there is some restriction on which subsets can
be formed, for example by requiring a stratified predicate to define the sub-
set, then no contradiction will result. Quine proposed in [94, pages 80-101]
a system called New Foundations, NF, based on this idea. To restrict the
way subsets are formed, Quine further restricted the comprehension axiom to:
NFC(omprehension) Axiom: ∃x∀y :: (y ∈ x ↔ ϕ(y)), where x is not

free in ϕ(y) and ϕ(y) is stratified
In [86, footnote 4], we find two definitions of stratification.
Definition of heterogeneous stratification: A well-formed formula ϕ

is heterogeneously stratified := there is a function f from the variables and
constants of ϕ to the natural numbers such that for each atomic well-formed
formula F (x1 , . . . , xn ) of ϕ, f (F ) = 1 + (max : 1 ≤ i ≤ n : f (xi ))
Definition of homogeneous stratification: A well-formed formula ϕ is

homogeneously stratified := ϕ is heterogeneously stratified and for the
corresponding function f we also have that f (xi ) = f (xj ) for 1 ≤ i, j ≤ n
With the NFC axiom Russell’s paradox is prevented, since the formula
ϕ ≡ x ∉ x is not stratified.
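To see why x ∉ x fails to be stratified, the following sketch checks stratification mechanically. It assumes the formulation commonly used for NF, which differs in form from the definitions quoted from [86]: every variable receives an integer level, x ∈ y requires level(y) = level(x) + 1, and x = y requires equal levels. Only the atomic subformulas matter, so a negation such as x ∉ x is represented by its atom x ∈ x. The function name stratify and the triple encoding are hypothetical.

```python
from collections import defaultdict, deque


def stratify(atoms):
    """Assign a level to every variable, or return None if impossible.

    atoms is a list of triples (left, op, right), where op is "in" for
    left ∈ right (needs level(right) = level(left) + 1) or "eq" for
    left = right (needs equal levels).
    """
    edges = defaultdict(list)  # u -> [(v, k)], meaning level(v) = level(u) + k
    for left, op, right in atoms:
        k = 1 if op == "in" else 0
        edges[left].append((right, k))
        edges[right].append((left, -k))

    level = {}
    for start in list(edges):
        if start in level:
            continue
        level[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, k in edges[u]:
                if v not in level:
                    level[v] = level[u] + k
                    queue.append(v)
                elif level[v] != level[u] + k:
                    return None  # two constraints disagree: not stratified
    lowest = min(level.values(), default=0)
    return {v: n - lowest for v, n in level.items()}


print(stratify([("x", "in", "x")]))                    # None
print(stratify([("x", "in", "y"), ("y", "in", "z")]))  # {'x': 0, 'y': 1, 'z': 2}
```

The first call fails because x would need a level one higher than its own; the second succeeds with consecutive levels.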
We quote from [86, page 3]: “NF is weak for mathematical induction and
the axiom of choice is not compatible with NF. We cannot prove Peano’s
axiom[s] in it, unless we assume the existence of a class with m + 1 ele-
ments. Also, NF is said to lack motivation because its axiom of compre-
hension is justified only on technical grounds and one’s mental image of set
theory does not lead to such an axiom. To overcome some of the difficulties,
Quine adopted similar measures to NBG (Neumann-Bernays-Gödel, see sec-
tion 8.5) set theory[, and developed another non-iterative set theory called
ML (Mathematical Logic), first presented in [70]]. Like NBG, ML contains
a bifurcation of classes into elements and non-elements. Sets can enjoy the
property of being full objects whereas classes cannot. ML was obtained from
NF by replacing (NFC) by two axioms, one for class existence and one for
elementhood. The rule of class existence provides [. . . ] the existence of the
classes of all elements satisfying any condition ϕ, stratified or not. The rule
of elementhood is such as to provide the elementhood of just those classes
which exist for NF. Therefore, the two axioms of comprehension for ML [are]:
Comprehension by a set: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x) is stratified
with set variables only in which y does not occur free.
Impredicative comprehension by a class: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x)
is any formula in which y does not occur free.
ML was liked both for the manipulative convenience we regain in it and
the symmetrical universe it furnishes. It was however proved subject to the
Burali-Forti paradox”.
For more information, we refer to [70], [71], [72] and the website
http://diamond.boisestate.edu/∼holmes/holmes/nf.html.
Chapter 8
Gödel
The development of mathematics towards greater precision has

led, as is well known, to the formalization of large tracts of it,
so that one can prove any theorem using nothing but a few me-
chanical rules. [. . .] It will be shown below that this is not the
case, that on the contrary there are in the two systems mentioned
[viz. Principia Mathematica and ZF] relatively simple problems
in the theory of integers that cannot be decided on the basis of the
axioms.
- K. Gödel, in the opening of the paper introducing the incom-

pleteness theorem (1931)
8.1 Informally: Gödel’s incompleteness theorems

No system of Hilbert’s type in which the integers (or Peano’s arithmetic, see
section 4.1) can be defined can be both consistent and complete. At the
time this seemed unreal, but in 1931 Kurt Gödel (born in 1906 in Brünn,
Austria-Hungary, now Brno, Czech Republic) presented mathematicians
with the astounding and melancholy conclusion that the axiomatic
method has certain limitations, which rule out the possibility that even
ordinary arithmetic (as axiomatized by Peano) can ever be fully axiomatized.
As a corollary of this theorem, he proved that it is impossible to establish
the internal logical consistency of a very large class of deductive systems.
His result provoked a reappraisal of philosophies of mathematics.
Gödel’s famous incompleteness theorem and the corresponding corollary

are also called the first and the second incompleteness theorem. Gödel was
able to show that, if an axiomatic system of formalized arithmetic is wide
enough, then
1. The system is necessarily incomplete, in the sense that there exists a
formula ϕ of the system such that neither ϕ nor its negation is derivable
(see also section 8.2 for the definition of incompleteness), and
2. If the system is consistent, then no proof of its consistency is possible

which can be formalized within it (see also section 8.2 for the definition
of consistency).
We first indicate (in 8 steps, following the lines of the original proof of
Gödel) the main lines of both theorems in this section, and provide a more
rigorous and exact proof of the theorems in section 8.2 and further sections.
1 The (syntax of) formulas of an axiomatic system are precisely defined

and built up from a finite alphabet of symbols. Proofs are noth-
ing but a finite series of formulas and can be replaced by numbers.
With such a representation, the Gödel numbering, Gödel gave a well-
ordering of all well-formed formulae of an axiomatic system S (to be
precise, of ω-consistent systems, see section 8.3 for more details). Gödel
then showed how to represent metamathematical concepts such as ‘formula’,
‘proof-schema’ and ‘provable formula’ by a series of natural numbers.
We define gn(ϕ) to be the Gödel number corresponding to well-formed
formula ϕ of S.
2 We consider a formula prov(ϕ) of S, stating that ϕ is a provable for-

mula. Precisely, we define prov(ϕ) := ‘ϕ is a provable formula’. A class
sign is a formula with just one free variable. We suppose that the class
signs are ordered by a function R with domain N, such that R(n) is
defined as the nth class sign. By [R(n); q] we denote the formula which
is obtained by replacing the free variable in R(n) by q.
3 We now define a set K of natural numbers by n ∈ K ↔
¬prov([R(n); n]). Since the symbols that are used in this formula are
all definable in S, there also is a formula with one free variable (i.e. a
class sign) that denotes n ∈ K, for some natural number n. We call
this class sign C. So there is a natural number q such that C = R(q).
We now show that the proposition G ≡ [R(q); q] is unprovable in S.
Since (see footnote 1) this formula says that q ∈ K, that is
¬prov([R(q); q]), we can say that G is a proposition that asserts of
itself that it is not provable.
4 We show that G is provable ↔ ¬G is provable, and hence that G is
undecidable:
• Suppose G is provable. This means [R(q); q] is provable, that is (by
replacing the variable in the class sign C by q) q ∈ K, i.e.
¬prov([R(q); q]), and this says ¬prov(G): G is not provable.
• Suppose ¬G is provable. This means ¬[R(q); q] is provable, that is (by
replacing the variable in the class sign C by q) q ∉ K, i.e.
¬¬prov([R(q); q]), and this is equivalent to prov([R(q); q]) or
prov(G): G is provable.
A proof of G thus leads to a proof of ¬G and vice versa, so if either
were provable the system S would be inconsistent. So if we assume that
S is consistent, then neither G nor ¬G is provable: G is undecidable in S.
5 By a metamathematical consideration we know however that G is true.

Because from the remark that G asserts its own unprovability, it follows
at once that G is true, since G is unprovable (because undecidable).
So there is a true statement in S (namely G) that is not provable: the
system S is incomplete!
6 If we add G as an axiom, we can again apply the argument given
in the previous five steps in the same way. Basically we then create
another formula G′, since in step 3 a proposition is defined that states
‘this formula is not provable’, or in other words ‘this formula does not
follow from the axioms’. That is, the proposition depends on the
set of axioms. Therefore, as I. Grattan-Guinness cleverly calls it in [31,
page 510], the system S is ‘essentially incompleteable’.
7 Gödel then showed that ‘if arithmetic is consistent, it is incomplete’.
We want to prove this conditional statement as a whole. We define
the condition of the statement by A: ‘arithmetic is consistent’. We
have already seen in section 6.1 that this means that there is at least
one formula ϕ of arithmetic that is not provable. So we can express A ≡
(∃y :: (∀x :: ¬prov (x is a proof of y))). A system is incomplete if
there is a true statement that is not provable. Thus we can represent
the conclusion of the conditional statement by G.
Footnote 1: By replacing in the class sign C, which expresses that n ∈ K
for some natural number n, the free variable by q.
8 We can now formally prove A → G (see section 8.2 for the proof). This
means that if A is provable, we know (by modus ponens or the rule of
detachment) that G is provable. But we already saw that (unless S
is inconsistent) G is not provable; thus if S is consistent, A is not
provable! That means that if arithmetic is consistent, its consistency
cannot be established by metamathematical reasoning within the formalism
of arithmetic (this is Gödel’s theorem 11, see [93, page 614]). Or, as
expressed in [31, page 510], ‘any set S of consistent formulae of PM
cannot include the formula F asserting its consistency’.
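The numbering of step 1 can be illustrated by a small sketch. It uses the prime-power idea: the k-th symbol of a formula contributes the k-th prime raised to the code of that symbol. The alphabet SYMBOLS and its codes are hypothetical choices made for this illustration only; they are not Gödel’s original symbol codes.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short formulas)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


SYMBOLS = ["0", "s", "=", "+", "(", ")", "x", "~"]  # hypothetical alphabet
CODE = {s: i + 1 for i, s in enumerate(SYMBOLS)}    # symbol codes start at 1


def gn(formula):
    """Encode a string of symbols as the product of p_k ** code(symbol_k)."""
    number = 1
    for p, symbol in zip(primes(), formula):
        number *= p ** CODE[symbol]
    return number


def decode(number):
    """Recover the string of symbols by reading off the prime exponents."""
    symbols = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        if not 1 <= exponent <= len(SYMBOLS):
            raise ValueError("not the number of any formula in this coding")
        symbols.append(SYMBOLS[exponent - 1])
    return "".join(symbols)


print(gn("0=0"))    # 2**1 * 3**3 * 5**1 = 270
print(decode(270))  # 0=0
```

Unique prime factorization guarantees that different strings receive different numbers, so statements about formulas can be rephrased as statements about natural numbers. The same device applied to sequences of formula numbers codes proofs.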
8.2 Formally: Gödel’s Incompleteness Theorems

The first incompleteness theorem says that Principia Mathematica or any
other system in which arithmetic can be developed, is essentially incomplete,
that is in any consistent set of arithmetical axioms there are statements that
are true but cannot be derived from the set.
The second theorem says that it is impossible to give a metamathemat-

ical proof of the consistency of a system comprehensive enough to contain
the whole of arithmetic - unless the proof itself employs rules of inference
in certain essential respects different from the derivation rules identifying
theorems within the systems.
In the following two paragraphs, we will first give an abstract version of

Gödel’s first and second incompleteness theorem, investigate the set of lan-
guages that the theorem applies to, and then in the third paragraph fill in
the details by giving a specific Gödel numbering for arithmetic. Then in the
next sections we will apply the theorem to the system of Peano Arithmetic
and that of Principia Mathematica, and discuss the consequences of the in-
completeness theorem.
8.2.1 On formally undecidable propositions

We assume there is an STGA language L and investigate the conditions on
a system L under which Gödel showed that there is a true sentence that is
not provable in L (i.e. (∃t : t ∈ T : t ∉ P)). We define the following concepts:
A predicate H expresses a set of numbers A := (∀n :: H(n) ∈ T ↔ n ∈ A)
A is expressible in L if A is expressed by some predicate of L. Note that
expressibility in L is concerned only with T and not with P and R.
Theorem: Not every set of numbers is expressible.

Proof: (from [84]) Since L is built up of a finite number of symbols and
derivation rules, there are only denumerably many expressions or predicates
of L. But (by Cantor’s theorem, see page 69) there are non-denumerably
many sets of natural numbers. Therefore, not every set of numbers is ex-
pressible in L.
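The counting argument rests on Cantor’s diagonal construction, which can be sketched as follows. The enumeration nth_set is a hypothetical stand-in for ‘the set expressed by the n-th predicate’; any other enumeration would serve equally well.

```python
def nth_set(n):
    """Hypothetical enumeration: the n-th set consists of the multiples of n + 1."""
    return lambda m: m % (n + 1) == 0


def diagonal(m):
    """m belongs to the diagonal set iff m does not belong to the m-th set."""
    return not nth_set(m)(m)


# The diagonal set disagrees with the n-th set about the number n, for every n,
# so it cannot occur anywhere in the enumeration.
differs = all(diagonal(n) != nth_set(n)(n) for n in range(1000))
print(differs)  # True
```

Whatever denumerable list of sets is chosen, the diagonal set escapes it, which is why a language with denumerably many predicates cannot express every set of numbers.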
Let gn be a function that assigns to each expression a unique natural

number (just as in step 1 in section 8.1, i.e. gn is a bijection between E and
N). For any E ∈ E, we also call gn(E) the Gödel number of E. We will
give a specific numbering in section 8.2.3. For this abstract treatment the
only assumption (see footnote 2) we make is that every number is the Gödel number of some
expression.
We define En to be the inverse of gn, i.e. gn(En) = n. The diagonalization
of En, for En ∈ H, is defined by En(n). We define d(n) to be the
Gödel number of the diagonalization of En, that is: d(n) := gn(En(n)), and
call d the diagonal function of the system. For each set of natural numbers
A, we define A∗ to be the set of all numbers n such that d(n) ∈ A, i.e. we
have n ∈ A∗ ↔ d(n) ∈ A. For any set of natural numbers A, we define
its complement ∼A to be the set of all natural numbers not in A. The
complement operation ∼ binds stronger than the ∗, i.e. ∼A∗ is to be read
as (∼A)∗.
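These definitions can be tried out in a toy setting. The sketch below assumes a hypothetical finite alphabet and takes gn to be bijective base-K numeration of strings, so that every natural number is the Gödel number of exactly one expression, as required above. For simplicity it diagonalizes every expression, not only predicates, and writes numerals in decimal; both are assumptions of this illustration.

```python
ALPHABET = "0123456789()HP~"  # hypothetical finite alphabet
K = len(ALPHABET)


def gn(expr):
    """Bijective base-K numeration: each natural number codes exactly one string."""
    n = 0
    for ch in expr:
        n = n * K + ALPHABET.index(ch) + 1
    return n


def E(n):
    """Inverse of gn: the expression whose Gödel number is n."""
    chars = []
    while n > 0:
        n, r = divmod(n - 1, K)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def d(n):
    """Diagonal function: the Gödel number of the diagonalization E_n(n)."""
    return gn(E(n) + "(" + str(n) + ")")


def star(A):
    """Membership test for A*: n is in A* iff d(n) is in A."""
    return lambda n: A(d(n))


h = gn("H")     # 13 in this coding
print(E(d(h)))  # H(13), the expression H applied to its own Gödel number
```

A set of numbers is represented here by its membership test, so star(A) is again a membership test.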
Abstract form of Gödel’s first theorem: Let P be the set of Gödel numbers
of all the provable sentences. If the set (∼P)∗ is expressible in L and L is
correct, then there is a true sentence of L not provable in L.
Proof: (based on [84]) Suppose L is correct and (∼P)∗ is expressible in L by
a predicate H with Gödel number h. Let G be the diagonalization of H
(i.e. the sentence H(h)). We show that G is true but not provable in L. H
expresses (∼P)∗ in L, i.e. H(n) is true ↔ n ∈ (∼P)∗ for all n ∈ N. In particular,
H(h) is true ↔ h ∈ (∼P)∗. We have that h ∈ (∼P)∗ ↔ d(h) ∈ ∼P ↔ d(h) ∉ P.
But since h is the Gödel number of H and by the definition of d, d(h) is
the Gödel number of H(h) and so d(h) ∈ P ↔ H(h) is provable in L and
d(h) ∉ P ↔ H(h) is not provable in L. Now we have: H(h) is true ↔ H(h)
is not provable in L. This means that H(h) is either true and not provable
in L or false but provable in L. The latter alternative violates the hypothesis
that L is correct. Hence it must be that H(h) is true but not provable in L.
Note that in this proof we have not defined the set T by a model but
determined the truth of G by a metamathematical argument, just as we have
seen in step 5 of section 8.1, that is nevertheless commonly accepted by all
mathematicians. Note also that the proposition G corresponds to the
proposition G of point 3 of section 8.1, since H(h) is a proposition that
expresses of itself that it is not provable.
Footnote 2: This assumption is for technical reasons that make the proof
simpler; Gödel’s original numbering did not have this restriction.
Theorem: If L is correct and if the set (∼P)∗ is expressible in L, then L is
incomplete.
Proof: A system L that is correct and for which the set (∼P)∗ is expressible
in L contains a sentence G that is true but neither provable nor refutable (by
the previous theorem and the assumption of correctness). Hence G is true but
undecidable in L, and hence L is incomplete.
That is where the name incompleteness theorem comes from. By this
theorem, it follows immediately that if a system is consistent, and the set
(∼P)∗ is expressible in that system (which we will later see is true for a
system of basic arithmetic), then it is incomplete. Note that this is the
statement A → G of point 8 in section 8.1.
When we study a particular language L, such as a system containing Peano’s
arithmetic or the system of Principia Mathematica, we have to verify the
assumption that (∼P)∗ is expressible in L. We can do this by separately
verifying the following conditions.
G1 : For any set A expressible in L, the set A∗ is expressible in L.
G2 : For any set A expressible in L, the set ∼A is expressible in L.
G3 : The set P is expressible in L.
Theorem: G1 ∧ G2 ∧ G3 → (∼P)∗ is expressible in L.
Proof: G1 and G2 imply that for any expressible set A, (∼A)∗ is expressible in
L. In particular we then have that if P is expressible in L (i.e. G3 holds),
(∼P)∗ is expressible in L.
Before we prove a general form of Gödel’s second incompleteness theo-

rem, we introduce some more definitions.
A sentence En is a Gödel sentence for a set A of natural numbers if either

En is true and its Gödel number lies in A, or En is false and its Gödel number
lies outside A, i.e. En is a Gödel sentence for A if and only if En ∈ T ↔ n ∈ A.
Diagonal Lemma: For any set A, if A∗ is expressible in L, then there is a

Gödel sentence for A.
Proof: Suppose H is a predicate that expresses A∗ in L; let h be its Gödel

number. Then d(h) is the Gödel number of H(h). For any number n, H(n)
is true ↔ n ∈ A∗ , therefore, H(h) is true ↔ d(h) ∈ A, and since d(h) is the
Gödel number of H(h), then H(h) is a Gödel sentence for A.
Lemma: If L satisfies G1 , then for any set A expressible in L, there is a

Gödel sentence for A.
Proof: L satisfies G1 , thus for any expressible set A, A∗ is expressible in
L. Now we can apply the previous lemma to conclude that there is a Gödel
sentence for A.
With the diagonal lemma we can also prove the first theorem as follows:
Since (∼P)∗ is expressible in L, by the diagonal lemma, there is a Gödel
sentence G for ∼P. A Gödel sentence for ∼P is a sentence which is (by the
definition of a Gödel sentence) true if and only if it is not provable in L.
So for any correct system L, a Gödel sentence for ∼P is a sentence which is
true but not provable in L.
8.2.2 The impossibility of an ‘internal’ proof of consistency
With the diagonal lemma we can also prove a general form of Gödel’s second
theorem, that was first formulated in this form by the Polish mathematician
Alfred Tarski.
A general form of Gödel’s second theorem (by Tarski)
1. The set (∼T)∗ is not expressible in L
2. If condition G1 holds, then ∼T is not expressible in L
3. If conditions G1 and G2 both hold, then the set T is not expressible in
L (i.e. for systems for which G1 and G2 hold, truth within the system
is not definable within the system.)
Proof: To begin with, there cannot possibly be a Gödel sentence for the set
∼T because such a sentence would be true if and only if its Gödel number was
not the Gödel number of a true sentence, and this is absurd.
1. If (∼T)∗ were expressible in L, then by the diagonal lemma, there would be
a Gödel sentence for the set ∼T, which we have just shown is impossible.
Therefore, (∼T)∗ is not expressible in L.
2. Suppose condition G1 holds. Then if ∼T were expressible in L, the set
(∼T)∗ would be expressible in L, violating (1).
3. If G2 also holds, then if T were expressible in L, then ∼T would also be
expressible in L, violating (2).
Now that we have seen both theorems in a general form, we will consider particular
mathematical languages, starting with first order arithmetic, which we can
build on in section 8.3 to prove the incompleteness of systems based on
Peano’s arithmetic and other systems.
8.2.3 Gödel numbering and a concrete proof of G1, G2 and G3
This section will be completed in a later version of this document. For the
moment we refer to Gödel’s original work that can be found in [93].
8.3 Gödel’s theorem and Peano Arithmetic

The classification of the various modes of syllogisms, when they
are exact, has little importance in mathematics. In the mathema-
tical sciences are found numerous forms of reasoning irreducible
to syllogisms.
- G. Peano in [68, page 379]
There are various different incompleteness proofs of Peano Arithmetic

(with and without exponentiation). We mention three of them. The sim-
plest uses a truth set defined by Tarski and shows that every axiomatizable
subsystem of N (the complete theory of arithmetic) is incomplete. This
proof of Gödel’s first theorem however cannot be formalized in arithmetic
(since the truth set is not expressible in arithmetic), and was based on the
underlying assumption that Peano Arithmetic is correct, implying that every
sentence provable in Peano Arithmetic is a true sentence. Gödel’s original
incompleteness proof involves the much weaker assumption of ω-consistency.
Definition of simple consistency: An axiomatic system A is

simply consistent := no sentence is both provable and refutable in A
Definition of ω-inconsistent: An axiomatic system A is ω-inconsistent

:= there is a predicate F (w) (in one free variable w) such that the sentence
(∃w :: F (w)) is provable but all the sentences F (0), F (1), . . . are refutable
Definition of ω-incomplete: An axiomatic system A is ω-incomplete :=
there is a predicate F (w) (in one free variable w) such that all the sentences
F (0), F (1), . . . are provable but the sentence (∀w :: F (w)) is not provable
Gödel’s original proof was based on the assumption of ω-consistency and

shows that every axiomatizable ω-consistent system in which all true Σ0 -
sentences are provable is incomplete. This proof is of course formalizable in
Peano Arithmetic (and this is necessary for Gödel’s second theorem) and also
shows that any axiomatic system A that is simply consistent and in which
all true Σ0 -sentences are provable is ω-incomplete.
The third proof (1936) is due to Rosser and uses the even weaker assumption
of simple consistency. It is based on an axiomatic system by the American
mathematician Raphael Robinson (1911-1995), that we refer to as R. It
shows that every axiomatizable simply consistent extension of R is
incomplete, but to do so it uses a more elaborate sentence than the Gödel
sentence ‘G is undecidable’.
We intend to include the three proofs in a later version of this document.
They can be found in [84], in a particular presentation that does not use
the concept of a model for axiomatic systems and that sometimes attaches
different meanings to established definitions. Nevertheless, it contains in
our opinion one of the best discussions of Gödel’s incompleteness theorems.
In a later version of this document we will also show how, given the proof
of incompleteness of Peano Arithmetic, Gödel’s theorems apply to Principia
Mathematica.
We quote K. Gödel on the first page of [27]:
The most comprehensive formal systems that have been set up

hitherto are the system of Principia Mathematica on the one hand
and the Zermelo-Fraenkel axiom system of set theory (further de-
veloped by J. von Neumann) on the other. These two systems are
so comprehensive that in them all methods of proof today used
in mathematics are formalized, that is, reduced to a few axioms
and rules of inference. One might therefore conjecture that these
axioms and rules of inference are sufficient to decide any ma-
thematical question that can at all be formally expressed in these
systems. It will be shown that this is not the case, that on the
contrary there are in the two systems mentioned relatively simple
problems in the theory of integers that cannot be decided on the
basis of the axioms.
8.4 Consequences
I had a lot of conversations with him [Gödel] and a lot of dis-
agreements. Like most others, I was hard to convince about the
incompleteness theorem. There was at the time a tendency, which
I shared, to think that it was special to a certain type of formali-
zation of logic and that a radical reformalization might have the
effect that the Gödel argument did not apply. I persisted in that
longer than I should have, and he was always trying to convince
me otherwise.
- A. Church in an interview at Princeton University (1985)
In a later version of this document we will discuss the implications of

Gödel’s theorem and show the reactions that followed the publication of his
paper [27] in 1931.
8.5 Neumann-Bernays-Gödel axioms

There is an infinite set A that is not too big.
There’s no sense in being precise when you don’t even know what
you’re talking about.
- John von Neumann (sources unknown)
Let us recapitulate the situation of the axiomatic theory of sets before we

introduce the Neumann-Bernays-Gödel theory.
When Cantor introduced his set theory, he gave the informal definition
(see page 16) of a set being ‘any comprehension into a whole M of definite
and separate objects m of our intuition or thought’. After Hilbert proposed
his proof theory, set theory was given a more rigorous basis, and axiomatic
theories for Cantor’s sets were proposed. Cantor’s definition was replaced by
the principle of comprehension (see page 16), which was adopted by Frege
and Russell. Based on this principle a first formal theory of sets, called ‘ideal
calculus’ was developed (not treated in detail here, see for example [36]). The
antinomies of Burali-Forti and Russell however showed that this theory was
inconsistent, and one way to restore consistency was to incorporate in the
system a theory of types, as was done by Russell. At the same time, intu-
itionists tried to do mathematics without Cantor’s set theory at all. Others
tried to overcome the inconsistencies by making Cantor’s set theory more
rigidly axiomatic, and the most successful axiomatization of set theory was
presented by Zermelo in 1908.
The problem for him was to axiomatize the theory in such a way that all
contradictions are excluded while it remains sufficiently wide for all
that is valuable in it to be preserved. As we have seen in section
5.3, Zermelo postulated a domain of abstract objects (sets) and elements of
this domain, defined the primitive notions of ‘equality’ and the ‘is element
of’ relation, and introduced 7 axioms. The comprehension axiom was replaced
by the weaker separation axiom, that only allows new sets to be created
from existing sets and with definite predicates. Before we describe why
the Hungarian mathematician von Neumann opposed this solution and came
with his own solution to the paradoxes, we will look at this separation axiom
in more detail. Zermelo defined the separation axiom as follows:
Separation axiom:
(∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)), ϕ is definite and does not contain y. For
every set z there exists a set y whose elements are exactly those of z having
the property ϕ.
The concept of definiteness in this axiom was defined by Zermelo as fol-

lows: “A question or assertion ϕ, the validity or invalidity of which is decided
without arbitrariness by the basic laws of logic, is said to be ‘definite’ ”.
We have already seen on page 93 that this axiom excludes the paradoxes of
Russell and Burali-Forti, and as Kneebone remarks (see footnote 3) in [49, page 263] also
the semantic paradoxes.
In [83], the Norwegian mathematician Skolem pointed out that the defi-
nition of ‘definiteness’ was rather vague and he made precise the formulation
of ‘by the basic laws of logic’. Fraenkel used Skolem’s idea to formulate
the separation axiom in a new way (for details, see [49, page 290, 291]). In
1922 Fraenkel proposed the introduction of another axiom that allows the
existence of larger cardinal numbers than hitherto possible. The foundation
axiom of von Neumann makes occurrence of so-called extraordinary sets im-
possible. A set is extraordinary if there is a sequence of sets V1 , V2 , V3 , . . .
such that V2 ∈ V1 , V3 ∈ V2 , etc. Von Neumann’s subsequent interest in set
theory led to the second major axiomatization of set theory in the 1920s.
His formulation differed considerably from that of Zermelo and Fraenkel (see
section 5.3) because the notion of function, rather than that of set, was taken
as primitive. In a series of papers beginning in 1937, however, the Swiss
logician Paul Bernays, a collaborator of the formalist David Hilbert,
modified the von Neumann approach in a way that put it in much closer contact
with Zermelo and Fraenkel. In 1940, the Czech-born Kurt Gödel, known for
his incompleteness proof (see chapter 8), further simplified the theory. This
version is known as the Neumann-Bernays-Gödel (NBG) axioms.
Footnote 3: We quote: “since a definite property is one that is decidable by
the basic relations of the domain B [of sets, the abstract objects postulated
by Zermelo], no such property as that of being definable in a finite number of
words can be used in the definition of a set, and the semantic paradoxes are
thus also excluded”.
Before we give the axioms, it is convenient to adopt the undefined notions

of class and the membership relation (though, as is also true in Zermelo and
Fraenkel, ∈ suffices). In the axioms we distinguish between the use of capital
Latin letters and lowercase Latin letters for the variables. The capital letters
stand for variables that take classes (the totalities corresponding to certain
properties) as values. A class is defined to be a set if it is a member of some
class; those classes that are not sets are called proper classes. The lowercase
letters are used as special restricted variables for sets.
Example: ‘for all x, A(x)’ stands for ‘for all X, if X is a set, then A(X)’;
i.e. the condition holds for all sets. Intuitively, sets are intended to be those
classes that are adequate for mathematics, and proper classes are thought
of as those collections that are ‘so big’ that, if they were permitted to be
sets, contradictions would follow. In the Neumann-Bernays-Gödel axioms,
the classical paradoxes are avoided. This can be proven by showing in each
case that the collection on which the paradox is based is a proper class, i.e. is
not a set.
Theorem: With the Neumann-Bernays-Gödel axioms, the derivation of

Russell’s paradox does not apply.
Proof: We show that R := {x | x is a set ∧ x ∉ x} is a class, but not
a set. For all y we have that y ∈ R ↔ y is a set ∧ y ∉ y. We prove by
contradiction that R is not a set.
Suppose R is a set. Suppose R ∈ R. But then we have (take R for y in the
above statement) R ∈ R ↔ R is a set ∧ R ∉ R, so R ∉ R: contradiction. So
we must have R ∉ R. Then by our assumption we have R is a set ∧ R ∉ R, and
thus R ∈ R: contradiction. Since in both cases (R ∈ R and R ∉ R) we get
a contradiction, our assumption that R is a set must be wrong.
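The same diagonal reasoning can be checked exhaustively on small finite structures. The sketch below is an illustration under our own assumptions, not part of the NBG formalism: a ‘universe’ is a finite collection of objects, a membership relation is an arbitrary set of pairs (x, y) read as x ∈ y, and the helper names are hypothetical. For every membership relation on three objects, the Russell collection exists as a collection of objects, yet no object of the universe has exactly that collection as its members.

```python
from itertools import product


def russell_class(universe, member):
    """Collect the x in the universe with not (x ∈ x)."""
    return {x for x in universe if not member(x, x)}


def extension(universe, member, y):
    """All x in the universe with x ∈ y."""
    return {x for x in universe if member(x, y)}


def some_element_is_russell(universe, member):
    """Does some object y have exactly the Russell collection as its members?"""
    r = russell_class(universe, member)
    return any(extension(universe, member, y) == r for y in universe)


universe = range(3)
pairs = [(x, y) for x in universe for y in universe]
counterexamples = 0
for bits in product([False, True], repeat=len(pairs)):  # all 2**9 relations
    relation = {p for p, bit in zip(pairs, bits) if bit}
    if some_element_is_russell(universe, lambda x, y: (x, y) in relation):
        counterexamples += 1
print(counterexamples)  # 0
```

If some y had the Russell collection as its extension, then y ∈ y would hold exactly when y ∉ y, which is the contradiction used in the proof above.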
The Neumann-Bernays-Gödel axioms (NBG):
1 Extensionality axiom (or axiom of determination):
(∀X, Y :: (∀z :: z ∈ X ↔ z ∈ Y ) → X = Y )
Classes are uniquely determined by their members, to be exact: if every
element (that is a set) of a class X is at the same time an element of
Y , and conversely, then X = Y .
2 Axiom of the empty set:
(∃x∀y :: y ∉ x)
There is an (improper, see also footnote on page 93) set, the ‘null’ or
‘empty’ set, which contains no elements at all.
3 Axiom for class formation: (∃Y ∀x :: x ∈ Y ↔ ϕ(x)), ϕ is a proposition
in which set variables are only introduced by existential and universal
quantifiers. There exists a class Y whose elements are exactly those
sets x having the property ϕ.
4 Pairing axiom:
(∀a, b :: (∃y∀x :: x ∈ y ↔ x = a ∨ x = b))
Given two sets a and b there exists a set whose elements are exactly a
and b.
5 Sum-set axiom or Union axiom:
(∀z∃y∀x :: x ∈ y ↔ (∃w :: w ∈ z ∧ x ∈ w))
For every set z there exists a set y whose elements are exactly those
objects occurring in at least one element of z.
6 Power set axiom
(∀z∃y∀x :: x ∈ y ↔ x ⊆ z)
For every set z there is a set y whose elements are exactly the subsets of z.
7 Axiom of infinity:
(∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : {a} ∈ z))
There exists a successor set.
8 Axiom of choice:
(∀x :: (∃f : f is a function : Dom(f) = x − {∅} ∧ (∀a : a ∈ Dom(f) :
f(a) ∈ a)))
Every set x has a choice function.
9 Axiom of replacement or axiom of substitution (by Fraenkel):
(∀x∃!y : ϕ is a class : ϕ(x, y)) → (∀a :: (∃b∀y :: y ∈ b ↔ (∃x : x ∈ a :
ϕ(x, y))))
The image of a set under an operation (functional property) is again a
set.
10 Axiom of restriction: X ≠ ∅ → (∃y : y ∈ X ∧ X ∩ y = ∅)
Every non-empty class is disjoint from one of its elements.
The axioms 1, 3, 9 and 10 are different from ZF. The third axiom (scheme)
is presented in a form to facilitate a comparison with the third axiom (scheme)
of ZF. In a detailed development of NBG, however, there appears instead
a list of seven axioms (not schemes), each stating that for a certain
condition there exists a corresponding class of all those sets satisfying the
condition. From this finite set of axioms, each instance of the above scheme
can be obtained as a theorem. When obtained in this way, the third axiom
scheme of NBG is called the class existence theorem.
In contrast to the ninth axiom scheme of ZF (see section 5.3.2), that of

NBG is not an axiom scheme but an axiom. Thus, with the comments above
about the third axiom in mind, it follows that NBG has only a finite number
of axioms. On the other hand, since the ninth axiom or scheme of ZF provides
an axiom for each formula, ZF has infinitely many axioms. The finiteness of
the axioms for NBG makes the logical study of the system simpler.
The relationship between the theories may be summarized by the state-

ment that ZF is essentially the part of NBG that refers only to sets. We give
the following theorems without proof:
Theorem: Every theorem of ZF is a theorem of NBG
Theorem: Any theorem of NBG that speaks only about sets is a theorem
of ZF
Theorem: ZF is consistent if and only if NBG is consistent
Note that the fact that NBG avoids the classical paradoxes and that
there is no apparent way to derive any one of them in ZF does not settle the
question of the consistency of either theory. All we know from the last
theorem is that either both theories are consistent, or both are inconsistent.
Chapter 9
Church and Turing
9.1 Turing and Turing Machine

We may hope that machines will eventually compete with men in
all purely intellectual fields.
- Alan Turing in [38, page 46]
Alan Mathison Turing (1912-1954) was an English mathematician and

logician who pioneered in the field of computer theory and who contributed
important logical analyses of computer processes. Turing studied in
Cambridge, worked there on probability theory and independently
rediscovered the central limit theorem. In 1936 he won the Smith’s Prize. As
we have seen in the previous chapters, many mathematicians had attempted
to eliminate all possible error from mathematics by establishing a formal,
or purely algorithmic, procedure for establishing truth (the so-called for-
malist program). With his incompleteness theorem (see section 8.1), Kurt
Gödel threw up an obstacle to this effort, for he showed that any useful ma-
thematical axiom system is incomplete in the sense that there must exist
propositions whose truth can never be decided (within the system). Turing
was motivated by Gödel’s work to seek an algorithmic method of determining
whether any given propositions were undecidable, with the ultimate goal of
eliminating them from mathematics. Instead, he proved in his seminal paper
‘On Computable Numbers, with an Application to the Entscheidungspro-
blem’ (reprinted in [19]) that there cannot exist any such universal method
of determination. We now consider this decision problem, or
Entscheidungsproblem, in more detail.
Decidability was one of Hilbert’s requirements for an axiomatic system

(see section 6.1). The problem of decidability asks if, given a mathematical
proposition, one could find an algorithm which would decide if the
proposition is true or false. Given an algorithm, it is easy to see that it
can decide certain propositions. It is more difficult to prove that there is
no algorithm that can decide certain propositions. To this end Turing
introduced a hypothetical computing device (later called the Turing machine).
The Turing Machine and the proof of undecidability are given later in this
section.
After this important publication Turing completed his Ph.D. in 1938 on

systems of logic based on ordinals, under the direction of Alonzo Church (see
section 9.2). During the war Turing worked on breaking German Enigma
codes, and in 1948 he worked in Manchester on the construction of a new
digital computer. He described a modern computer before technology had
reached the point where the construction was a realistic possibility. His ef-
forts in the construction of early computers and the development of early
programming techniques were of prime importance. He also championed the
theory that computers eventually could be constructed that would be capable
of human thought, and he proposed a simple test, now known as the Tur-
ing test, to assess this capability. Turing’s papers on the subject are widely
acknowledged as the foundation of research in artificial intelligence. In 1952
Turing published the first part of his theoretical study of morphogenesis, the
development of pattern and form in living organisms.
The Turing Machine
Turing introduced his hypothetical computing device in 1936. He originally
conceived the machine as a mathematical tool that could infallibly recognize
undecidable propositions, i.e. those mathematical statements that, within
a given formal axiomatic system (that includes at least arithmetic), can be
neither proved nor disproved. Gödel had demonstrated that such propositions
exist in any such system. Turing instead proved there can never exist
any universal algorithmic method for determining whether a proposition is
undecidable. This was left open by Gödel, since the incompleteness theorem
(see section 8.1) only stated that consistency and completeness could not at
the same time be attained; that means there were statements (in consistent
systems) about numbers, indubitably true, which could not be proved from
finitely many rules. But the decidability of mathematical statements was
not settled by Gödel’s theorem, because the formulation of the problem needs
a formal definition of (algorithmic) method (or a definition of the notion of
algorithm in the definition of decidability in section 6.1). To this end
Turing introduced a machine that was later to be called the Turing machine,
an idealized mathematical model that reduces the logical structure of any
computing device to its essentials. By extrapolating the essential features of
information processing, Turing was instrumental in the development of the
modern digital computer. His model served as a basis for all subsequent
digital computers, which share his basic scheme of an input/output device
(tape and head), memory (tape) and central processing unit (head and
transition function).
Nowadays there are many models of computing devices available in the

theory of computation (complexity). We will not cover restricted models
such as finite automata and pushdown automata (and corresponding notions
such as regular languages and context-free grammars). We now directly
introduce the much more powerful model of Turing that we need to investigate
all mathematical problems.
The Turing Machine model uses an infinite tape as its unlimited memory,
and has a tape head that can read and write symbols (of a set Γ) and move
around a tape (to the L(eft) or R(ight)). We here assume the tape is right-
infinite; this means the tape continues infinitely to the right side but it has
a left-most position. Initially the tape contains an input string of symbols
from an input alphabet Σ and is blank (i.e. filled with a special blank symbol
") everywhere else. The Turing Machine is in a state q of a set of states Q,
and starts in an initial state q0 . It uses a transition function δ that deter-
mines how it gets from one configuration (that is the current state, the tape
contents and the head location) to the next. This transition can consist of
writing a new symbol of the tape alphabet Γ to the tape and moving the tape
head either Left or Right, and depends on the current state and the current
symbol on tape. This computation (i.e. sequence of transitions) continues
until the Turing Machine enters either the (final) state qaccept or the (final)
state qreject . We can define a Turing Machine (sometimes called determin-
istic, since each transition is determined uniquely given the configuration)
formally as a septuple:
Definition of a Turing Machine (TM):

A Turing Machine (TM) := (Q, Σ, Γ, δ, q0 , qaccept , qreject ) with:
1 Q is a finite set of states.
2 Σ is a finite input alphabet not containing the special blank symbol ".
3 Γ is a finite tape alphabet, where " ∈ Γ and Σ ⊆ Γ.
4 δ is the transition function, where δ is finite and
δ : Q × Γ → Q × Γ × {L, R}.
5 q0 is the start state, where q0 ∈ Q.
6 qaccept is the accept state, where qaccept ∈ Q.
7 qreject is the reject state, where qreject ∈ Q and qreject ≠ qaccept .
We call configurations accepting configurations if the state is qaccept , re-
jecting configurations if the state is qreject , and halting configurations if the
state is either qaccept or qreject . A start configuration C on input w is a
configuration with state q0 , with the head on the leftmost position of the
tape, and with just w on the tape.
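The definition can be made concrete with a small simulator. The following
Python sketch is an illustration only: the name run_tm, the dictionary
encoding of δ, the character '_' standing in for the blank symbol, the step
limit, and the example machine even_ones are all assumptions of this sketch
and do not come from the text. It follows the right-infinite tape described
above, and it assumes that a move to the left on the leftmost cell leaves
the head in place.

```python
BLANK = "_"  # stand-in for the blank symbol (assumption of this sketch)


def run_tm(delta, start, accept, reject, word, max_steps=10_000):
    """Run a deterministic TM on `word`.

    Returns 'accept', 'reject', or 'no halt seen' if the step budget runs out.
    """
    tape = list(word) or [BLANK]    # right-infinite tape, grown on demand
    state, head = start, 0          # start configuration: q0, head leftmost
    for _ in range(max_steps):
        if state == accept:
            return "accept"
        if state == reject:
            return "reject"
        key = (state, tape[head])
        if key not in delta:        # delta is total in the definition; here a
            return "reject"         # missing entry is treated as a rejection
        state, tape[head], move = delta[key]
        if move == "R":
            head += 1
            if head == len(tape):
                tape.append(BLANK)
        else:                       # move == "L": stay put on the leftmost cell
            head = max(head - 1, 0)
    return "no halt seen"           # the machine may loop; we cannot tell


# Hypothetical example machine: accepts words over {1} of even length.
even_ones = {
    ("even", "1"): ("odd", "1", "R"),
    ("odd", "1"): ("even", "1", "R"),
    ("even", BLANK): ("acc", BLANK, "R"),
    ("odd", BLANK): ("rej", BLANK, "R"),
}

print(run_tm(even_ones, "even", "acc", "rej", "11"))
print(run_tm(even_ones, "even", "acc", "rej", "111"))
```

The step limit is only a practical device: a return value of 'no halt seen'
says nothing about whether the machine would halt later.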
After defining the Turing Machine, Turing made his famous proposal
(known as Turing’s thesis, see also section 9.3) for the concept of ‘com-
putability by a Turing machine’. The proposal says that whenever there
is an effective method for obtaining the values of a mathematical function
(i.e. it is intuitively or effectively computable), the function can be computed
by a Turing Machine. The converse claim is trivial, and if the thesis is correct
we can reduce problems of the (non-)existence of effective methods to problems
of the (non-)existence of Turing Machines. We quote one of Turing’s
formulations from [90]:
Turing’s Thesis: LCM’s [Logical Computing Machines, Turing’s expres-

sion for Turing Machines] can do anything that could be described as “rule
of thumb” or “purely mechanical”.
We now introduce more of Turing’s theory of Turing Machines before we
present his proof of undecidability.
We define a language to be a set of strings, a string being a series of

alphabet symbols (i.e. w ∈ Σ∗ , for all strings w). We say that a TM M
accepts input string w if a sequence of configurations C1 , . . . , Ck exists where
1 C1 is the start configuration of M on input w.
2 Each Ci yields Ci+1 via the transition function δ on M .
3 Ck is an accepting configuration.
The set of strings that M accepts is called the language of M .
Definition of the language of a TM: The language of a TM M , notation

L(M ) := {w | w is a string that M accepts }.
We now define a notion that covers the ability of a TM to end in the accept
state when started with any string of a certain language.
Definition of Turing-recognizable: A language L is Turing-recognizable :=
there exists a TM M such that for all strings w ∈ Σ∗ :
1 with input w, M stops in qaccept if w ∈ L, and
2 with input w, M stops in qreject or does not stop (loops) if w ∉ L.
If language L is recognized by a TM M we say that M is an acceptor for

L. We distinguish between recognizing and deciding capabilities.
Definition of Turing-decidable (or decidable): A language L is Turing-decidable
:= there exists a deterministic TM M such that for all strings w ∈ Σ∗ :
1 with input w, M halts in qaccept if w ∈ L, and
2 with input w, M halts in qreject if w ∉ L.
If a language L is decided by a TM M we say that M is a decider for L.
There are several variants on Turing Machines such as Turing Machines with
a tape that is infinite in both directions, multitape Turing Machines,
non-deterministic Turing Machines and certain types of so-called enumerators.
Most variants are equivalent in the sense that they can recognize the same
set of languages (but not necessarily equally efficiently).
Example: We now give an example of a Turing Machine solving a mathema-

tical problem by first defining it as a language problem. The problem (idea
from [56]) is to design a Turing Machine that computes the function
f (x, y) = x + y if x ≥ y
f (x, y) = 0 if x < y
For simplicity, we assume x and y to be positive integers. First we have to

choose a convention for representing positive integers, and decide what the
initial situation of the tape is. We choose a unary notation in which any
positive integer x is represented by w(x) ∈ {1}+ , such that |w(x)| = x. We
assume that w(x) and w(y) are on the tape in unary notation, separated by
a single ‘0’ and with the read-write head on the left-most symbol of w(x).
We first describe how the sum of x and y can be calculated, then how the
comparison x ≥ y can be made and finally how to combine those two ma-
chines into a Turing Machine that computes the desired function.
Calculating the sum
To add the two numbers a and b, we only have to remove the separating
0, so addition amounts to the concatenation of two strings. The following
Turing Machine, called Adder, adds a and b and is relatively simple to
construct:
Adder = (Q, Σ, Γ, δ, q0 , qA , qR ), with
Q = {q0 , q1 , . . . , q4 }
Σ = {0, 1}
Γ = {0, 1, "}
q0 = {q0 }
qA = {q4 }
qR = {}
δ(q0 , 1) = (q0 , 1, R)
δ(q0 , 0) = (q1 , 1, R)
δ(q1 , 1) = (q1 , 1, R)
δ(q1 , ") = (q2 , ", L)
δ(q2 , 1) = (q3 , 0, L)
δ(q3 , 1) = (q3 , 1, L)
δ(q3 , ") = (q4 , ", R)

Note that we remove the ‘0’ by temporarily creating an extra ‘1’, a fact
that is remembered by putting the machine into state q1 . The transition
δ(q2 , 1) = (q3 , 0, L) is needed to remove this ‘1’ at the end of the
computation. Finally, we move the read-write head back to the leftmost ‘1’. This
is not strictly necessary in this example, because the machine is designed
such that it will terminate right after any addition, but it is not harmful and
normally a good habit to let any action terminate in a state from which it is
easy to take further transitions.
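The behaviour of Adder can be checked by running its transition table
mechanically. The Python sketch below is an illustration under stated
assumptions: the name run_adder and the character '_' for the blank symbol
are introduced here, and the tape is modelled as blank on both sides of the
input, because the transition δ(q3 , ") expects to find a blank to the left
of the leftmost ‘1’. That last point departs from the right-infinite tape
assumed earlier and is a choice of this sketch only.

```python
BLANK = "_"  # stand-in for the blank symbol (assumption of this sketch)

# Transition table of Adder, copied from the seven transitions listed above.
ADDER = {
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "0"): ("q1", "1", "R"),
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", BLANK): ("q2", BLANK, "L"),
    ("q2", "1"): ("q3", "0", "L"),
    ("q3", "1"): ("q3", "1", "L"),
    ("q3", BLANK): ("q4", BLANK, "R"),
}


def run_adder(x, y):
    """Run Adder on w(x) 0 w(y); return (non-blank tape contents, head position)."""
    # Tape as a dict so that cells left of position 0 read as blank (assumption).
    tape = dict(enumerate("1" * x + "0" + "1" * y))
    state, head = "q0", 0
    while state != "q4":                       # q4 is the accepting state
        state, symbol, move = ADDER[(state, tape.get(head, BLANK))]
        tape[head] = symbol
        head += 1 if move == "R" else -1
    contents = "".join(tape[i] for i in sorted(tape)).strip(BLANK)
    return contents, head


print(run_adder(2, 3))
```

With this table the machine halts with x + y ones followed by a single ‘0’,
and with the head back on the leftmost ‘1’ (position 0).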
Comparison
To compare two numbers a and b, we again assume they are written in the
notation that we used before and divided by a single ‘0’. We will construct
a Turing Machine that halts in an accepting state if a ≥ b and in a rejecting
state if a < b. To this end we match each ‘1’ on the left of the dividing
‘0’ with a ‘1’ on the right. We can do this by starting at the leftmost ‘1’
(of the number a) and alternately checking off the leftmost symbols of the
numbers a and b by replacing them with the symbols ‘x’ and ‘y’ respectively.
The matching will stop when one of the two sequences of ‘1’s is completely
checked off. If a < b then the right sequence will still contain ‘1’s, and
if a ≥ b either the left sequence contains ‘1’s or neither sequence contains
‘1’s. In the first case, we still find a ‘1’ on the right when all ‘1’s on the
left have been replaced; this is checked in the state q5 . In the second case,
if a ≥ b, when we attempt to match another ‘1’, we encounter a blank at
the right of the working space, which can be used as a signal to enter the
accepting state. If we work this out in detail, we get the following Turing
Machine called Comparer :=
(Q, Σ, Γ, δ, q0 , qA , qR ), with:
Q = {q0 , q1 , q2 , q3 , q4 , q5 , q6 , q7 }
Σ = {0, 1}
Γ = {0, 1, x, y, "}
q0 = {q0 }
qA = {q6 }
qR = {q7 }
The transitions of δ can be grouped in several parts.
δ(q0 , 1) = (q1 , x, R)
δ(q1 , 1) = (q1 , 1, R)
δ(q1 , 0) = (q2 , 0, R)
δ(q2 , y) = (q2 , y, R)
δ(q2 , 1) = (q3 , y, L)
This set replaces the leftmost ‘1’ of a with ‘x’, then causes the read-write
head to travel right to the first ‘1’ of b and replace it with the symbol ‘y’.
When the dividing ‘0’ is passed, the machine enters state q2 , indicating that it
is now dealing with the number b. When the symbol ‘y’ has been written, the
machine enters a state q3 , indicating that one ‘1’ of b has been successfully
paired with a ‘1’ of a. The next group of transitions reverses the direction
and repositions the read-write head over the leftmost ‘1’ of a, and returns
control to the initial state,
δ(q3 , y) = (q3 , y, L)
δ(q3 , 0) = (q4 , 0, L)
δ(q4 , 1) = (q4 , 1, L)
δ(q4 , x) = (q0 , x, R)
The rewriting continues this way when the input is a string 1^a 0 1^b , stopping
only when on one side no more ‘1’s can be replaced. In that case either the
left side does not contain any more ‘1’s (a ≤ b), or the right side has run out
of ‘1’s (a > b). If the left side does not contain any more ‘1’s, the transition
δ(q4 , x) = (q0 , x, R) will leave the read-write head on the ‘0’ instead of on
a ‘1’.
δ(q0 , 0) = (q5 , x, L) (a ≤ b)
δ(q2 , ") = (q6 , ", L) (a > b)
In the first case we still have to check whether the right side has any ‘1’s left,
to determine whether a = b. This is done in the state q5 .
δ(q5 , x) = (q5 , x, R)
δ(q5 , 0) = (q5 , 0, R)
δ(q5 , y) = (q5 , y, R)
δ(q5 , 1) = (q7 , y, R) (a < b)
δ(q5 , ") = (q6 , ", L) (a = b)
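The transition groups above can be executed directly to see which state is
reached for which inputs. The following Python sketch is illustrative only:
the name compare and the character '_' for the blank symbol are assumptions
introduced here. Reading the table, the machine has no transitions leaving
q6 and q7, so the sketch treats these two states as the halting states.

```python
BLANK = "_"  # stand-in for the blank symbol (assumption of this sketch)

# Transition table of Comparer, copied from the transition groups above.
COMPARER = {
    ("q0", "1"): ("q1", "x", "R"),
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", "0"): ("q2", "0", "R"),
    ("q2", "y"): ("q2", "y", "R"),
    ("q2", "1"): ("q3", "y", "L"),
    ("q3", "y"): ("q3", "y", "L"),
    ("q3", "0"): ("q4", "0", "L"),
    ("q4", "1"): ("q4", "1", "L"),
    ("q4", "x"): ("q0", "x", "R"),
    ("q0", "0"): ("q5", "x", "L"),      # left side exhausted: a <= b
    ("q2", BLANK): ("q6", BLANK, "L"),  # right side exhausted: a > b
    ("q5", "x"): ("q5", "x", "R"),
    ("q5", "0"): ("q5", "0", "R"),
    ("q5", "y"): ("q5", "y", "R"),
    ("q5", "1"): ("q7", "y", "R"),      # a '1' is left on the right: a < b
    ("q5", BLANK): ("q6", BLANK, "L"),  # nothing left on the right: a = b
}


def compare(a, b):
    """Run Comparer on 1^a 0 1^b (a, b >= 1) and return the halting state."""
    tape = list("1" * a + "0" + "1" * b)
    state, head = "q0", 0
    while state not in ("q6", "q7"):
        if head == len(tape):           # extend the tape with blanks on demand
            tape.append(BLANK)
        state, tape[head], move = COMPARER[(state, tape[head])]
        head += 1 if move == "R" else -1
    return state


print(compare(3, 1), compare(2, 2), compare(1, 2))
```

For positive a and b the run ends in q6 exactly when a ≥ b and in q7 exactly
when a < b.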
Combining Turing Machines for complicated tasks
We now have to put together the Turing Machines Adder and Comparer
to obtain the desired Turing Machine that computes the given function. We
can do this by starting with the input a and b in the previously described
notation and starting position, and using Comparer to determine whether
or not a ≥ b. We index all states of Comparer with a C, so that, following
the transitions above, Comparer ends in the state qC,6 if a ≥ b and in the
state qC,7 if a < b. In the first case (a ≥ b), the Comparer should send a
‘start signal’ to the Adder, to give a + b as output. In the second case
(a < b), the Comparer should send a ‘start signal’ to a Turing Machine
(called Eraser) that simply replaces all ‘1’s by ‘0’s to output the value 0
in the desired format.
We show how we can let the Comparer send these ‘start signals’. We first
index all states of the Adder by A and of the Eraser by E. Now in case of
a ≥ b, Comparer ends in state qC,6 , and we can add a transition δ(qC,6 , ∗) =
δ(qA,0 , ∗). The star ‘∗’ stands for any possible symbol, so actually this
transition is a shorthand notation for a set of transitions. Similarly, we can
let δ(qC,7 , ∗) = δ(qE,0 , ∗) bring the Eraser in the initial state. The Adder or
Eraser, respectively, will then give the desired output because their behavior
on the input does not change as a result of the renaming of the states
(to be exact: the state in which the Comparer terminates is suitable as an
initial position for Adder or Eraser). The only thing we have not taken care
of is that when the Comparer enters a final state, it does not have the initial
representation of the numbers a and b on tape, but has replaced the ‘1’s by
‘x’s and ‘y’s. We can easily (it is just some extra work, you can try it as an
exercise if you want) fix this by letting Comparer, as the last action before
entering a final state, replace all ‘x’s and ‘y’s by ‘1’s. The result is a Turing
Machine that combines Comparer, Adder and Eraser to compute the func-
tion f . Similarly to this example, we can for example multiply two numbers
a and b, and we can also translate macro-instructions like ‘if p then qj else
qk ’ (meaning that when we read ‘p’ on tape, then the Turing Machine goes
into a state qj and otherwise into a state qk ), and even combine them into
complicated subprograms that can be invoked repeatedly whenever needed.
(End of Example)
The Entscheidungsproblem
After introducing the notion of a TM in [89], Turing answered Hilbert’s
decision problem for mathematical logic (in German called
‘Entscheidungsproblem’) in the negative. The Entscheidungsproblem asks
whether there exists a definite method or algorithm which (at least in
principle) can be applied to any given mathematical proposition to decide
whether that proposition is provable. We now define the notion of an algorithm
with the notion of a Turing Machine, and the set of provable propositions by
the set of languages that can be decided by some TM. If we look at the
definition of decidability in section 6.1, we have that for all formulas ϕ an
algorithm, i.e. a TM, exists that decides whether ϕ is true or not. If we code
ϕ by means of a language, and this is always possible (see the previous
example for a demonstration), we can reformulate the problem as: for every
language L, there exists a TM M that decides for all strings w whether w ∈ L.
We now show that this is not possible for all problems (i.e. languages) by
giving a specific problem, the Halting problem, that is not decidable.
The Halting problem is the problem of testing whether a TM accepts a

given input string. We define the problem by stating it as a language pro-
blem, and asking whether that language is decidable.
Definition of the Halting problem:

For all strings w, H := {< M, w > | M is a TM and M accepts w}. Is H
decidable? (i.e. is there for each language a TM that decides for all strings
w if they belong to the language or not, that is (using Turing’s thesis, see
section 9.3): is there for each problem an algorithm that can decide it?).
Theorem: H is recognizable
Proof (by Turing): The following TM U , also called Universal Turing Ma-
chine because it is capable of simulating any other Turing Machine, recog-
nizes H. We informally define U , because a detailed definition of the septuple
such a TM consists of (see the definition of a TM) is a lot of work.
Description of Universal Turing Machine: U =

“On the input < M, w > where M is a TM and w is a string:
1 simulate M on input w
2 if M ever enters its accept state, accept”
Note that this TM loops on input < M, w > if M loops on w, which is
why this machine does not decide H. If the algorithm had some way to de-
termine that M was not halting on w, it could reject. Hence H is sometimes
called the Halting problem. As Turing demonstrated, an algorithm has no
way to make this determination.
Theorem: H is undecidable (see also [82, page 165]).

Proof (by Turing): We assume H is decidable and obtain a contradiction.
Suppose D is a decider for H, and defined by
D(< M, w >) :=“
• accept if M accepts w
• reject if M does not accept w”
Now we construct a new TM O with D as a subroutine. This new TM
calls D to determine what M does when the input to M is its own description
< M >. Once O has determined this information, it does the opposite. That
is, it rejects if M accepts and accepts if M does not accept. The following is
a description of O: O = “On input < M >, where M is a TM:
1 run D on input < M, < M >>,
2 output the opposite of what D outputs; that is if D accepts, reject and

if D rejects, accept”
We summarize the behavior of O as follows:

O(< M >) = “
• accept if M does not accept < M >
• reject if M accepts < M > ”
Now we obtain the contradiction by running O with its own description

< O > as input. In that case we get:
O(< O >) = “
• accept if O does not accept < O >
• reject if O does accept < O > ”
Whatever O does on input < O >, it is forced to do the opposite, which is a
contradiction. Thus neither O nor D can exist.
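The diagonal construction can be mimicked in Python. This is only an
analogy under stated assumptions: machines are modelled as Python functions
that return True (accept) or False (reject), the description < M > is
modelled by the function object itself, and the names make_opposite, refutes,
always_yes and always_no are introduced here for illustration. The sketch
takes any total candidate for D and builds the corresponding O, then checks
that the candidate's verdict on the pair (O, < O >) is wrong.

```python
def make_opposite(decider):
    """Build O from the proof: on input <M>, do the opposite of D(<M, <M>>)."""
    def opposite(machine):
        return not decider(machine, machine)
    return opposite


def refutes(decider):
    """Return True if `decider` is wrong about what O does on its own description."""
    o = make_opposite(decider)
    claimed = decider(o, o)   # what the candidate decider claims about O on <O>
    actual = o(o)             # what O really does on <O>
    return claimed != actual


# Two hypothetical candidate deciders (neither is correct, as the proof predicts).
always_yes = lambda machine, word: True
always_no = lambda machine, word: False

print(refutes(always_yes), refutes(always_no))
```

Since O returns the negation of the candidate's own answer, the claimed and
actual behaviour differ for every candidate that always returns an answer,
which mirrors the contradiction in the proof.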
Turing wrote in his last publication about the interpretation of unsolvable

problems, such as the Halting problem for Turing machines:
These . . . may be regarded as going some way towards a demon-

stration, within mathematics itself, of the inadequacy of ‘reason’
unsupported by common sense.
- Alan Turing
In this section I have made extensive use of [38] [92] for information on
the life and work of Turing and [89] [82] [19] for the theory of TM’s and the
Halting problem. Another valuable source of information on Turing’s life and
work is the website http://www.turing.org.uk/
9.2 Church and the Lambda Calculus

Alonzo Church (1903-1995) was an American mathematician, whose work is
of major importance in mathematical logic, recursion theory and in
theoretical computer science. One of his most important contributions to logic
is his invention in the 1930s of the lambda calculus. He is also remembered
for Church’s thesis, published in 1936 in [14, page 345-363], stating that
the lambda calculus can be used to embody a correct formalization of the
notion of computability (see section 9.3). The notion of lambda definability
is conceptually the basis for the discipline of functional programming, and
the lambda calculus is also the basis for type theory. Church also founded
the Journal of Symbolic Logic in 1936. He had 31 doctoral students including
famous mathematicians such as Turing, Kleene, Kemeny and Smullyan. We
now introduce the lambda calculus (Church’s formalization of the notion of
effective calculability) in a modern setting, using [9, chapter 4].
Application and abstraction
First we introduce the basic concepts of λ-calculus. A formalization fol-

lows thereafter. The lambda calculus has only two basic operations, abstrac-
tion and application.
• Abstraction is for constructing functions: For an expression E we in-

troduce λx.E to denote the abstraction of E over x, i.e. ‘the function
of x which computes E’.
Example¹: λx . x + 1, λn . n × n, etc.
We will later see how to define a recursive function; this is not so easy
since we do not have function names.
• (Function) application: The expression F A denotes that F is consid-

ered as a function (an algorithm) applied to input A. The original
lambda calculus theory is type-free so we also consider F F , that is, F
applied to itself.
Example: (λx . x + 1) 4, (λn . n × n) 7, etc.
¹ Note that in some examples we have simplified the notation for the clarity of the
example, since in pure lambda calculus we do not have arithmetic symbols, like + and ×,
but we can encode these operations in the pure lambda calculus, as we will later see.
These two notions can be very powerful if we introduce the rule of beta
reduction, which allows us to apply an abstraction to an argument by
substituting the argument for the bound variable, and for example rewrite
(λx . x + 1) 4 to 4 + 1. Similarly (λn . n × n) 7 can be reduced
to 7×7. It is also allowed to use arbitrary nesting: ((λn . λx . (x+1)×n) 7) 4
can be reduced to (λx . (x + 1) × 7) 4 and then to (4 + 1) × 7.
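Abstraction and application have direct counterparts in most programming
languages. The short Python sketch below mirrors the three examples of this
paragraph; the names succ, square and scaled_succ are introduced here for
illustration only. Note that Python evaluates an application all the way to
a number, so the intermediate forms such as 4 + 1 are not visible.

```python
# Abstraction: Python's `lambda` plays the role of λ.
succ = lambda x: x + 1                            # λx . x + 1
square = lambda n: n * n                          # λn . n × n
scaled_succ = lambda n: lambda x: (x + 1) * n     # λn . λx . (x + 1) × n

# Application: juxtaposition F A becomes a function call F(A).
print(succ(4))              # (λx . x + 1) 4
print(square(7))            # (λn . n × n) 7
print(scaled_succ(7)(4))    # ((λn . λx . (x + 1) × n) 7) 4
```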
Similar to ordinary mathematics, the names of the variables are irrele-
vant to the rules that can be applied, which allows a transformation of the
names (also known as dummy transformation). This rule in lambda calculus
is called alpha conversion. For example, alpha conversion allows us to rewrite
λn . nn to λx . xx, since they are essentially the same function.
Note that we also want to use functions as variables and arguments:

((λf . (λn . λx . f x × n) 7)(λy . y + 1)) 4 should reduce to the earlier
expression.
So far we only have functions of one argument; we now introduce functions
with more arguments, while avoiding new notation. We can solve this
problem by using iteration of applications, often called currying after the
American mathematician H.B. Curry who made it popular.
Example: f (x, y) = 3 × x + y can be written as F1 ≡ λx . (λy . 3 × x + y).

Then f (4, 5) is written (F1 4) 5, that is ((λx . (λy . 3 × x + y)) 4) 5, which
can be reduced to (by using beta reduction): 3 × 4 + 5.
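Currying and the use of functions as arguments can be tried out in the same
way. In this Python sketch the names F1 (taken from the example), add_one
and G are illustrative; G is the term λf . (λn . λx . f x × n) 7 from the
earlier remark on functions as arguments, read with f x binding more tightly
than ×.

```python
# Curried two-argument function: F1 ≡ λx . (λy . 3 × x + y)
F1 = lambda x: lambda y: 3 * x + y

# A function that takes a function as argument:
# G ≡ λf . (λn . λx . f x × n) 7
add_one = lambda y: y + 1
G = lambda f: (lambda n: lambda x: f(x) * n)(7)

print(F1(4)(5))        # (F1 4) 5, i.e. 3 × 4 + 5
print(G(add_one)(4))   # reduces to (4 + 1) × 7
```

Applying F1 to 4 alone yields a new one-argument function, which is exactly
what makes the curried form sufficient for functions of several arguments.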
The above explanation and examples give an idea of what lambda calcu-
lus is. We will now work towards a more formal definition of lambda calculus.
The system of lambda calculus is based on the structure of Abstract Reduc-
tion Systems (ARS). The terms of the ARS then coincide with the inductively
defined lambda terms and the reduction relation will be β−reduction. So be-
fore we formally define the lambda calculus, we introduce the most relevant
theory of abstract reduction systems.
Abstract Reduction Systems
Definition of Abstract Reduction System (ARS): An abstract reduction
system A := a structure < A, →> consisting of a set A and a binary
relation → on A (i.e. →⊆ A × A).
The relation is also called reduction or rewrite relation. If for a, b ∈ A, we
have a → b, we call b a one-step reduct of a.
The transitive and reflexive closure of → is written as ↠ (or alternatively
→*). This means ↠ is the smallest relation on A satisfying, for all a, b, c ∈ A,
(closure of →) if a → b then a ↠ b,
(reflexive) a ↠ a, and
(transitive) if a ↠ b and b ↠ c then a ↠ c.
Thus a ↠ b if and only if there exists a finite sequence of reduction steps
a ≡ a0 → a1 → . . . → an ≡ b. This sequence may be empty, in which case
a ≡ b. Here ≡ denotes (the syntactic) identity of elements of A, i.e. a ≡ b if
and only if a and b are the same element of A.
Definition of Normal Form: A term a ∈ A of an ARS < A, →> is a

normal form := there is no b ∈ A such that a → b. Furthermore, b ∈ A has
a normal form if and only if b ↠ a for some normal form a ∈ A.
Definition of Weakly Normalizing: The reduction relation → of an

ARS < A, →> is weakly normalizing (or weakly terminating) := every a ∈ A
has a normal form. In this case we also say that A is weakly normalizing
Definition of Strongly Normalizing: The reduction relation → of an

ARS < A, →> is strongly normalizing (also called terminating, well-founded
or noetherian) := there exists no infinite reduction a0 → a1 → a2 → . . .,
with for all n ∈ N, an ∈ A.
Lemma If an ARS is strongly normalizing, it is weakly normalizing.
Proof: We prove this by proving the contrapositive: if < A, →> is not weakly
normalizing then < A, →> is not strongly normalizing. Suppose < A, →> is not
weakly normalizing. Then there is a0 ∈ A without a normal form. Since a0
has no normal form, then certainly a0 is not a normal form itself, so there is
a1 ∈ A such that a0 → a1 . Since a0 has no normal form, a1 has no normal
form either (a normal form of a1 would also be a normal form of a0 ), so in
particular a1 is not a normal form. Thus we get an element a2 ∈ A such that
a1 → a2 . Repeating this process yields an infinite reduction
a0 → a1 → a2 → . . ..
Definition of Unique Normal Form: The reduction relation → of an

ARS < A, →> has the unique normal form property := for all a, b, c ∈ A
such that a ↠ b, a ↠ c, and b, c are normal forms, we have b ≡ c.
Lemma An ARS < A, →> with the unique normal form property is not
always weakly normalizing.
Proof: For instance, the abstract reduction system with only element a ∈ A
and rewrite rule a → a has no normal forms, so it trivially has the unique
normal form property and is not weakly normalizing.
Definition of Local Confluence: A reduction relation → of an ARS

< A, →> is called locally confluent or weakly confluent (also weakly Church-
Rosser ) := for all a, b, c ∈ A with a → b and a → c there exists a d ∈ A such
that b ↠ d and c ↠ d.
Definition of Confluence: A reduction relation → of an ARS < A, →>

is called confluent (or has the Church-Rosser property, or is Church-Rosser )
:= for all a, b, c ∈ A with a ↠ b and a ↠ c there exists a d ∈ A such that
b ↠ d and c ↠ d.
Lemma If a reduction relation has the unique normal form property and is
weakly normalizing then it is confluent.
Proof: Suppose we have a ↠ b and a ↠ c. Since → is weakly normalizing,
there are normal forms b′ and c′ such that b ↠ b′ and c ↠ c′ . By transitivity
we also have a ↠ b′ and a ↠ c′ , and thus by the unique normal form property
b′ ≡ c′ . Hence b ↠ b′ and c ↠ b′ .
Lemma If → is confluent then → has the unique normal form property.

Proof: Suppose a ↠ b, a ↠ c, and b, c are normal forms. By confluence,
there exists a d such that b ↠ d and c ↠ d. Since b and c are normal forms,
we must have b ≡ d and c ≡ d, thus b ≡ c.
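For a finite abstract reduction system these properties can be checked by
brute force. The Python sketch below is an illustration, not part of the
theory: the relation → is encoded as a dictionary from each element to the
set of its one-step reducts, and all function names as well as the example
systems loop, diamond and fork are assumptions of this sketch.

```python
def reachable(step, a):
    """All b with a ->> b (reflexive-transitive closure of ->) in a finite ARS."""
    seen, todo = {a}, [a]
    while todo:
        for b in step.get(todo.pop(), ()):
            if b not in seen:
                seen.add(b)
                todo.append(b)
    return seen


def is_normal_form(step, a):
    return not step.get(a)               # no b with a -> b


def normal_forms_of(step, a):
    return {b for b in reachable(step, a) if is_normal_form(step, b)}


def weakly_normalizing(step, elements):
    return all(normal_forms_of(step, a) for a in elements)


def unique_normal_forms(step, elements):
    return all(len(normal_forms_of(step, a)) <= 1 for a in elements)


def confluent(step, elements):
    # For every a ->> b and a ->> c there must be a common reduct d.
    for a in elements:
        reducts = reachable(step, a)
        for b in reducts:
            for c in reducts:
                if not (reachable(step, b) & reachable(step, c)):
                    return False
    return True


loop = {"a": {"a"}}                                          # a -> a only
diamond = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}
fork = {"a": {"b", "c"}, "b": set(), "c": set()}

print(unique_normal_forms(loop, {"a"}), weakly_normalizing(loop, {"a"}))
print(confluent(diamond, set(diamond)), confluent(fork, set(fork)))
```

The system loop is the counterexample used above: it has the unique normal
form property but is not weakly normalizing. The system fork is not confluent
and, in line with the last lemma, does not have unique normal forms.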
Syntax
Now that we have seen the basic principle of lambda calculus, we will give a
more formal definition. We formally define the syntax of the lambda calculus
by giving its grammar.
Definition of the Syntax of Lambda Terms:

Lambda Term E := C | v | (E1 E2 ) | (λv . E) , with
• C ranges over a set of constants

(we will use the constant names a, b, c, . . . for elements of C)
• v ranges over a (denumerable) set of variables (using v, w, x, . . .)
• (E1 E2 ) denotes a combination involving the application of one expres-

sion (E1 ) to another (E2 ). The subexpression E1 is referred to as the
operator and E2 is referred to as the operand
• (λv . E) denotes an abstraction. Informally it denotes a function of v

which produces result E. The subexpression E is referred to as the body
of the abstraction and v is called the bound variable of the abstraction
We also call lambda terms simply ‘terms’ or ‘expressions’.

Notational conventions: to achieve a minimal notation, we drop parentheses
whenever possible, and assume:
• Association to the left for iterated application:

F E1 E2 . . . En denotes (. . . ((F E1 ) E2 ) . . . En ),
• Association to the right for iterated abstraction:

λx1 . λx2 . . . . λxn . E or shortly λx1 x2 . . . xn . E
denotes λx1 . (λx2 . (. . . (λxn . E) . . .)).
Example: We can write the expression F1 of the previous example as

λx y . 3 × x + y, and λv . E1 E2 means (λv . (E1 E2 )).
Free/Bound Variables and α-conversion
We distinguish between free and bound occurrences of variables in an ex-

pression. An occurrence of v in E is said to be bound if it occurs within a
subexpression of E with the form λv . E1 , and the occurrence is said to be
free otherwise.
Example: n occurs free in λx . (x + 1) × n, whereas x occurs bound in this

expression. Both n and x occur bound in λn . (λx . x + 1) × n. Further x
occurs both bound and free in (λx . x + 1) × x (the second occurrence of ‘x’
in this expression is bound, the third occurrence is free).
Definition of free variables: The free variables of a term E, denoted by
FV(E), form a set of variables defined recursively by:
• FV(C) = ∅,
• FV(v) = {v},
• FV(E1 E2) = FV(E1) ∪ FV(E2),
• FV(λv . E) = FV(E) − {v}.
An expression E is said to be closed if FV(E) = ∅.
Example: The expression λz . (λx . z + x)(λy . y × z) is closed.
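The recursive definition translates almost literally into a program. The
Python sketch below is an illustration under our own assumptions: terms are
encoded as nested tuples, and an infix expression such as z + x is encoded as
the application of the constant + to z and then to x. The names Const, Var,
App, Lam and binop are hypothetical helpers, not notation from the text.

```python
def Const(name):
    return ("const", name)

def Var(name):
    return ("var", name)

def App(operator, operand):
    return ("app", operator, operand)

def Lam(bound, body):
    return ("lam", bound, body)

def binop(op, left, right):
    """Encode 'left op right' as ((op left) right), with op a constant."""
    return App(App(Const(op), left), right)

def free_vars(term):
    """FV(term), following the four clauses of the definition."""
    tag = term[0]
    if tag == "const":
        return set()
    if tag == "var":
        return {term[1]}
    if tag == "app":
        return free_vars(term[1]) | free_vars(term[2])
    if tag == "lam":
        return free_vars(term[2]) - {term[1]}
    raise ValueError(f"not a lambda term: {term!r}")

def is_closed(term):
    return not free_vars(term)

# λz . (λx . z + x)(λy . y × z)
closed_example = Lam("z", App(Lam("x", binop("+", Var("z"), Var("x"))),
                              Lam("y", binop("*", Var("y"), Var("z")))))
# λx . (x + 1) × n
open_example = Lam("x", binop("*", binop("+", Var("x"), Const("1")), Var("n")))

print(is_closed(closed_example))   # True
print(free_vars(open_example))     # {'n'}
```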
α-conversion
We consider two terms as ‘equivalent’ if they differ only in the names of
their bound variables. So λx . x and λy . y are considered equivalent. But we
must distinguish λx . y + x and λy . y + y, since one has a free occurrence
of y and the other has not. Note also that λxy . xy and λxy . yx are not
equivalent. The renaming process is called α-conversion, and allows us to
change the name of a bound variable, as long as we do so consistently. It is
formally defined as the equivalence relation generated by the following
reduction:
Definition of α-reduction: λx . E →α λy . E′, where E′ is obtained from
E by replacing all free occurrences of x in E by y, provided y is fresh, that
is, y neither occurs as a free variable nor as a bound variable in the
expression E (i.e. it does not occur in E).
Expressions that can be made textually equivalent by renaming bound

variables are called α-convertible or alpha(betically) equivalent. When two
lambda terms E1 and E2 are α-convertible in this sense we write E1 ≡α E2 ,
and often also E1 ≡ E2 .
Example: Some α-conversions:
λx . x + 1 ≡α λy . y + 1
λx . (λy . y × x) ≢α λy . (λy . y × y) (because the y substituted for x in
(λy . y × x) would get bound by the inner λy)
λx .(λy . x × y)y ≡α λx .(λz . x × z)y
From now on, two λ-terms are considered (syntactically) equal if they are
α-convertible to each other.
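Whether two terms are α-convertible can be decided without renaming anything,
by comparing each bound variable through the position of its binder. The
Python sketch below illustrates this; it is one possible way to test the
relation, not the definition used in the text, and the tuple encoding of
terms as well as all names in it are our own assumptions.

```python
def Const(name):
    return ("const", name)

def Var(name):
    return ("var", name)

def App(operator, operand):
    return ("app", operator, operand)

def Lam(bound, body):
    return ("lam", bound, body)

def binop(op, left, right):
    """Encode 'left op right' as ((op left) right), with op a constant."""
    return App(App(Const(op), left), right)

def alpha_equal(s, t, bound_s=(), bound_t=()):
    """True if s and t differ only in the names of bound variables.

    bound_s and bound_t list the enclosing binders, innermost first.
    """
    if s[0] != t[0]:
        return False
    tag = s[0]
    if tag == "const":
        return s[1] == t[1]
    if tag == "var":
        in_s, in_t = s[1] in bound_s, t[1] in bound_t
        if in_s and in_t:
            # Both bound: they must refer to the same binder position.
            return bound_s.index(s[1]) == bound_t.index(t[1])
        # Otherwise both must be free and carry the same name.
        return not in_s and not in_t and s[1] == t[1]
    if tag == "app":
        return (alpha_equal(s[1], t[1], bound_s, bound_t)
                and alpha_equal(s[2], t[2], bound_s, bound_t))
    # tag == "lam": push the two binders and compare the bodies.
    return alpha_equal(s[2], t[2], (s[1],) + bound_s, (t[1],) + bound_t)

x, y, z = Var("x"), Var("y"), Var("z")
# λx . x + 1 against λy . y + 1
print(alpha_equal(Lam("x", binop("+", x, Const("1"))),
                  Lam("y", binop("+", y, Const("1")))))    # True
# λx . (λy . y × x) against λy . (λy . y × y)
print(alpha_equal(Lam("x", Lam("y", binop("*", y, x))),
                  Lam("y", Lam("y", binop("*", y, y)))))    # False
```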
Substitution
We now formally define the concept of substitution of a variable in lambda

terms.
Definition of Substitution: The substitution of expression E for each free
occurrence of v in expression E0, denoted by E0[E/v], is defined by induction
on the structure of E0 as:
• C[E/v] ≡ C
• x[E/v] ≡ E if x ≡ v
  x[E/v] ≡ x if x ≢ v
• (E1 E2)[E/v] ≡ (E1[E/v])(E2[E/v])
• (λx . E1)[E/v] ≡ λx . E1 if x ≡ v
  (λx . E1)[E/v] ≡ λx . (E1[E/v]) if x ≢ v and x ∉ FV(E)
  (λx . E1)[E/v] ≡ λy . ((E1[y/x])[E/v]) if x ≢ v and x ∈ FV(E),
  where y ≢ v and y ∉ FV(E1 E)
In the last case y is a fresh variable. It must differ from v, since
otherwise the occurrences of y introduced by the renaming would be replaced
again by the substitution [E/v].
Example: (λx . z+7×x)[x+3/z] ≡ λy . (z+7×y)[x+3/z] ≡ λy . (x+3)+7×y.
The following lemma tells us that substitution behaves well; it can be

proven by induction on the structure of λ-terms.
Lemma For all terms E0, E1, E2 and variables x, y such that x ≢ y and
x ∉ FV(E2):
E0[E1/x][E2/y] ≡ E0[E2/y][E1[E2/y]/x].
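To see the definition of substitution at work, it can be written out as a
small program. The Python sketch below is an illustration under our own
assumptions: terms are nested tuples, infix operators are encoded as
constants applied to their arguments, and fresh variables are taken from the
hypothetical series v0, v1, . . . The fresh name is chosen outside FV(E1 E)
and different from v.

```python
from itertools import count

def free_vars(term):
    tag = term[0]
    if tag == "const":
        return set()
    if tag == "var":
        return {term[1]}
    if tag == "app":
        return free_vars(term[1]) | free_vars(term[2])
    return free_vars(term[2]) - {term[1]}        # term = ("lam", v, body)

def fresh(avoid):
    """First name among v0, v1, ... that is not in avoid (assumed scheme)."""
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)

def subst(term, new, v):
    """term[new/v]: replace the free occurrences of v in term by new."""
    tag = term[0]
    if tag == "const":
        return term
    if tag == "var":
        return new if term[1] == v else term
    if tag == "app":
        return ("app", subst(term[1], new, v), subst(term[2], new, v))
    x, body = term[1], term[2]
    if x == v:                       # v is bound here: nothing to replace
        return term
    if x not in free_vars(new):      # no capture possible
        return ("lam", x, subst(body, new, v))
    # x would capture a free variable of new: rename the bound variable.
    y = fresh(free_vars(body) | free_vars(new) | {v})
    return ("lam", y, subst(subst(body, ("var", y), x), new, v))

def binop(op, left, right):
    """Encode 'left op right' as ((op left) right), with op a constant."""
    return ("app", ("app", ("const", op), left), right)

x, z = ("var", "x"), ("var", "z")
# (λx . z + 7 × x)[x + 3 / z]
term = ("lam", "x", binop("+", z, binop("*", ("const", "7"), x)))
result = subst(term, binop("+", x, ("const", "3")), "z")
print(result[1])   # v0, the renamed bound variable
```

The result is λv0 . (x + 3) + 7 × v0, which agrees with the example above up
to the name of the bound variable.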

Reduction System for the Lambda Calculus
As we have seen with an example at the beginning of this section, the

main rule for the lambda calculus is the beta reduction rule, that we can now
formally define.
Definition of β-reduction: β-reduction is the compatible relation generated
by (λv . E1)E2 →β E1[E2/v], that is, it is closed under the following rules:
if E1 →β E2, then
• E1 E →β E2 E,
• E E1 →β E E2,
• λv . E1 →β λv . E2.
As before, any term matching the left-hand side of the rule is called a redex
and thus any expression of the form (λv . E1)E2 is called a β-redex.
β-reduction is a reduction relation →β of the pure lambda calculus. We often
write → and ↠ instead of →β and ↠β. We use =β (or sometimes simply
=) to denote the equivalence relation generated by →β. Note the difference
between ≡ (α-convertibility) and = (β-equality).
Example: (λnx . (x + 1) × n) 7 4 →β (λx . (x + 1) × 7) 4 →β (4 + 1) × 7.
Example: This example illustrates the need for α-conversion during
β-reduction, even if distinct names are chosen from the start. Define
TWICE ≡ λf . λx . f (f x), then
(λy . yy)TWICE
→β TWICE TWICE
≡ (λf . λx . f (f x)) TWICE
→β λx . TWICE (TWICE x)
≡ λx . TWICE ((λf . λx . f (f x))x)
→β λx . TWICE ((λx . f (f x))[x/f ]) (Note the name clash)
≡α λx . TWICE ((λy . f (f y))[x/f ])

≡ λx . TWICE (λy . x(xy))
→β ...
Example:
1. (λx . x + 1) ((λy . y × y) 3) ↠β (3 × 3) + 1 (two possibilities),
so different reduction paths are possible.
2. Ω ≡ (λx . xx)(λx . xx) →β (λx . xx)(λx . xx) →β · · ·, thus infinite

sequences of steps are possible: β-reduction is not always terminating.
This corresponds to ‘self-reproducing programs’.
3. (λx . xxx)(λx . xxx) →β (λx . xxx)(λx . xxx)(λx . xxx) →β · · ·, so

terms can even become arbitrarily large.
4. (λy . c)((λx . xxx)(λx . xxx)) → c, but also

(λy . c)((λx . xxx)(λx . xxx)) → (λy . c)((λx . xxx)(λx . xxx)(λx . xxx))
and the latter term can be reduced to c or again to a longer term, etc.
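The behaviour in these examples can be reproduced with a small reducer. The
Python sketch below is an illustration under our own assumptions: terms are
nested tuples, the reducer always contracts the leftmost-outermost β-redex
(one possible strategy among many), and a step budget stands in for the fact
that reduction need not terminate. All names are hypothetical.

```python
from itertools import count

def free_vars(t):
    if t[0] == "const":
        return set()
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}              # t = ("lam", v, body)

def subst(t, new, v):
    """t[new/v], renaming a bound variable when it would capture."""
    if t[0] == "const":
        return t
    if t[0] == "var":
        return new if t[1] == v else t
    if t[0] == "app":
        return ("app", subst(t[1], new, v), subst(t[2], new, v))
    x, body = t[1], t[2]
    if x == v:
        return t
    if x in free_vars(new):
        avoid = free_vars(body) | free_vars(new) | {v}
        y = next(f"v{i}" for i in count() if f"v{i}" not in avoid)
        x, body = y, subst(body, ("var", y), x)
    return ("lam", x, subst(body, new, v))

def step(t):
    """One leftmost-outermost beta step, or None if t is a normal form."""
    if t[0] == "app":
        fun, arg = t[1], t[2]
        if fun[0] == "lam":                      # t itself is a beta-redex
            return subst(fun[2], arg, fun[1])
        reduced = step(fun)
        if reduced is not None:
            return ("app", reduced, arg)
        reduced = step(arg)
        return None if reduced is None else ("app", fun, reduced)
    if t[0] == "lam":
        reduced = step(t[2])
        return None if reduced is None else ("lam", t[1], reduced)
    return None                                  # constants and variables

def normalize(t, max_steps=100):
    """Reduce t; None if no normal form is found within the step budget."""
    for _ in range(max_steps):
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt
    return None

def lam(v, body): return ("lam", v, body)
def app(f, a): return ("app", f, a)
def var(name): return ("var", name)

dup = lam("x", app(var("x"), var("x")))          # λx . xx
OMEGA = app(dup, dup)                            # Ω
erase = app(lam("y", ("const", "c")), OMEGA)     # (λy . c) Ω

print(step(OMEGA) == OMEGA)    # True: Ω reduces to itself
print(normalize(erase))        # ('const', 'c')
print(normalize(OMEGA))        # None: budget exhausted
```

With this strategy the term (λy . c) Ω reaches c in one step, because the
outer redex is contracted before the argument is touched. A strategy that
reduces the argument first would keep rewriting Ω forever.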
Although we already saw that λ-calculus is neither weakly nor strongly nor-
malizing, it does have the important confluence property. First we introduce
the following definition of the diamond property that we use to prove that
→β is confluent. To prevent confusion in the notation we will from now on
also use the implication symbol ⇒.
Definition of the Diamond Property: A binary relation → on the

lambda terms Λ satisfies the diamond property, notation → |= ♦ :=
(∀M, M1 , M2 : M, M1 , M2 ∈ Λ : (M → M1 ∧ M → M2 ) ⇒ (∃M3 : M3 ∈ Λ :
M1 → M3 ∧ M2 → M3 ))
Note that a reduction → has the Church-Rosser property precisely when its
transitive, reflexive closure ↠ satisfies the diamond property.
Lemma: Let → be a binary relation on a set Λ with ↠ its transitive,
reflexive closure and let → |= ♦. Then ↠ |= ♦.
Proof: Assume → is a binary relation on a set Λ with ↠ its transitive,
reflexive closure, and → |= ♦. We now have to prove that ↠ |= ♦. Suppose
M, L, K ∈ Λ, M ↠ L and M ↠ K. We then have to prove (∃N : N ∈ Λ :
L ↠ N ∧ K ↠ N). Let
(*) M ≡ M0 → M1 → . . . → Mn ≡ L, for some n ∈ N
(**) M ≡ K0 → K1 → . . . → Km ≡ K, for some m ∈ N
We now need to apply a technique called induction loading (for more
information see the links on http://zax.mine.nu/stage/) to prove that K and L
have a common reduct N. To be precise, we show that l(m, n) holds for all
m, n ∈ N, with
l(m, n) := there exist terms N(i, j) ∈ Λ, for all i, j ∈ N with 0 ≤ i ≤ n
∧ 0 ≤ j ≤ m, such that:
(a) N(i, 0) ≡ Mi if 0 ≤ i ≤ n
(b) N(0, j) ≡ Kj if 0 ≤ j ≤ m
(c) N(i, j) → N(i, j + 1) if 0 ≤ i ≤ n ∧ 0 ≤ j < m
(d) N(i, j) → N(i + 1, j) if 0 ≤ i < n ∧ 0 ≤ j ≤ m
Clearly, when l(m, n) is true for all m, n ∈ N, we know that K and L have
a common reduct: by (a) and (c) we have L ≡ N(n, 0) ↠ N(n, m), and by (b)
and (d) we have K ≡ N(0, m) ↠ N(n, m). So the only remaining proof
obligation is to show that l(m, n) holds for all m, n ∈ N. We prove this by
induction on n.
Base case (n): n=0
(a) let N (0, 0) be M0 , then (a) holds trivially by reflexivity of ‘≡’.
(b) let N (0, j) be Kj for 0 ≤ j ≤ m, then (b) also holds trivially.
Note that this is valid in combination with the definition under (a)
since N (0, 0) ≡ M0 ≡ M ≡ K0 .
(c) N (i, j) → N (i, j + 1) holds because i = 0 and (**).
(d) N (i, j) → N (i + 1, j) holds trivially because n = 0 yields an empty range for i.

Induction case (n): Induction hypothesis (i.h.-n): suppose that for n = k,
k ∈ N, the statement l(m, n) is true for all m ∈ N. We now prove the
statement for n = k + 1. We do this by induction on m.
Base case (m): m=0
(a) let N(i, 0) be Mi for 0 ≤ i ≤ k + 1, then (a) holds trivially.
(b) since j = 0 this amounts to N (0, 0) ≡ K0 .
This is true because of our previous definition of N (0, 0) ≡ M0 .

and the fact that M0 ≡ M ≡ K0 .
(c) holds trivially, because m = 0 yields an empty range for j.
(d) N (i, j) → N (i + 1, j) because j = 0 and (*).
Induction case (m): Induction hypothesis (i.h.-m): suppose that for m = r
and n = k + 1, r ∈ N, the statement l(m, n) is true. We now prove the
statement for m = r + 1.
(a) N(i, 0) ≡ Mi for 0 ≤ i ≤ k + 1 follows from i.h.-m.
(b) N(0, j) ≡ Kj for 0 ≤ j ≤ r + 1 follows from i.h.-n.
(c) and (d)
We already know from the induction hypotheses that (c) and (d) hold for
all steps between the terms N(i, j) with 0 ≤ i ≤ k + 1 ∧ 0 ≤ j ≤ r (by
i.h.-m) and with 0 ≤ i ≤ k ∧ 0 ≤ j ≤ r + 1 (by i.h.-n). The only term
that still has to be constructed is N(k + 1, r + 1), together with the steps
N(k + 1, r) → N(k + 1, r + 1) for (c) and N(k, r + 1) → N(k + 1, r + 1)
for (d). We know by (c) of i.h.-n that N(k, r) → N(k, r + 1). We also know
by (d) of i.h.-m that N(k, r) → N(k + 1, r). Then by the diamond property
of → we know (∃N(k + 1, r + 1) : N(k + 1, r + 1) ∈ Λ :
N(k, r + 1) → N(k + 1, r + 1) ∧ N(k + 1, r) → N(k + 1, r + 1)).
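The construction in this proof amounts to filling a rectangular grid of
terms, one diamond at a time. The Python sketch below illustrates this for a
finite relation; it assumes that the relation is given as a dictionary from
elements to their sets of one-step successors and that it satisfies the
diamond property. The example relation and all names are our own.

```python
def has_diamond(rel):
    """Check the diamond property for a finite relation."""
    return all(rel[b] & rel[c] for a in rel for b in rel[a] for c in rel[a])

def common_successor(rel, left, right):
    """Some d with left -> d and right -> d, as the diamond property promises."""
    candidates = rel[left] & rel[right]
    if not candidates:
        raise ValueError(f"no common successor of {left!r} and {right!r}")
    return min(candidates)

def fill_grid(rel, m_chain, k_chain):
    """Build N(i, j) from M = M0 -> ... -> Mn and M = K0 -> ... -> Km."""
    assert m_chain[0] == k_chain[0]
    n, m = len(m_chain) - 1, len(k_chain) - 1
    grid = {}
    for i in range(n + 1):
        grid[i, 0] = m_chain[i]                  # condition (a)
    for j in range(m + 1):
        grid[0, j] = k_chain[j]                  # condition (b)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # N(i-1, j-1) steps to both N(i-1, j) and N(i, j-1);
            # one application of the diamond property closes the square.
            grid[i, j] = common_successor(rel, grid[i - 1, j], grid[i, j - 1])
    return grid

# An assumed relation with the diamond property.
rel = {"s": {"l", "r"}, "l": {"t"}, "r": {"t"}, "t": {"t"}}
m_chain = ["s", "l", "t"]            # M = s ->> L = t, with n = 2
k_chain = ["s", "r", "t", "t"]       # M = s ->> K = t, with m = 3

grid = fill_grid(rel, m_chain, k_chain)
print(has_diamond(rel))   # True
print(grid[2, 3])         # t, the common reduct N(n, m)
```

The corner N(n, m) of the grid is the common reduct of L and K, reached from
L along the last row of steps (c) and from K along the last column of steps
(d).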
We can now sketch the proof2 of the following fundamental theorem of

the untyped lambda calculus:
2
The lines of the proof are due to W. Tait and P. Martin-Löf (see [6, section 3.2]), but
as far as I know this is the first proof that formalized the above lemma to a reasonable
extent.
Theorem (Church, Rosser): →β is confluent.

Proof: By the previous lemma, we know that if any binary relation on a set
satisfies the diamond property, its transitive reflexive closure also satisfies the
diamond property. Suppose we have a binary relation →partial−β on the set Λ
such that ↠β is the transitive reflexive closure of →partial−β. So if we prove
that →partial−β satisfies the diamond property, by application of the previous
lemma we have proved that ↠β satisfies the diamond property, i.e. →β is
confluent.
A concrete definition of →partial−β, a proof that its transitive reflexive closure
is indeed ↠β, and a proof that →partial−β satisfies the diamond property can
be found on pages 60-62 of [6].
Theorem: λ-calculus has the unique normal form property.

Proof: Suppose that a term a of < Λ, →> has two normal forms, n1 ∈ Λ
and n2 ∈ Λ. This means there is no b ∈ Λ such that n1 → b or n2 → b.
But a ↠ n1 ∧ a ↠ n2, and then by the Church-Rosser property we know
(∃c : c ∈ Λ : n1 ↠ c ∧ n2 ↠ c). Since n1 and n2 cannot be reduced any
further, we must have n1 ≡ c ≡ n2.
Example: All constants are normal forms, as well as x, λx.x, λx.xx, yy, . . ..
Note that the term (λx.xx)(λx.xx) cannot be reduced to a normal form.

Confluence is a fundamental property for functional programming; we rely
on this when we evaluate programs by rewriting, knowing that we never have
to backtrack an evaluation (this is also one of the main differences with logic
programming).
In the λ-calculus we have defined in this section, we can represent natural

numbers and basic operations on the natural numbers. We will not show
this here; in most books on the lambda calculus there are some examples of
how to do basic arithmetic in lambda calculus. The λ-calculus represents a
certain class of (partial) functions on the integers. By a classical result of the
American mathematician Stephen C. Kleene (1909-1994) this is exactly the
set of (partial) recursive functions. The proof can be found in [6, theorem
9.2.16]. Church also thought of the set of functions that could be calculated
in his λ-calculus, and conjectured the following thesis:
Church’s thesis (1936) The set of effectively computable functions, i.e. functions
that intuitively (effectively) can be computed, is the same as the set of
functions that can be defined in λ-calculus.
A more formal version and detailed treatment of Church’s thesis can be

found in section 9.3.
Alan Turing proved in 1937 that the class of Turing computable functions is
the same as the class of functions definable in λ-calculus.
So the power of Turing Machines is the same as the power of λ-calculus.

Both models capture the intuitive idea of computation. This important thesis
is the subject of the next section.
9.3 The Church-Turing thesis

The Church-Turing thesis concerns the intuitive notion of algorithm (or ef-
fective or mechanical method) in logic and mathematics. The notion of an
algorithm or an effective method is an informal one, and attempts to char-
acterize this effectiveness lacked rigor, mainly because the key requirement
that the method demands no insight or ingenuity is left unexplicated.
One of Turing’s achievements in his paper of 1936 (reprinted in [19]

and online available at http://www.abelard.org/turpap2/tp2-ie.asp) was to
present a formally exact predicate with which the informal predicate ‘can be
calculated by means of an algorithm or effective method’ may be replaced.
The formal concept proposed by Turing is that of computability by a Turing
Machine (see section 9.1). He introduced this thesis in [90] in the course of
arguing that the ‘Entscheidungsproblem’ for the predicate calculus is unsolv-
able.
Turing’s thesis: TM’s can do anything that could be described as intu-

itively computable
Church also presented in [14] a formally exact way to express this no-
tion of intuitively computable. Turing’s method was however more obvious
and more general than Church’s, since the latter only considered functions
of positive integers. In order to calculate the values of the function Church
introduced his lambda calculus and specified the notion of a recursive func-
tion (see section 9.2).
Church’s thesis: A function of positive integers is effectively computable

only if it is recursive
The reverse implication is also referred to as the converse of Church’s

thesis. The class of lambda-definable functions and the class of recursive
functions were later shown to be identical. This was established in the case
of functions of positive integers by Church and the American mathematician
Kleene (see [47], [14]). After learning of Church’s proposal, Turing quickly
established that the apparatus of lambda-definability and his own apparatus
of computability were equivalent ([89], page 263).
Theorem: Lambda-definability and Turing Machine-computability

are equivalent.
Proof: See [89, page 263] for a proof that Turing’s machines and Church’s
lambda calculus can compute the same set of functions.
Although Turing and Church had chosen different ways to formalize the
intuitive notion of effective computability, respectively by identifying the no-
tion with that of computability by a Turing Machine and in the lambda cal-
culus, both methods are equivalent. After this proof of equivalence, Kleene
introduced the term ‘Church-Turing thesis’ to refer to any of the two equiv-
alent theses ([48], page 232).
Church-Turing thesis: The intuitive notion of an algorithm equals the
Turing Machine algorithm or (equivalently) the calculable functions of
lambda-calculus
There are a number of misunderstandings of the Church-Turing thesis,

collected in [16]; Turing did not show that
• Any problem can be solved ‘by instructions, explicitly stated rules or
procedures’
• A universal TM ‘can compute any function that any computer, with
any architecture can compute’ (Turing said nothing about the limits of
what can be computed by a machine)
• Whatever can be calculated by a machine (working on finite data in
accordance with a finite program of instructions) is Turing-machine-
computable (this is known as Thesis M, see [16])
• Any process that can be given a systematic mathematical description
(or a ‘precise enough characterization of a set of steps’, or that is
‘scientifically describable’ or ‘scientifically explicable’) can be simulated
by a TM (this is known as Thesis S, see [16])
Since the word ‘computable’ is often tied by definition to effective calcu-
lability, the Church-Turing thesis is often stated as ‘All computable functions
are computable by a Turing Machine’ (a function is said to be computable if
and only if there is an effective procedure for determining its values).
If we summarize the above, we can say that to define the concept of an
algorithm, Church used a notational system, the lambda calculus. Turing did
the same with his theoretical computing device, the Turing Machine. On the
face of it they seemed very different from one another, but these two
definitions turned out to be equivalent, in the sense that each picks out the
same set of mathematical functions. The Church-Turing thesis is the assertion
that this set contains every function whose values can be obtained by a method
or algorithm corresponding to our intuitive notion of effectively computable.
Clearly, if there were functions of which an informal (intuitive) statement,
but not the formal statement, were true, then the latter would be less
general than the former and so could not reasonably be employed to replace
it. When the thesis is expressed in terms of the formal concept by Turing,
it is appropriate to refer to the thesis also as the Turing thesis, and
likewise for the case of Church. It is widely agreed amongst mathematicians
and logicians that ‘computable by means of a TM’ is an accurate rendering of
the informal notion in question.
Chapter 10
Conclusion
It is a profoundly erroneous truism, repeated by all copy books

and by eminent people, when they are making speeches, that we
should cultivate the habit of thinking of what we are doing. The
precise opposite is the case. Civilization advances by extending
the number of important operations which we can perform with-
out thinking about them. . . . The study of mathematics is apt to
commence in disappointment . . . We are told that by its aid the
stars are weighed and the billions of molecules in a drop of water
are counted. Yet, like the ghost of Hamlet’s father, this greatest
science eludes the efforts of our mental weapons to grasp it.
- A. Whitehead, in [99]
When I started my study on the foundations of mathematics, I did not

quite know what to expect. By now I’ve learned that the foundations of
mathematics can be a fascinating and important subject. Learning this new
subject was an interesting challenge, but sometimes hard work when I had
to go through numerous books that were full of details or too vague and
philosophical. Most books that I found on the foundations of mathematics
were either very detailed and descriptive (the book [31] of I. Grattan-Guinness
has an unmatched level of detail and exactness) or treat only a part
of the theory that was developed from 1890 to 1940 (for example [17] gives
an excellent introduction to set theory). One of the better books, though
relatively unknown, is that of G.T. Kneebone [49], which is quite complete
and still considerably theoretical. One of the motivations to write this
article was to
present the theory properly. Hopefully that makes it more clear and enjoy-
able. Some of the good literature used, such as the books just mentioned,
will be found in the references at the end of this report.
At the same time, I also tried to briefly introduce the reader to the
historical context of the most important developments. Most undergraduate
courses I have taken gave little or no information about the history that
lies behind the theory. Emphasis was laid on the accumulation of
mathematical knowledge. I believe that including the history of mathematics
in education can not only make the study of mathematics more interesting, but
also help in the growth of mathematical understanding and appreciation of the
current form of the theory.
I want to conclude this report with a summary of the theory and my own
view on the project, and with some ideas for future work.
The project
In the beginning of the 20th century Hilbert said we should formalize all
of mathematics, that is, all mathematical reasoning. This ‘project’ (from now
on I will refer to it as the project) has been the central theme of this
report. When reading about the work and biographies of all those brilliant
men who have devoted themselves to this problem, you can (at least that’s
what happened to me) get caught up in this fascinating philosophical question.
To most people however, this all seems very impractical. We all know
you can make a popular operating system or start your own business on the
web and in one year make a million dollars if you’re lucky. And when it
comes to verifying mathematical proofs and making reliable software, a for-
mal basis is rarely used, the human mind is still the most important, and
other techniques, such as model-checking, are preferred. It might be worth
writing another article, on how and why in that respect the more practical,
working mathematicians and more theoretical logicians (or formalists, if you
prefer) grew apart. But let’s first go back to the project.
The attempt to formalize mathematical reasoning is not new - the Greeks
already thought rationality was the supreme goal. We can think of Plato
and Reason, or as Russell1 would say - think of Pythagoras and Rationality!

Aristotle made a big step in formalizing reasoning, with his patterns
of reasoning that are known as syllogisms. Since then, logic was developed
further, with important contributions from De Morgan, Leibniz and
especially Boole. Because he was interested in theology and God (see [31,
chapter 3] and also [30, section 5.8, page 203]), Cantor became obsessed with
the notion of the infinite, and developed his theory of infinite sets. With Cantor
mathematics got more abstract, and some people regarded his set theory
as a disease. Poincaré, the great French mathematician, said2 : (from [95])
“Later generations will regard Mengenlehre (set theory) as a disease from
which one has recovered.”. Peano and Frege, as we have learned in chapter
4, brought mathematical reasoning to an even higher level of formalization.
So far, so good. But there turned out to be some problems, and although
Cantor had already noticed this (see Cantor’s paradox in section 3.8), it was
Russell who spread the bad news to everyone, by stating his Russell paradox.
At that point Hilbert proposed to use a formal axiomatic method to solve
these problems, and he gave his famous three requirements of consistency,
completeness and decidability.
This proposal of Hilbert to formalize mathematics led to the development
of several axiomatic systems, such as those of Zermelo and Fraenkel, and of
Gödel, Bernays and von Neumann. Russell and Whitehead made their own
attempt to formalize mathematics, with their theory of types. But although
all of these attempts were fruitful to a certain extent, in total they all failed,
and it took Gödel and Turing to show that in fact ‘the project’ couldn’t
be done. Formalizing mathematics so that we have absolute truth is not
possible! But these works of Gödel and Turing were new and complicated,
and not everyone clearly recognized their importance. And even nowadays, few
people are familiar with the details of their work, and we often see
confusion between notions like ‘checking the proof of a statement’ and
‘checking whether a statement is true (or not)’. There is also much confusion
about the exact implications of Gödel’s and Turing’s work. Gödel showed how
to construct, for any consistent axiomatic system that includes basic
arithmetic, a statement of arithmetic that can be neither proved nor
disproved within that system. Turing later formalized the notion of
computability to show there is no mechanical procedure to decide if a
statement is provable or not.
1
Although rationality is more commonly associated with Plato, Russell always insisted
on attributing it to Pythagoras (see [62]).
2
Whether or not he actually said this is a matter of debate amongst historians of
mathematics.
At first this was a shock, but then mathematicians were saying (and
again it would be nice to write an article about the different responses of
mathematicians and logicians): so what - we should do mathematics exactly
the same way as we’ve always done it, this does not apply to the problems
I care about. Indeed mathematicians continued with their work, and the
theorems of Gödel and Turing had little or no impact in practice on how
we (should) do mathematics. The only effect the project might have had on
working mathematicians, is that they have become a bit more precise in the
use of language and in writing their proofs. Some of course were inspired
by problems like the 23 of Hilbert. But there has been another consequence
of all this theoretical work, that I was made aware of through a videotaped
lecture of G.J. Chaitin on the internet. I quote him about Hilbert’s attempt
to formalize all mathematics after the publications of the theorems of Gödel
and Turing: “It failed in that precise technical sense. But in fact it succeeded
magnificently, not formalization of reasoning, but formalization of algorithms
has been the great technological success of our time - computer programming
languages! So if you look at the history of the beginning of this century you’ll
see papers by logicians studying the foundations of mathematics in which
they had predicate calculi. Now you look back and you say this is clearly
a programming language! [...] If you look at Turing’s paper of course there
is a machine language [...]. Or, as von Neumann said: the universal Turing
Machine is really the notion of a general purpose programmable computer -
and that’s the idea of software. [...] If you look at papers by Alonzo Church
you see the lambda calculus, which is a functional programming language.
If you look at Gödel’s original paper you see what looks like LISP, it’s very
close to LISP”. As he showed there are numerous examples of unexpected
offspring of theoretical research, and all of the foundational work is not so
impractical after all! As G.J. Chaitin concluded in his speech, this is the
way “we’re all benefiting from the glorious failure of this project!”. Now
this is not entirely true, but it is true that theoretical studies, as he says
“don’t have spin-off in dollars right away, but sometimes they have vastly
unexpected consequences”. Formal methods/studies have not always done a
good job promoting themselves - maybe we can emphasize this aspect and
show that technology often advances through fascinating impractical ideas.
Status of the project
That brings us to ask if the question of the foundation of mathematics,
many decades after Hilbert formulated it, is now settled once and for
all. The short answer is: it is not. Even from the amount of interesting
resources on current research that are available on the internet alone, we can
conclude there is still a lot of work to do on the foundations of mathematics.
I consider creating an online version of this document with more background
information and links.
Although Gödel and Turing showed that it is impossible to totally for-

malize even basic arithmetic, let alone the whole of mathematics, it is still
possible to formalize parts of mathematics (for example, geometry) success-
fully. As P. Andrews says in [4], “attempts to understand the nature of rea-
soning and to build sophisticated information systems which can draw logical
conclusions may be regarded as part of an endeavor to fashion more powerful
intellectual tools for coping with the increasingly complex problems which
confront mankind.” In that respect the formalization is not restricted to ma-
thematical reasoning, and it can also be applied to other disciplines (such
as physics, chemistry or even social sciences). Especially the development
of software and computer systems will be facilitated by a formalization of
theories. Although total formalization of parts of mathematics is very
useful, this is not the focus of most current research: (most people believe
that) the human mind will (at least for the near future) be the one to prove
whether a given mathematical statement is true or not.
Ideas for future work and the distinction between mathematics and software
And although it cannot be determined by a machine whether any given
mathematical statement is true, we can try to develop an axiomatic system
such that as many as possible of the interesting statements3 can be proved
within that system. This is useful because, even when all axiomatic systems
are incomplete and there are undecidable statements, if we provide one of the
statements that the system does contain, and which we claim to be decidable
by providing a concrete and completely formalized (dis)proof of it within
that system, we still have a way to decide mechanically whether or not the
proof is correct for the given statement. The question then is if the set of
statements for which we can do this, still forms a part of mathematics that
is interesting enough. This has to be a part of our investigation: to find out
how many of the practical mathematical proofs contain ‘meta-arguments’, in
other words which classes will fall outside our system. Although we want to
change as little as possible to the (side of) mathematics itself, this also might
be a necessary option4. As P. Andrews calls his book [4], we get: ‘to truth
through proof’. This should be the first goal for the near future:
(1) Investigate which parts of mathematics can(not) be formalized (i.e.
contain ‘meta-arguments’), which formalization is best usable and allows most
parts of (practical) mathematics to be formalized, and totally formalize proof
checking for as many parts of mathematics as possible.
3
As interesting statements, we consider all statements in the (everyday) work of
practicing mathematicians. These ‘practical’ statements do not include the specific purely
theoretical statements that Gödel invented for his incompleteness theorem.
Formalization is not only important to check the correctness of mathema-

tical theories that are becoming ever more complex. Many models in physics
and chemistry depend on underlying mathematical theorems, and the suc-
cess of the model depends on the correctness of the mathematical theorems.
Also, we are becoming more and more dependent on automated systems, in
particular computers and software. There is a growing need for reliable (that
is, correctly specified and working according to the specifications) software,
not only for (safety) critical systems, but also in everyday applications. A
formal approach can not only be used to prove correctness of mathematical
statements but also of computer programs. This is an important point:
Distinction between mathematics and software construction.
Instead of the proofs of mathematical statements, we are then checking
the derivation steps of program derivations. I want to emphasize this
difference, since it is often unclear or left implicit which of the two is
meant when arguments for/against formalistic studies are given. We have to
realize that we can never obtain a 100% guarantee of correctness of any
algorithm, since we also are dependent on the correctness of the
proof-checker. That is why we have to try to keep the proof-checker as simple,
small and intuitive as possible (see also the ‘de Bruijn criterion’ in [26,
pages 4 and 26]). Analogously, we can never obtain a 100% guarantee of
correctness of any mathematical statement, since we learned from Gödel that
the consistency of a consistent axiomatic system that includes arithmetic
cannot be proved within that system, and therefore we had better also
try to keep the axiomatic system as simple, small and intuitive as possible
(we could see all this as the de Bruijn criterion variant for axiomatic
systems). But nevertheless, any such implementation of a proof checker would
give us the highest degree of certainty possible.
4
For a successful formalization of parts of mathematics we therefore do not only look
at the axiomatic system, but it also might require us to limit certain parts of mathematics
so that they contain fewer undecidable proofs or require us to rewrite certain existing proofs
to a form that is permitted by the system.
Software and Proof Checking
I would also like to remark that proof checking for programs only gives us
a way to verify the correctness of programs. At least as important (to
obtain correct programs) is the correct construction of programs. This is
the focus of the work in the area of programming methodology. At the
Eindhoven University of Technology, for example, the techniques of E.W.
Dijkstra are used to derive correct programs from their specification.
Unfortunately, both areas (proof checking/verification vs.
construction/derivation) mostly advocate their own approach, while a
combination of both could give the best results. Although there has been
some minor work on formalizing these proof techniques and combining formal
methods and program derivations (see for example [26]), cooperation is
still minimal. If we go one step further back in the process of creating
correct software, the success of any piece of software depends on the
correctness of its specification. These first phases of software
engineering (establishing user requirements and specifications) can also be
adapted to comply with the methods of program derivation and formal proof
checkers (note that we use the term ‘proof checker’ not only for
mathematics, i.e. to check mathematical statements, but also for the
software variant: for checking derivations of algorithms and programs). And
since we can never obtain a 100% guarantee of correctness of software (it
depends, for example, on the correctness of the specifications and of the
proof checker itself), model checking techniques can also be used as a
verification method to improve reliability even further. Therefore I argue
for an integrated approach: only the combination of all of the mentioned
methods can give us the highest reliability (i.e. the highest chance of
correctness of software). Such an integrated approach requires research and
cooperation between the various branches representing the methods I
mentioned before, and ultimately their incorporation in the software
engineering process.
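As a small illustration of how a specification, an invariant and a program belong together, the following Python sketch computes quotient and remainder by repeated subtraction. It is written in the spirit of program derivation but does not reproduce Dijkstra's notation or calculus; the function name `divmod_derived` is hypothetical. The runtime assertions only test the precondition, invariant and postcondition on the inputs actually supplied; a derivation or a proof checker would have to establish them for all inputs.

```python
from typing import Tuple


def divmod_derived(a: int, b: int) -> Tuple[int, int]:
    """Return (q, r) with a == q * b + r and 0 <= r < b."""
    # Precondition (part of the specification).
    assert a >= 0 and b > 0

    # Initialization establishes the invariant: a == q * b + r and r >= 0.
    q, r = 0, a
    while r >= b:
        # The invariant holds at the start of every iteration.
        assert a == q * b + r and r >= 0
        # This step preserves the invariant and decreases r,
        # so the loop terminates.
        q, r = q + 1, r - b

    # Invariant plus negated guard (r < b) gives the postcondition.
    assert a == q * b + r and 0 <= r < b
    return q, r
```

For example, `divmod_derived(17, 5)` returns `(3, 2)`. The postcondition follows from the invariant together with the negation of the loop guard, which is the pattern a formal derivation would make explicit.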
Mathematics and Proof Checking
Let us go back to proof checking of mathematical statements. We mentioned
the first goal (1) of investigating and formalizing proof checking. As a
next step (2) we can think of building proof assistants. Proof assistants
not only check the proofs for us, but also help us in making the proofs:
they are tools that combine a proof development system with a proof
checker. A good article about proof assistants using dependent type systems
can be found in [8]. Another interesting article, on computer assisted
mathematics (for computer algebra), is [7], which includes a brief history
of computations versus proofs in mathematics. The notion of ‘helping’ or
‘assisting’ in making proofs might be considered vague. For complicated
statements, we can think of tools that keep track of the context of the
proof and of the remaining proof obligations, and that even fill in parts
of the proofs for us automatically.
Proof assistants should make it easier for us to prove mathematical theorems.

Then (3) we can think of building a standard library of proved mathematics.
After a proof checker has confirmed the correctness of a given mathematical
statement and its corresponding proof, they can be stored in a database.
This database can be made accessible to everyone via the internet and can
even be used for the previously mentioned automated proving methods of
proof assistants. And although we cannot judge the quality of mathematical
work as readily as the quality of physical products, this could be the long
awaited ‘quality stamp’ for mathematics.
There have already been attempts to build standard libraries of mathematics
(see the Mizar project at http://www.mizar.org/ and the PRL project at
http://www.cs.cornell.edu/Info/Projects/NuPRL/nuprl.html), but they lack
the formal basis that has to be provided by (1) and (2). Barendregt and
his group have formalized parts of algebra using the theorem prover Coq.
This shows that it is possible to formalize large parts of mathematics, but
the process of formalizing mathematics itself is too direct and informal
and needs to be developed further. Many valuable experiences have come out
of attempts at what are here called phases (2), (3) and (4), but these
attempts are premature: for a successful result we first have to start
thoroughly at the beginning (1). Work in this direction was done in [44],
where a syntax-driven derivation system is presented for a formal language
of mathematics called Weak Type Theory. This is the start of a more
rigorous approach to the translation of mathematical texts (statements and
proofs).
We see the extension of proof assistants with more intelligent and
sophisticated automated proving methods as the final phase (4) of future
work. Automated proving includes classical theorem proving methods (such as
automated induction). Newer methods may come from areas such as neural
networks, fuzzy logic, genetic and DNA computing, and in the future
possibly even quantum computing.
I want to end these ideas by summarizing the steps that lie ahead of us in
a new project.
The new project (for mathematics):

1 investigate which parts of mathematics can(not) be formalized (i.e.
  contain ‘meta-arguments’), which formalization is best usable and allows
  most parts of (practical) mathematics to be formalized, and totally
  formalize proof checking for as many parts of mathematics as possible
2 build a proof assistant (probably based on some form of WTT and some
  form of TT)
3 build a standard library (archive) of proved mathematics
4 further develop automated proving techniques (to build into the proof
  assistant)
And similarly we can formulate the new project for computer systems:
The new project (for software construction):

1 formalize as much of program derivation checking as possible
2 build a programming assistant (environment) based on a suitable (and
  preferably popular) programming language
3 build a standard library of reusable correct software (i.e. suitable for
  component based software engineering) and its specification
4 further develop automated proving and program derivation techniques

One of the most important questions, part of step (1), has so far been
avoided in this conclusion: what to take as the basis of mathematics? This
is one of the most difficult questions, and as we have seen many great
scientists have thought about it. There is currently no consensus on what
the best approach is, and I am not in a position to give a well-argued
opinion. Thorough research into the alternatives will have to show which
choice of foundational system is best usable in practice.
The only thing I can say is that recently most people seem to favor type
theory over category theory, relational calculi and also over set theory.
P.J. Scott, for example, favors type theory over category theory in the
introduction of [55]. H. Barendregt gives arguments for the use of type
theory over set theory in [7], and we quote from [4, the second page of the
preface]: “[People prefer the approach they are most familiar with.] However,
those familiar with both type theory and axiomatic set theory recognize that
in some ways the former provides a more natural vehicle than the latter for
formalizing what mathematicians actually do”. In contrast, at
http://www.rbjones.com/rbjpuc/logic/jrh0111.htm we find a detailed
assessment of the choice of a foundational system, listing advantages of
set theory over type theory. Also, several new types of logic have been
proposed, such as IF logic (see [37]) and several types of so-called ‘fuzzy
logics’, but so far they seem to lack the precision, formalization and
proofs needed to support claims that they can be used successfully as a
foundation for mathematics.
A final remark on the debate between type theory and axiomatic set theory
as a foundational basis: if there is a mapping from the axioms of (some
form of) set theory into (some form of) type theory and vice versa, then
type theoretic expressions have their counterparts in set theory. It is
interesting to investigate whether among such mappings there is indeed a
bijection. That would show that both theories are equivalent in expressive
power, so that the debate can turn to the question of which theory is more
intuitive and useful.
Some do not really believe in a successful formalization of mathematics,
but rather see the indeterminacies in mathematical representations and the
undecidabilities in any formal system as the source of problem solving and
creative power (see [87, page 174]). This standpoint was already expressed
in 1807 by the German philosopher Hegel (1770-1831) in [35]: “Dagegen muß
behauptet werden, daß die Wahrheit nicht eine ausgeprägte Münze ist, die
fertig gegeben und so eingestrichen werden kann” (“Against this it must be
maintained that truth is not a minted coin that can be handed over
ready-made and simply pocketed”).
I am aware of the limitations of this report. Many chapters are still
informal, such as the work of Frege in chapter 4. The theory of types in
chapter 7 and Gödel’s incompleteness theorem in chapter 8 are not
completely covered, and certain subjects closer to logic (such as
intuitionism) are treated very minimally. The only excuse I have is that it
is simply not possible to study all the original works in such a short
period of time and to include all theory in this report. I hope to complete
this work at a later stage. It might also be worthwhile to extend (on both
sides) the period covered by this report. Recently we have seen interesting
new theories on category and type theory and even on the foundations of
mathematics, for example Chaitin’s results on randomness; it seems that he
continued where Gödel and Turing left off. Finally I would like to remark
that the ‘new project’, consisting of the four steps mentioned in this
conclusion, is just my own view of the work that lies ahead of us. To end
with a concluding remark by Alan Turing, from his paper on the Turing test:
“We can only see a short distance ahead, but we can see plenty there that
needs to be done”.
Mark Scheffer, August 2001⁵
⁵ p.s. To those who wonder what the turtle and the elephant are doing on
the cover of this report, I refer to the website http://zax.mine.nu/stage/.
Appendix A
Timeline and Images
Figure A.1: Luitzen Brouwer
Figure A.2: Georg Cantor
Drawings by Soshichi Uchii, suchii@bun.kyoto-u.ac.jp;

Photo Quine by Kelly Wise;
Photo Ramsey due to Harcourt, Brace, Jovanovich.
Figure A.3: Richard Dedekind
Figure A.4: Gottlob Frege
Figure A.5: Kurt Gödel
Figure A.6: David Hilbert

Figure A.7: John von Neumann
Figure A.8: Giuseppe Peano
Figure A.9: Henri Poincaré
Figure A.10: Willard Van Orman Quine

Figure A.11: Frank Plumpton Ramsey
Figure A.12: Bertrand Russell
Figure A.13: Alan Turing

Bibliography
[1] Y. Bar-Hillel A.A. Fraenkel and A. Levy. Foundations of set theory.

North-Holland Press, Amsterdam, 2 edition, 1973. First edition 1958.
[2] W. Ackermann and D. Hilbert. Grundzüge der Theoretischen Logik,

volume Band XXVII of Die Grundlehren der Mathematischen Wis-
senschaften in Einzeldarstellungen. Springer-Verlag, first edition, 1928.
Berlin.
[3] J.H.J. Almering. Analyse. Delftse Uitgevers Maatschappij, 1993.
[4] P. Andrews. An introduction to mathematical logic and type theory: to

truth through proof. Academic press, 1986.
[5] J. Backer and P. Rudnicki. Hilbert’s basis theorem. Association of

Mizar Users, University of Bialystok, 12, 2000, 2000. Published in
Journal of Formalised Mathematics.
[6] H. Barendregt. The Lambda Calculus - Its Syntax and Semantics, vol-
ume 103. Elsevier Science Publishing Company, Inc., 1984.
[7] H. Barendregt and A.M. Cohen. Electronic Communication of Ma-

thematics and the Interaction of Computer Algebra Systems and Proof
Assistants. J. Symbolic Computation. Academic Press, 2001.
[8] H. Barendregt and H. Geuvers. Proof-checking using Dependent Type

Systems, volume 2, chapter 18, pages 1149-1240 of Handbook of Artifi-
cial Reasoning. Oxford Press, 2001.
[9] C.J. Bloo. Computational Models. TU/e Press, 2001. Manuscript

originally started by H. Geuvers and J. Hooman.
[10] J. Breuer. Introduction to the Theory of Sets. Prentice-Hall, August

1958.
[11] Encyclopedia Britannica. P. Bernays. EB, 2000.
[12] K.S. Brown. Mathematics. Seanet, 1991.
[13] G. Cantor. Ein Beitrag zur Mannigfaltigkeitslehre. Journal f. reine und
angew. Math., Gesammelte Abhandlungen., 84, pages 119-133, 1878.
Translated in ‘Contributions to the foundation of the theory of
transfinite numbers’ (translation from German), by Philip E. Jourdain,
Dover Publishing, 1952.
[14] A. Church. An unsolvable problem in elementary number theory, vol-

ume 58. American journal of Mathematics, 1936.
[15] P.J. Cohen. Set Theory and the Continuum Hypothesis. Benjamin,
1966.
[16] B.J. Copeland. The Church-Turing Thesis. Springer-Verlag, 1997. Item

in Stanford Encyclopedia of Philosophy.
[17] H.C. Doets D. van Dalen and H. de Swart. Sets: Naive, Axiomatic and
Applied. Pergamon Press, 1978.
[18] J.W. Dauben. Georg Cantor, His Mathematics and Philosophy of the
Infinite. Harvard University Press, 1979.
[19] M. Davis. The Undecidable: Basic Papers on Undecidable Propositions,

Unsolvable Problems and Computable Functions. Raven Press, New
York, 1965.
[20] Diverse. Mathematische Annalen, 65. Springer-Verlag, Berlin, 1908.
[21] A. Einstein. Relativity: the special and general theory. Methuen Press,
London, 1970.
[22] H. Eves. Mathematical Circles Revisited. Boston Press, 1971.
[23] H. Eves. Foundations and fundamental concepts of mathematics. Dover

publications inc., Mineola, New York, third edition, 1997.
[24] A. Fraenkel. Einleitung in die Mengenlehre. Springer-Verlag, third

edition, 1928.
[25] A.A. Fraenkel. Abstract Set Theory. North-Holland Press, Amsterdam,

3 edition, 1966. First edition in 1953.
[26] M. Franssen. Cocktail. Eindhoven University Press, 2000. Doctoral

thesis.
[27] K. Gödel. On formally undecidable propositions of Principia Mathema-

tica and related systems. Dover publications, New York, 1992. English
translation of Gödel’s original 1931 publication of the incompleteness
theorem. First published in 1962 by Basic Books, inc., New York.
[28] D. Goldrei. Classic set theory, a guided independent study. Chapman

and Hall, 1996.
[29] I. Grattan-Guinness. How did Russell write the principles of mathema-

tics (1903). McMaster University Library Press, 1997. In the Journal
of the Bertrand Russell Archive.
[30] I. Grattan-Guinness. From the Calculus to Set theory 1630-1910.

Princeton University Press, 2000. First published in 1980 by G. Duck-
worth & Co, London.
[31] I. Grattan-Guinness. The Search for Mathematical Roots 1870-1940.

Princeton University Press, 2000.
[32] I. Grattan-Guinness. A sideways look at Hilbert’s Twenty-three Pro-

blems of 1900. Middlesex University Press, 2000.
[33] J. Haim. Introduction of the Israel Mathematical Conference
Proceedings, volume 6. Bar-Ilan University Press, 1993.
[34] P.R. Halmos. Naive Set Theory. Van Nostrand Press, London, 1990.
[35] G.W.F. Hegel. Phänomenologie des Geistes. Reprint: Meiner, Hbg.,

1807. English translation ‘The Phenomenology of Mind’ by J.B. Baillie
in 1910, London.
[36] H. Hermes and H. Schulz. Mathematische Logik. Unknown, 1952. In

Encyklopedia Mathematische Wissenschaften, I1, 1, I, page 58.
[37] J. Hintikka. The Principles of Mathematics Revisited. Cambridge Uni-

versity Press, 1996.
[38] A. Hodges. Turing. The Great Philosophers. Phoenix, 1997.
[39] A.D. Irvine. Bertrand Arthur William Russell. Stanford University

Press, 2000.
[40] D. Joyce. Hilbert’s 1900 Address. Clark University, Worcester, 1997.
[41] D. Joyce. A list of Hilbert’s problems. Clark University, Worcester,

1997.
[42] D. Joyce. The Mathematical Problems of David Hilbert,
http://aleph0.clarku.edu/~djoyce/hilbert/. Clark University, Worcester, 1997.
[43] F. Kamareddine and T. Laan. A reflection on Russell’s ramified types
and Kripke’s hierarchy of truths. Journal of the Interest Group in Pure
and Applied Logic, 4 (2):195–213, 1996.
[44] F. Kamareddine and R. Nederpelt. A derivation system for a formal

language of mathematics. To be published, July 2001.
[45] I. Kaplansky. Encyclopedia Britannica, item on David Hilbert. EB,

1990.
[46] E. Kasner and J. Newman. Mathematicians and the imagination. New

York Publishing, 1940.
[47] S.C. Kleene. Lambda-definability and recursiveness. Duke Mathemati-

cal Journal 2:340-353, 1936.
[48] S.C. Kleene. Mathematical Logic. New York, 1967.
[49] G.T. Kneebone. Mathematical logic and the foundations of mathema-

tics. D. van Nostrand Company, 1963. Reprint 2001.
[50] J. Koenderink. Solid Shape. Cambridge, 1990.
[51] K. Kunen. Set theory: an introduction to independence proofs. New
York Press, 1980.
[52] T. Laan. A formalization of the ramified type theory. TUE Computing

Science Reports, 1994. Technical Report 94-33.
[53] T. Laan. The Evolution of Type Theory in Logic and Mathematics.

PhD thesis, Eindhoven University of Technology, 1997.
[54] T. Laan and R.P. Nederpelt. A modern elaboration of the ramified

theory of types. Studia Logica, 57(2/3):243–278, 1996.
[55] J. Lambek and P.J. Scott. Introduction to higher order logic. Cambridge
University Press, 2001.
[56] P. Linz. An introduction to formal languages and automata. D.C. Heath

and Company, 1990.
[57] J.R. Lucas. The conceptual roots of mathematics. Routledge Press,

2000.
[58] D. MacHale. Comic Sections. Dublin, 1993.
[59] Mosché Machover. Set theory, logic and their limitations. Cambridge
University Press, 1996.
[60] P. Mancosu. From Brouwer to Hilbert, the debate on the foundations

of mathematics in the 1920s. Oxford University Press, 1998.
[61] E. Maor. To infinity and beyond. Boston Press, 1987.
[62] R. Monk. Russell. The Great Philosophers. Routledge, 1999. First

published in 1997 by Phoenix.
[63] G.H. Moore. Zermelo’s axiom of choice: its origins, development and
influence. Springer-Verlag, 1982.
[64] E. Nagel and J. R. Newman. Gödel’s proof. New York University Press,
1986. First published in 1958.
[65] G. Peano. Calcolo differenziale e principii di calcolo integrale. Turin

Press, 1884.
[66] G. Peano. Applicazioni geometriche del calcolo infinitesimale. Turin

Press, 1887.
[67] G. Peano. Calcolo geometrico secondo l’Ausdehnungslehre di H.
Grassmann e preceduto dalle operazioni della logica deduttiva. Fratelli
Bocca, Torino, 1888. English translation ‘Geometric Calculus: According
to the Ausdehnungslehre of H. Grassmann’ by Lloyd Kannenberg, November
1999, Publisher Birkhäuser.
[68] G. Peano. Dizionario di matematica. Parte prima. Logica matematica.

Unknown, 1901. In Ri(e)vista di mathematica, edited by Peano.
[69] L.J.J. Wittgenstein P.M. Sullivan. The foundations of mathematics.

Unknown, June 1927. Reprinted by F. P. Ramsey, June 1927, Theoria
61 (2) (1995), pages 105-142.
[70] W. Van Orman Quine. Mathematical Logic. Harvard University Press,

1951. Revised edition of Norton, New York 1940.
[71] W. Van Orman Quine. From a Logical Point of View: 9
Logico-Philosophical Essays. Harvard University Press, 2 edition, 1961.
Cambridge, Massachusetts.
[72] W. Van Orman Quine. Set Theory and its Logic. Harvard University
Press, 1963. Cambridge, Massachusetts.
[73] R.C.W. Bertrand Russell entry in Encyclopedia Britannica. EB, 2000.
[74] J. Richard. Les principes de mathématiques et le problème des
ensembles. Revue générale des sciences pures et appliquées, 16, 1905.
Published also in Acta Mathematica 30 (1906), pages 295-296.
[75] B. Riemann. Über die Hypothesen, welche der Geometrie zu Grunde
liegen. Göttingen Press, 1854.
[76] N. Rose. Mathematical Maxims and Minims. Raleigh NC, 1988.
[77] H. Rubin and J.E. Rubin. Equivalents of the axiom of choice. North-
Holland Press, Amsterdam, 1963.
[78] B. Russell. My philosophical development. London: George Allen and

Unwin, New York: Simon and Schuster, 1959.
[79] B. Russell. Introduction to Mathematical Philosophy. The Great

Philosophers. London: George Allen and Unwin; New York: The
Macmillan Company, 1999. First published in 1997 by Phoenix.
[80] B. Russell. The autobiography of Bertrand Russell. Routledge, 2000.
[81] S. Shelah. Proper forcing, lecture notes in mathematics. Springer-
Verlag, 1982.
[82] M. Sipser. Introduction to the theory of computation. PWS Publishing
Company, Boston, 1997.
[83] A.T. Skolem. Einige Bemerkungen zur axiomatischen Begründung der
Mengenlehre. Akademiska Bokhandeln, Helsinki, 1922. In
‘Matematikerkongressen i Helsingfors 4-7 juli 1922, Den femte skandinaviska
matematikerkongressen’, pages 217-232. Reprinted in ‘Selected Works
in Logic’, by A.T. Skolem, edited by Jens E. Fenstad, 1970, Publisher
Universitetsforlaget, Oslo.
[84] R.M. Smullyan. Gödel’s incompleteness theorems. Oxford Logic
Guides. Oxford University Press, 1992.
[85] B. Sobocinski. L’analyse de l’antinomie Russellienne par Lesniewski.
Unknown, 1950. Methodus I, pages 94-107, 220-228, 308-316; Metho-
dus II, pages 237-257.
[86] F. Kamareddine, T. Laan and R. Nederpelt. Types in Logic and
Mathematics before 1940, volume 8. Bulletin of Symbolic Logic, January
2002. To be published.
[87] M. Tiles. Mathematics and the image of reason. Routledge, 1991.
[88] E.C. Titchmarsh. Mathematical Maxims and Minims. Rome Press,
1988.
[89] A.M. Turing. On computable numbers, with an application to the Ent-
scheidungsproblem, volume 42, pages 230-265 of 2. London Mathe-
matical Society, 1936. With corrections from Proceedings of the Lon-
don Mathematical Society, Series 2, Vol.43 (1937) pages 544 to 546.
Reprinted with some annotations in ‘The Undecidable: Basic Papers
on Undecidable Propositions, Unsolvable Problems and Computable
Functions’, ed. Martin Davis, 1965, Raven Press, New York.
[90] A.M. Turing. Intelligent Machinery. National Physical Laboratory,
1948. National Physical Laboratory Report, in ‘Machine Intelligence 5’
by Meltzer, B. and Michie, D., 1969, Edinburgh University Press.
[91] Unknown. Encyclopedia Britannica; Item on Principia Mathematica.
EB, 2000.
[92] Unknown. Encyclopedia Britannica; Item on Turing. EB, 2000.
[93] J. van Heijenoort. From Frege to Gödel: source book in mathematical

logic 1879-1931. Harvard University Press, 1967.
[94] W. van Orman Quine. New foundations for Mathematical Logic. The
American Mathematical Monthly, February 1937. 44(2), pages 70-80.
[95] Various. The Mathematical Intelligencer, volume 13. Springer-Verlag,

Berlin, 1991.
[96] J. von Neumann. Zur Einführung der transfiniten Zahlen. Acta Szeged.
1:199-208 [I, 3], 1923.
[97] J. Weiner. Frege in Perspective. Cornell, 1990.
[98] J. Weiner. Frege. Past Masters. Oxford University Press, 1999.
[99] A. Whitehead. An introduction to Mathematics. Williams and Norgate,

London, 1911.
[100] A. Whitehead. A treatise on universal algebra. New York, 1960.
[101] E. Zermelo. Untersuchungen über die Grundlagen der Mengenlehre,

I. Springer-Verlag, 1908. In Mathematische Annalen 65, 1908, pages
261-281.
