- 1870 to 1940 -
Mark Scheffer
(Version 1.0)
1 Introduction
2 Cantor’s paradise
2.1 The beginning of set-theory
2.2 Basic concepts
5 Russell
5.1 Russell’s paradox
5.2 Consequences and philosophies
5.3 Zermelo Fraenkel
5.3.1 Axiomatic set theory
5.3.2 Zermelo Fraenkel (ZF) Axioms
6 Hilbert
6.1 Hilbert’s proof theory
6.2 Hilbert’s 23 problems
7 Types
7.1 Russell and Whitehead’s Principia Mathematica
7.2 Ramsey, Hilbert and Ackermann
7.3 Quine
8 Gödel
8.1 Informally: Gödel’s incompleteness theorems
8.2 Formally: Gödel’s Incompleteness Theorems
8.2.1 On formally undecidable propositions
8.2.2 The impossibility of an ‘internal’ proof of consistency
8.2.3 Gödel numbering and a concrete proof of G1, G2 and G3
8.3 Gödel’s theorem and Peano Arithmetic
8.4 Consequences
8.5 Neumann-Bernays-Gödel axioms
10 Conclusion
Mathematical Notations
Many different notations have been developed for set theory and logic.
Most notations that we have used are standard today; other notations that
we have used are introduced in the text.
Mathematical Logic
∧     conjunction                          and
∨     disjunction (inclusive)              or
¬     negation                             not
ϕ(x)  propositional function
→     implication                          if . . . then
↔     bi-implication                       if and only if, iff
≡     equivalence                          is equivalent to
∀     universal quantifier                 for all
∃     existential quantifier               exists
∃!    one-element existential quantifier   exists a unique
1
Notation originally due to E.W. Dijkstra.
Example:
(Σ x : 0 ≤ x ≤ 5 : x^2)
=
0^2 + 1^2 + 2^2 + 3^2 + 4^2 + 5^2
=
\sum_{x=0}^{5} x^2
Example:
(∃x : x ∈ N : x^3 − x^2 = 18)
≡
‘there exists a natural number x such that x^3 − x^2 = 18’
If the range covers all possible values of the variable (here: x), or if
it is clear what the range of the variable is, we can omit it.
Example:
(∀x : true : x ∈ A → x ∈ B)
≡
(∀x :: x ∈ A → x ∈ B)
≡
‘all elements of A are also elements of B’
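The three-part quantifier notation (operator and dummy, range, term) corresponds closely to comprehensions in a programming language. The following is a minimal Python sketch of the three examples above, not part of the original notation. The names `BOUND`, `A` and `B` are chosen for illustration only, and the existential example is searched up to an assumed finite bound, whereas the original range is all of N.

```python
# Quantified expressions (Q x : range : term) written as Python comprehensions.

# (Σ x : 0 ≤ x ≤ 5 : x^2)
sum_of_squares = sum(x**2 for x in range(0, 6))

# (∃x : x ∈ N : x^3 − x^2 = 18), searched only up to an assumed bound.
BOUND = 100  # hypothetical cut-off; N itself is infinite
exists_solution = any(x**3 - x**2 == 18 for x in range(BOUND))

# (∀x :: x ∈ A → x ∈ B) for two small example sets.
A = {1, 2}
B = {1, 2, 3}
# An implication p → q is evaluated as (not p) or q; since only finitely many
# values can make "x ∈ A" true, it suffices to range over A ∪ B.
all_in_b = all((x not in A) or (x in B) for x in A | B)

print(sum_of_squares, exists_solution, all_in_b)
```

The witness for the existential example is x = 3, since 27 − 9 = 18.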
Chapter 1
Introduction
Pure mathematics is, in its way, the poetry of logical ideas.
- Albert Einstein
This report covers the most important developments and theory of the
foundations of mathematics in the period of 1870 to 1940. The tale of the
foundations is fairly familiar in general terms and for its philosophical con-
tent; here the main emphasis is laid on the mathematical theory. The history
of the foundations of mathematics is complicated and is a many-sided story;
with this article I do not aim to give a definitive or complete version, but
to capture what I consider the essence of the theoretical developments, and
to present them in a clear and modern setting. Some basic mathematical
knowledge of set theory and logic is presupposed.
The theory of this report begins with the emergence of set theory in the
work of the German mathematician Cantor. In section 2.1 we informally
describe how work on a problem concerning trigonometric series gradually
led Cantor to his theory of sets (section 2.2). As a result of the work of
Weierstrass, Dedekind and Cantor, pure mathematics had been provided
with much more sophisticated foundations. The notion of infinitesimal had
been banished, ‘real’ numbers had been provided with a logically consistent
definition (section 3.5), continuity had been redefined and, more controver-
sially, a whole new branch of arithmetic had been invented which addressed
itself to the problems (e.g. paradoxes) of infinity (sections 3.6, 3.7).
In 1895 Cantor discovered a paradox (section 3.8.1) that he did not publish
but communicated to Hilbert in 1896. In 1897 it was rediscovered in a slightly
different form by Burali-Forti (section 3.8.2). Cantor and Burali-Forti could
not resolve this paradox, but it was not taken so seriously, partly because
the paradoxes appeared in a rather technical region.
The Italian mathematician Peano (section 4.1) was able to show that the
whole of arithmetic could be founded upon a system that uses three basic
notions and five initial axioms. At the same time the German mathematician
Frege (section 4.2) worked on developing a logical basis for mathematics. Just
like Peano, Frege wanted to put mathematics on firm grounds. But Frege’s
grounds were strictly logical; he followed a development later called logicism,
also known as the development of so-called mathematical logic.
The British mathematician Russell noted Peano’s work and later that
of Frege. Soon thereafter he showed (section 5.1) how finite descriptions
like ‘set of all sets’ could be self-contradictory (i.e. paradoxical) and pointed
out the difficulties that arose with self-referential terms. This paradox that
Russell found existed not only in specific technical regions but in all of the
axiomatic systems underlying mathematics at the same time (section 5.1).
But since the paradoxes could be avoided in most practical applications of
set theory, the belief in set theory as a proper foundation of mathematics
remained. Axiomatic set theory (section 5.3.1) was an attempt to come to
a theory without paradoxes. Various responses to the paradox (section 5.2)
led to new sets of axioms for set theory. The two main approaches are by the
German mathematicians Zermelo and Fraenkel (section 5.3), and by the Hun-
garian von Neumann, the Hungarian-Austrian Gödel and the Briton Bernays
(section 8.5). It also led to the emergence of the ‘intuitionistic’ philosophy of
mathematics by the Dutch mathematician Brouwer (not covered here) and
to a theory of types, proposed by Russell himself with the help of his former
teacher, the English mathematician Whitehead. Despite the paradox,
Russell and Whitehead still claimed that all mathematics could be founded
on a mathematical logic; this belief was given a definite presentation in
their work ‘Principia Mathematica’ (section 7.1). Various consequences
followed (section 7.3) and new conceptions of logic arose (by Wittgenstein and
In 1931 Gödel had shown that consistency and completeness could not
both be attained (chapter 8). Gödel’s work left outstanding Hilbert’s ques-
tion of decidability. The English mathematician Turing proved in 1936 that
there are undecidable problems, by giving the so-called halting problem that
cannot be solved by any algorithm (section 9.1), after formalizing the no-
tion of algorithm with his concept of the Turing Machine. The American
mathematician Church (independently) obtained the same result but with
another formalization of the notion of an algorithm, using his computational
model of lambda calculus (section 9.2). In section 9.3 we state that these two
notions are equivalent and correspond to the intuitive notion of algorithm or
computability. In chapter 10 I summarize the theory of the foundations of
mathematics, before giving my own opinion and making some suggestions for
future work.
Cantor’s paradise
- Titchmarsh, E. C. in [88]
By the late 19th century the discussions about the foundations of geometry
had become the focus for a running debate about the nature of the branches
of mathematics ([23, last paragraph of section 35, page 69/70]). Although
there had been no conscious plan leading in that direction, the stage was set
for a consideration of questions about the fundamental nature of mathema-
tics.
In the study of logic, the work of the English mathematician George Boole
in the 1850s ([49, chapter 2.S4, page 51]), and the American Charles Sanders
Peirce around 1880 ([49, page 187]), had contributed to the development of a
symbolism to explore logical deductions and in Germany the logician Gottlob
Frege (see [98]) had directed keen attention to fundamental questions.
All of these debates came together through the pioneering work of the
German mathematician Georg Cantor on the concept of a set. Cantor had
begun work in this area because of his interest in Riemann’s theory of trigono-
metric series.
In 1874 Cantor published his first article on set-theory. A set, wrote Cantor
(in ‘Untersuchungen über die Grundlagen der Mengenlehre I’, published
in [20, page 261-281]), is “a collection of definite, distinguishable objects of
perception or thought conceived as a whole”. In this report we use a similar
description of the concept of a set.
A set is sometimes also called aggregate, class or (as it was first called by
Riemann (see [31, page 88]) and later by the mathematician Russell) manifold.
The objects are also called elements or members of the set.
What is set theory? A branch of mathematics that deals with the proper-
ties of well-defined collections of objects, which may be of a mathematical
nature, such as numbers or functions, or not.
Cantor defined ([49, page 288]) two sets A and B to be identical (equal),
notation A = B, if and only if A and B have the same elements. When
set-theory was later axiomatized, this definition also became known as the
axiom of extensionality.
The relation ‘is a subset of’, notation ⊆, indicates that one set is con-
tained in the other:
We often want to create a new set from a given set by selecting elements
that have certain properties. For example we take the set of powers of three
or the set of all even numbers (to be exact: the set containing those ele-
ments of the set of natural numbers that have the property to be divisible
by 2). This principle was used by Cantor, and we also call it the unrestricted
or naive comprehension principle because it later (see sections 3.8 and 5.1)
turned out to be untenable.
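Selecting elements of a given set by a property is what a set comprehension does in a programming language. The sketch below is an illustration under stated assumptions: `N_UP_TO_20` is a finite stand-in for the natural numbers, and the helper names `is_subset` and `is_equal` are ours. The subset test follows the form (∀x :: x ∈ A → x ∈ B) used in the proof of the corollary that follows.

```python
# Comprehension over a finite fragment of N (the fragment is an assumption;
# the naturals themselves cannot be enumerated completely).
N_UP_TO_20 = set(range(21))

# "those elements of N that have the property to be divisible by 2"
evens = {x for x in N_UP_TO_20 if x % 2 == 0}

# the powers of three that fall inside the fragment
powers_of_three = {x for x in N_UP_TO_20 if x in {3**k for k in range(4)}}


def is_subset(a, b):
    """A ⊆ B := (∀x :: x ∈ A → x ∈ B)."""
    return all(x in b for x in a)


def is_equal(a, b):
    """A = B iff A and B have the same elements (mutual inclusion)."""
    return is_subset(a, b) and is_subset(b, a)


print(sorted(evens))
print(sorted(powers_of_three))
print(is_subset(set(), evens))  # the empty set is a subset of any set
```

Note that this always selects from an already given set; the unrestricted principle, which allows any property without a surrounding set, is the one that turned out to be untenable.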
Corollary: (∀a :: ∅ ⊆ a)
Proof: We want to prove that (∀a :: ∅ ⊆ a) or, using the definition of the
subset relation: (∀x :: x ∈ ∅ → x ∈ a). From the previous theorem we know
that (∀y :: y ∉ ∅). This yields (∀x :: false → x ∈ a), which is true.
Using the comprehension principle we can create new sets from given sets.
So now we can introduce some operations on sets, by applying the compre-
hension principle. But before we do that, we first introduce some general
(i.e. regardless whether the operations are set-theoretic or not) properties
of operations: idempotence, commutativity, associativity and distributivity.
Although Cantor did not formulate these properties as such, they are used
in the branch of calculus and useful in the set theory that follows in this
chapter.
Suppose ⊕ and ⊗ are binary¹ operations on a certain domain and E, F and
G are elements of that domain (for example sets), on which we have defined
the equality relation ‘=’.
Definition of idempotence:
⊕ is idempotent := (∀E :: E ⊕ E = E)
Definition of commutativity:
⊕ is commutative := (∀E, F :: E ⊕ F = F ⊕ E)
Definition of associativity:
⊕ is associative := (∀E, F, G :: (E ⊕ F) ⊕ G = E ⊕ (F ⊕ G))
Definition of distributivity:
⊕ is distributive² over ⊗ := (∀E, F, G :: E ⊕ (F ⊗ G) = (E ⊕ F) ⊗ (E ⊕ G))
1
These properties can also be generalized to operations of arbitrary arity, but this will
not be necessary for our discussion.
2
This form of distributivity is also called left-distributivity, as opposed to
right-distributivity.
⊕ is right-distributive over ⊗ := (∀E, F, G :: (E ⊗ F) ⊕ G = (E ⊕ G) ⊗ (F ⊕ G))
In ordinary mathematics this distinction is often left out for commutative operations, and
we for example simply say that × is distributive over + (when in fact it is both left- and
right-distributive).
The symbol ∪ is employed to denote the union of two sets. Thus, the set
A ∪ B is defined as the set that consists of all elements belonging to
set A or to set B (or to both).
Definition of union: A ∪ B := {x | x ∈ A ∨ x ∈ B}
Definition of intersection: A ∩ B := {x | x ∈ A ∧ x ∈ B}
Any two sets the intersection of which is the empty set are said to be dis-
joint. A collection of sets is called (pairwise) disjoint or mutually exclusive
if any two distinct sets in it are disjoint.
Example: The operations union and intersection on sets are both idempo-
tent, commutative and associative.
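The properties defined above can be checked mechanically on a small finite domain. The following Python sketch tests them for union and intersection on all subsets of {0, 1, 2}. This is a spot-check on one assumed domain, not a proof, and all function names are ours.

```python
from itertools import combinations, product


def powerset(universe):
    """All subsets of a finite set, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_idempotent(op, domain):
    return all(op(e, e) == e for e in domain)


def is_commutative(op, domain):
    return all(op(e, f) == op(f, e) for e, f in product(domain, repeat=2))


def is_associative(op, domain):
    return all(op(op(e, f), g) == op(e, op(f, g))
               for e, f, g in product(domain, repeat=3))


def is_left_distributive(op, over, domain):
    """op distributes over `over`: E op (F over G) = (E op F) over (E op G)."""
    return all(op(e, over(f, g)) == over(op(e, f), op(e, g))
               for e, f, g in product(domain, repeat=3))


def union(a, b):
    return a | b


def intersection(a, b):
    return a & b


SUBSETS = powerset({0, 1, 2})  # assumed example domain

for name, op in (("union", union), ("intersection", intersection)):
    print(name,
          is_idempotent(op, SUBSETS),
          is_commutative(op, SUBSETS),
          is_associative(op, SUBSETS))
```

On this domain each operation also turns out to be distributive over the other, which can be checked with `is_left_distributive(union, intersection, SUBSETS)` and with the arguments swapped.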
Definition of difference: B − A := {x | x ∈ B ∧ x ∉ A}
We define the power set of V, denoted by P(V), as the set of all subsets
of V. Note that if V ≠ ∅, this operation creates a larger set from a given
set V.
We can extend the union of a pair of sets to any finite collection of sets;
the union is then defined as the set of all objects which belong to at least
one set in the collection A. We can do the same for the intersection.
Definition: ⋃A := {x | (∃y :: y ∈ A ∧ x ∈ y)}
Definition: ⋂A := {x | (∀y :: y ∈ A → x ∈ y)}
Definition of partition: P is a partition of X :=
X = ⋃{A | A ∈ P} ∧ (∀A, B : A, B ∈ P : A = B ∨ A ∩ B = ∅)
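For finite collections these definitions translate almost literally into code. The sketch below is illustrative only: the function names are ours, the intersection is restricted to non-empty collections (for an empty collection the universal quantifier would admit every object), and `is_partition` follows the definition as stated here, which does not ask for the blocks to be non-empty.

```python
from itertools import combinations


def powerset(v):
    """P(V): the set of all subsets of a finite set V."""
    items = list(v)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}


def big_union(collection):
    """All x that belong to at least one member y of the collection."""
    return {x for y in collection for x in y}


def big_intersection(collection):
    """All x that belong to every member y; assumes a non-empty collection."""
    members = list(collection)
    return {x for x in members[0] if all(x in y for y in members)}


def is_partition(p, x):
    """P covers X and any two members of P are equal or disjoint."""
    covers = big_union(p) == set(x)
    pairwise = all(a == b or not (a & b) for a in p for b in p)
    return covers and pairwise


print(len(powerset({1, 2, 3})))
print(big_union([{1, 2}, {2, 3}]), big_intersection([{1, 2}, {2, 3}]))
print(is_partition([frozenset({1}), frozenset({2, 3})], {1, 2, 3}))
```

A set with n elements has 2^n subsets, so the power set of a three-element set has 8 members.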
In this chapter I have made extensive use of [30] in section 2.1 and [17]
in section 2.2.
Chapter 3
Mathematical constructs in
set-theory
Now that we have this apparatus of set-theory available, we will see that
it is not just a separate branch of mathematics, but that we can define some
basic mathematical constructs in set-theory. In this section we will consider
pairs and the cartesian product, necessary before we can treat relations (in
section 3.2) and functions (in section 3.3).
We can now easily verify that the following definition (see [17, chapter
8]) in set-theory satisfies the desired property.
Definition of cartesian product of a set of sets:
×V := {f : I → ⋃_{i∈I} V_i | (∀i : i ∈ I : f(i) ∈ V_i)}
1
Representation originally by Kuratowski, see [49, page 294].
3.2 Relations
Mathematicians do not study objects, but relations between ob-
jects. Thus, they are free to replace some objects by others as
long as the relations remain unchanged. Content to them is irre-
levant: they are interested in form only.
- J.H. Poincaré
Example: We have already seen the definitions of the subset and proper sub-
set relations in section 2.1. There we defined the set R ⊆ X ×Y implicitly by
using a statement; only those pairs < x, y > are in R for which the statement
holds (here we are using in fact the comprehension principle of page 16). We
will continue to use statements to define relations.
Example: The relation < on the naturals (i.e. between N and N) can be
defined as:
Definition of reflexivity:
R is reflexive := (∀x : x ∈ X : R(x, x))
Definition of symmetry:
R is symmetric := (∀x, y : x, y ∈ X : R(x, y) → R(y, x))
Definition of anti-symmetry:
R is anti-symmetric := (∀x, y : x, y ∈ X : R(x, y) ∧ R(y, x) → x = y)
Definition of transitivity:
R is transitive := (∀x, y, z : x, y, z ∈ X : R(x, y) ∧ R(y, z) → R(x, z))
Definition of connectivity:
R is connective := (∀x, y : x, y ∈ X : R(x, y) ∨ (x = y) ∨ R(y, x))
Definition of equivalence:
R is an equivalence relation := R is reflexive, symmetric and transitive
Note: Asymmetric means not symmetric, and is not the same as anti-
symmetric.
Example: The subset relation is reflexive, anti-symmetric (note that the proof
of anti-symmetry uses the axiom of extensionality of page 16) and transitive,
but not connective.
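The properties of relations defined above can be tested on a finite carrier set. The following Python sketch checks the claims of the example for the subset relation on the subsets of {0, 1, 2}. The carrier set and all names are assumptions made for illustration, and a finite check is of course not a proof.

```python
from itertools import combinations, product


def is_reflexive(r, xs):
    return all(r(x, x) for x in xs)


def is_symmetric(r, xs):
    return all(not r(x, y) or r(y, x) for x, y in product(xs, repeat=2))


def is_antisymmetric(r, xs):
    return all(not (r(x, y) and r(y, x)) or x == y
               for x, y in product(xs, repeat=2))


def is_transitive(r, xs):
    return all(not (r(x, y) and r(y, z)) or r(x, z)
               for x, y, z in product(xs, repeat=3))


def is_connective(r, xs):
    return all(r(x, y) or x == y or r(y, x)
               for x, y in product(xs, repeat=2))


def is_equivalence(r, xs):
    return is_reflexive(r, xs) and is_symmetric(r, xs) and is_transitive(r, xs)


def powerset(universe):
    items = list(universe)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]


def subset_of(a, b):
    return a <= b


def same_parity(m, n):
    return (m - n) % 2 == 0


SUBSETS = powerset({0, 1, 2})  # assumed example carrier

print(is_reflexive(subset_of, SUBSETS),
      is_antisymmetric(subset_of, SUBSETS),
      is_transitive(subset_of, SUBSETS),
      is_connective(subset_of, SUBSETS))
```

Connectivity fails because, for instance, {0} and {1} are distinct and neither is a subset of the other. The relation `same_parity` on a range of integers is a small example of an equivalence relation.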
Now that we can speak of a set whose elements are ordered by a relation
R, we define the well-known concepts of (immediate) successor and
predecessor.
Note that with these definitions it can be easily proved that if a relation
R on X is an ordering, then each element except the smallest has a unique
immediate predecessor and each element except the largest has a unique
immediate successor. The notions of smallest and largest elements will be
introduced hereafter. In the literature the immediate successor or
predecessor is sometimes called just successor or predecessor. Sometimes we also see
that the term ‘direct’ is used instead of ‘immediate’, or we simply speak of
the ‘next’ or ‘previous’ value.
Definition of lowerbound:
x is a lowerbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : R(x, y))
Definition of upperbound:
x is an upperbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : R(y, x))
Definition of infimum:
x is an infimum for Y in X := x is the greatest lowerbound for Y in X
Definition of supremum:
x is a supremum for Y in X := x is the smallest upperbound for Y in X
Example: Let X = {4, 6, 12, 24, 36} and R(x, y) := x is a divisor of y. Then
R is a partial order (but not strict) and also a quasi order, but not a (total)
order. 4 and 6 are minimal elements of X, but X has no least element. 1 is
a lowerbound for X, and 2 is the infimum of X.
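The claims of this example can be recomputed directly. The sketch below is an illustration with names of our own choosing. Lower bounds are looked for among the positive naturals up to min(X), which is enough because a divisor of an element cannot exceed that element.

```python
X = {4, 6, 12, 24, 36}


def divides(x, y):
    """R(x, y) := x is a divisor of y."""
    return y % x == 0


# Minimal elements: no other element of X lies below them.
minimal = {x for x in X if not any(divides(y, x) and y != x for y in X)}

# Least elements: elements of X that lie below every element of X.
least = [x for x in X if all(divides(x, y) for y in X)]

# Lower bounds for X among the positive naturals.
lower_bounds = {n for n in range(1, min(X) + 1)
                if all(divides(n, y) for y in X)}

# Infimum: the lower bound that every other lower bound divides.
infimum = [b for b in lower_bounds
           if all(divides(c, b) for c in lower_bounds)]

# The order is not total: 4 and 6 are incomparable.
comparable_4_6 = divides(4, 6) or divides(6, 4)

print(minimal, least, lower_bounds, infimum, comparable_4_6)
```

For the divisibility order the infimum of a finite set of positive naturals is its greatest common divisor, here 2.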
The so-called least number principle says that any non-empty subset of
the natural numbers has a least element. This principle can be shown (a
proof can be found in [59, page 7]) to be equivalent to the principles of weak
and strong induction, that will be introduced in section 3.4.
3.3 Functions
In mathematics, a function maps each element of an input set to exactly one
element of an output set; in other words it is a special kind of relation
that indicates for each pair < x, y > of the input and output set if it belongs
to the function or not. More precisely, f is a function or mapping from X
to Y means that f assigns to each x ∈ X a uniquely determined y ∈ Y , no-
tation f (x) = y. We can define this notion in set-theory by using a relation
between X and Y such that for each x ∈ X there is a unique y ∈ Y such
that < x, y > ∈ f .
As we did before for relations and operations, we now define some general
properties for functions.
Definition of bijective:
f : X → Y is bijective or a bijection := f is surjective and f is injective
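A function given as a set of pairs can be tested against these notions directly. In the sketch below the helper names are ours, and injectivity and surjectivity are taken in their usual sense (distinct arguments have distinct images, and every element of Y is an image); these readings are assumed here.

```python
def is_function(f, xs, ys):
    """f ⊆ X × Y with exactly one y for each x ∈ X."""
    within = all(x in xs and y in ys for x, y in f)
    unique = all(len({y for a, y in f if a == x}) == 1 for x in xs)
    return within and unique


def is_injective(f):
    """No two pairs share the same image."""
    images = [y for _, y in f]
    return len(images) == len(set(images))


def is_surjective(f, ys):
    """Every element of Y occurs as an image."""
    return {y for _, y in f} == set(ys)


def is_bijective(f, xs, ys):
    return is_function(f, xs, ys) and is_injective(f) and is_surjective(f, ys)


X = {1, 2, 3}
Y = {"a", "b", "c"}

f = {(1, "a"), (2, "b"), (3, "c")}   # a bijection
g = {(1, "a"), (2, "a"), (3, "b")}   # a function, neither injective nor surjective
h = {(1, "a"), (1, "b"), (2, "c")}   # not a function: 1 has two images, 3 has none

print(is_bijective(f, X, Y), is_function(g, X, Y), is_function(h, X, Y))
```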
We will now consider two special kinds of functions: the identity function
and the sequence.
Definition of sequence:
s is a sequence of X := s is a function from N to X (i.e. s ∈ X^N)
Just as in algebra, we can now combine a set and relations on that set
into a structure.
The concept of a structure enables us to abstract from the exact set and
relations, and reason about sets of structures instead. There also is a useful
definition for equivalence of structures, called isomorphism.
Definition of automorphism:
f is an automorphism of R := f is an isomorphism from R to R
3.4.1 Induction
Induction is a method of reasoning from a part to a whole, from particu-
lars to generals, or from the individual to the universal. It should not be
confused with the mathematical principle of induction (treated in section
3.4.3). In ordinary induction we examine a certain number of cases and
then generalize. Reasoning by analogy, where a conclusion is made based on
an analogous situation, is also a primitive form of induction (see [23, page 6]).
3.4.2 Deduction
Mathematics, in its widest significance, is the development of all
types of formal, necessary, deductive reasoning.
The Greeks found deductive reasoning, not empirical procedures, the method
to establish mathematical facts. This usage is a generalization of what the
Greek philosopher Aristotle called the syllogism (see [49, chapter 1, sections
5 and 6]), but a syllogism is now recognized as merely a special case of a
deduction. Also, the traditional view that deduction proceeds from the gene-
ral to the specific has been abandoned as incorrect by most logicians. Some
experts regard all valid inferences as deductive in form and for this and other
reasons reject the supposed contrast between deduction and induction. The
German mathematician Hilbert greatly contributed to deductive reasoning as
we will see when we introduce his proof theory (also known as the axiomatic
method) in chapter 6. Logic, in mathematical context, can be seen as the
theory of the formal structure of deductive reasoning. The logic of Hilbert’s
metamathematics (see section 6.1) and Russell’s Principia Mathematica (see
section 7.1) are a form of reasoning with deductive certainty, although others
have proposed different formalizations of deductive logic (see [49, page 121]).
Originally based on Aristotle’s logic, the deductive argument has become
more subtle and complex and is now based on modern symbolic logic.
1) Basis. Prove that the theorem holds for a specific case (which often is
minimal for a given ordering of the elements). This case is also called
base case.
2) Induction step. Prove a rule that says that if the theorem holds for an
arbitrary element, it is true for the next case. This often is a rule of
heredity that tells us that the theorem is true for the immediate successor
case of an arbitrary element if it is true for the arbitrary element itself.
The claim that the theorem is true for an arbitrary element is called
the induction hypothesis.
3) Conclusion. Together, 1 and 2 imply that the theorem holds for all
cases starting with the base case. If you didn’t use the minimal case in
step 1, then you have proven only that the theorem holds for that case
and its successors, not for all possible cases.
The induction step can take two forms which correspond to two forms of
mathematical induction. Again we assume there is an ordering of the ele-
ments with +1 the immediate successor relation.
Weak: prove that if the theorem holds for an arbitrary element n, then it
holds for the element n + 1
Strong: prove that if the theorem holds for all elements up to some arbitrary
element n, then it holds for the element n + 1
Formal
Suppose that we want to prove a property ϕ(s) that holds for all s ∈
S. The induction principle assumes that S is a well-founded set and every
element except for the smallest has an immediate predecessor. This condition
is also known as S is inductive. The structure of an inductive set in fact
resembles that of the naturals, i.e. if we have the axioms (see Peano axioms
in section 4.1) 0 is in N and if x is in N then x + 1 is in N, the set N is
inductive. In case the set S is the naturals, we also refer to the principle as
natural induction.
The principle presupposes the following two conditions:
Step (C) is also called the base of a proof by induction, step (D) is also
called the induction step, and ϕ(s) is called the induction hypothesis.
Proof: Suppose S is a well-founded set and every element except the small-
est, denoted e, has an immediate predecessor, and suppose that a property
ϕ is true for e, as well as for the immediate successor s+ ∈ S if it is true for
s ∈ S. We now prove by contradiction that ϕ holds for all s ∈ S. Suppose
that ϕ is not true for all s ∈ S. Let N be the set of elements of S for which
ϕ is not true, i.e. N = {s ∈ S | ¬ϕ(s)}. By the theorem of page 26 we also
know that if S is well-founded, any subset of S is also well-founded, thus N
contains a smallest element n. If n = e, we have a contradiction. If n > e, n
has an immediate predecessor, denoted n−. Since n is the smallest element
for which ϕ doesn’t hold, ϕ must hold for n−. But then by (D), ϕ must also
hold for the immediate successor of n−, that is n: contradiction. Thus ϕ
must be true for all s ∈ S.
Therefore (with only conditions (A), (C) and (D) holding) every natural
number is even: contradiction!
Sometimes this is also informally stated using the infamous three dots as
(∀s : s ∈ S : (ϕ(e) ∧ ϕ(e+) ∧ . . . ∧ ϕ(s)) → ϕ(s+)).
Proof: Suppose < X, R > is a structure such that (A), (B) and (E) hold. Again
we use proof by contradiction, and assume (∃x : x ∈ X : ¬ϕ(x)). Thus
{x ∈ X | ¬ϕ(x)} is non-empty and has a smallest element e (since < X, R >
is well-founded). We now have ¬ϕ(e) ∧ (∀z : z ∈ X : R(z, e) → ϕ(z)).
According to (E) (substitute z for y, X for S, and take e for x) we then have
ϕ(e): contradiction.
Note that the base case is not really left out, since it is implicitly present
in the quantification (take e for x). This form of induction, when applied
to ordinals (ordinals form a well-ordered and hence well-founded set and are
introduced in section 3.8.2) is called transfinite induction.
3
Sometimes this principle is called the Principle of Complete Induction, for example in
[4], but this is less common.
An example of such a set are the ordinals or cardinals, or even the class
of all ordinals. A proof by transfinite induction typically needs to distinguish
three cases:
1. s is a minimal element
Clearly, all three given principles are equivalent, since we proved them to
be true. These proofs however are based on an underlying set of axioms (the
so-called ZF axioms and the Peano axioms, that will be introduced in section
5.3 and chapter 4 respectively). Without these conditions (to be exact, with-
out Peano’s induction axiom), we cannot directly prove the principles to be
true from the ZF axioms alone4 . In that case we can prove the equivalence
of the principles by showing that they imply each other. As an example,
we now prove that (mathematical) induction is a special case of transfinite
induction, for the set of natural numbers. To prove this it suffices to show
that ((C) and (D)) ↔ (E).
4
With only the fundamental axioms of Zermelo-Fraenkel set theory, it is not possible to
prove mathematical induction. An extra axiom is needed, the infamous Axiom of Choice,
or one of its equivalent forms. The four statements known as ‘Axiom of Choice’, ‘Zorn’s
Lemma’, ‘Well-Ordering principle’ (also known as well-ordering theorem, see section 3.8.2)
and ‘Mathematical Induction Principle’ are all equivalent, meaning that if you assume one
of them to be true, the others follow as consequences, but none of them can be proven
from the other fundamental axioms in ZF set theory alone. There are also other equivalent
statements that are sometimes used (such as Zermelo’s postulate), and it is a nice exercise
to prove the equivalence of these statements.
We can prove the equivalence of IND and TFIND in two ways: in a con-
structive way or with a proof by contradiction. We give both proofs.
Constructive Proof:
Structural Induction
other object as a subpart, but this need not always be the case.
We call the left-hand side of this equality LHS, and the right-hand side
RHS, and abbreviate the equality by EQ. We assume two real numbers x
and y and prove EQ by induction on n.
Basis case: For n = 0 the EQ clearly is correct, since both sides are 1. For
some reason, most textbooks take n = 1 as the basis, in which case LHS is
simply x + y, and RHS is
\binom{1}{0} x^{1-0} y^{0} + \binom{1}{1} x^{1-1} y^{1} = x + y
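Induction case: We assume EQ is true for n = k and have to show that it is
then also true for n = k + 1:
(x + y)^{k+1} = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^{j}
By the induction hypothesis, LHS = (x + y)(x + y)^{k} expands to
\sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^{j} + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}
Using \binom{k+1}{j} = \binom{k}{j} + \binom{k}{j-1} for 1 ≤ j ≤ k, RHS equals
x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \left[ \binom{k}{j} + \binom{k}{j-1} \right] x^{k+1-j} y^{j} =
x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \binom{k}{j} x^{k+1-j} y^{j} + \sum_{j=1}^{k} \binom{k}{j-1} x^{k+1-j} y^{j}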
We can now bring x^{k+1} into the first sum (as the j = 0 term), and y^{k+1}
into the second sum (as the j = k + 1 term). This gives
RHS = \sum_{j=0}^{k} \binom{k}{j} x^{k+1-j} y^{j} + \sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^{j}
and
LHS = \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^{j} + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}
The first sums of LHS and RHS are the same, and we can see that the
second sums are also equal, by doing a dummy transformation (let i = j − 1):
\sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^{j} = \sum_{i=0}^{k} \binom{k}{i} x^{k-i} y^{i+1}
So LHS = RHS, and we can conclude that EQ holds for all x, y ∈ R and
n ∈ N.
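As a sanity check next to the proof, the identity can be evaluated exactly for sample values. The sketch below assumes that EQ is the binomial identity (x + y)^n = Σ_{j=0}^{n} C(n, j) x^(n−j) y^j, which is how the basis and induction cases above read. Rational sample values are used so that the arithmetic is exact. The function names are ours, and a finite number of checks does not replace the induction.

```python
from fractions import Fraction
from math import comb


def lhs(x, y, n):
    """Left-hand side of the assumed EQ: (x + y)^n."""
    return (x + y) ** n


def rhs(x, y, n):
    """Right-hand side of the assumed EQ: sum of C(n, j) x^(n-j) y^j."""
    return sum(comb(n, j) * x ** (n - j) * y ** j for j in range(n + 1))


def pascal_holds(k, j):
    """The rule used in the induction case: C(k+1, j) = C(k, j) + C(k, j-1)."""
    return comb(k + 1, j) == comb(k, j) + comb(k, j - 1)


x, y = Fraction(1, 2), Fraction(-3, 4)  # arbitrary sample values
checks = [lhs(x, y, n) == rhs(x, y, n) for n in range(8)]
print(all(checks))
```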
size : TREE → N
∀ t1 , t2 : TREE •
size(leaf) = 1 ∧
size(node(t1 , t2 )) = 1 + size(t1 ) + size(t2 )
leaves: TREE → N
nodes: TREE → N
∀ t1 , t2 : TREE •
leaves(leaf) = 1 ∧
leaves(node(t1 ,t2 )) = leaves(t1 ) + leaves(t2 ) ∧
nodes(leaf) = 0 ∧
nodes(node(t1 ,t2 )) = 1 + nodes(t1 ) + nodes(t2 )
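The recursive definitions above carry over to a small recursive data type. The following Python sketch is an illustration under assumptions: the class names `Leaf` and `Node` and the helper `all_trees` are ours. It spot-checks two relations that follow from the definitions, size(t) = leaves(t) + nodes(t) and leaves(t) = nodes(t) + 1, on all trees with at most four inner nodes. A proof for all trees would proceed by structural induction; the code only tests instances.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Node:
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def size(t):
    return 1 if isinstance(t, Leaf) else 1 + size(t.left) + size(t.right)


def leaves(t):
    return 1 if isinstance(t, Leaf) else leaves(t.left) + leaves(t.right)


def nodes(t):
    return 0 if isinstance(t, Leaf) else 1 + nodes(t.left) + nodes(t.right)


def all_trees(n):
    """All trees with exactly n inner nodes (used only for spot-checking)."""
    if n == 0:
        return [Leaf()]
    return [Node(l, r)
            for i in range(n)
            for l in all_trees(i)
            for r in all_trees(n - 1 - i)]


sample = [t for n in range(5) for t in all_trees(n)]
print(all(size(t) == leaves(t) + nodes(t) for t in sample))
print(all(leaves(t) == nodes(t) + 1 for t in sample))
```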
People have been using the concept of real numbers for a long time (the
Babylonians for example already calculated with roots long B.C., see [12]).
In order for set theory to cover the fundamental structures of analysis, a
precise and formal basis for the real numbers was needed. Even simple
equations have no solutions if all we have are rational numbers (for example,
there is no rational number x such that x^2 = x ∗ x = 2).
When Cantor developed his set theory, it was well known that each type of
number could be constructed as the limit of a sequence of numbers of another
type. But it became clear that, especially in connection with theorems as-
serting the existence of some limit relations, (see [30, page 182]) the proof
might require irrational numbers to be defined in terms of rational ones, in
order to avoid begging the question of existence involved in the theorem.
Cauchy and Heine tried to define the irrational or real numbers in the second
half of the 19th century. In 1872 Cantor and Dedekind followed with their
precise definition of the real numbers. We first present the three methods
(of Dedekind, Cantor and Cauchy) of defining the reals in terms of rationals
and then show that they are identifiable.
2) (∀a, b : a, b ∈ C : a ∈ C ∧ b < a → b ∈ C)
We can now define R_Dedekind as the set of all equivalence classes of all cuts
in Q: R_Dedekind := {C ⊆ Q | C is a cut in Q} / ∼.
5
Actually, Dedekind’s original definition did not use a partition but a slightly more
complex division. For details see the link ‘Dedekind cuts’ at http://zax.mine.nu/stage.
Example: {x ∈ Q | x^2 < 2} has √2 as supremum. We can identify the real
number √2 with the equivalence class of all sets that have √2 as supremum.
1) (∀n : n ∈ N : a_n ∈ V ∧ b_n ∈ V)
3) (∀n : n ∈ N : b_n − a_n ≤ 2^(−n))
Note that < a_n, b_n >^V_{n∈N} (notation < a_n, b_n >^V or < a_n, b_n > when it
is clear which set V is meant) is actually a sequence, and in 3) a minimum
bound is put on the speed of convergence. We now want to be able to say
when two chains are equivalent.
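To make the notion of a chain of segments concrete, the sketch below builds one in Q for √2 by repeated bisection. It is an illustration under assumptions: conditions 1) and 3) are taken from the list above, and in addition we assume that the segments are nested and always enclose the number being defined, which is the usual reading of such a chain. The function name `sqrt2_chain` is ours.

```python
from fractions import Fraction


def sqrt2_chain(steps):
    """Return segments (a_n, b_n), n = 0..steps, with rational end points,
    a_n^2 < 2 < b_n^2 and b_n - a_n = 2^(-n)."""
    a, b = Fraction(1), Fraction(2)      # b - a = 1 = 2^0
    chain = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        # mid^2 = 2 cannot happen for a rational mid, so one side always moves.
        if mid * mid < 2:
            a = mid
        else:
            b = mid
        chain.append((a, b))
    return chain


chain = sqrt2_chain(10)
for n, (a, b) in enumerate(chain[:4]):
    print(n, a, b, b - a)
```

Each bisection halves the width of the segment, so condition 3) holds with equality in this particular construction.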
We can now define R_Cantor as the set of all equivalence classes of chains
of segments in Q: R_Cantor := < a_n, b_n >^Q_{n∈N} / ∼
3.5.3 Cauchy-sequences
Men pass away, but their deeds abide.
1) (∀n : n ∈ N : a_n ∈ V)
6
V is in general an ordered, commutative ring. We will not further discuss this here,
and for the rest of this paragraph take V = Q.
Theorem: Any convergent sequence {a_n}_{n∈N} is bounded and has a unique
limit.
Proof: First we prove (by contradiction) the uniqueness. Suppose the
sequence has 2 limits, c and c′. Take any k ∈ N. Then from the definition of
convergence there is an integer p such that |a_n − c| < 2^(−k) if n > p. Also, there
is an integer p′ such that |a_n − c′| < 2^(−k) if n > p′. Adding the two
inequalities we get (using the triangle inequality: (∀a, b :: |a + b| ≤ |a| + |b|)):
|c − c′| = |(a_n − c′) + (c − a_n)| ≤ |a_n − c′| + |a_n − c| < 2 ∗ 2^(−k).
Hence, |c − c′| < 2 ∗ 2^(−k), for all k ∈ N, if n > p ∧ n > p′. This means c = c′,
thus the limit is indeed unique. Now we prove boundedness. The sequence
converges, so we can take, for example, k = l. Then there is a p such that
|a_j − c| < 2^(−l) for j > p. We then have, again using the triangle inequality,
that |a_j| ≤ |a_j − c| + |c| < 2^(−l) + |c|. Then the sequence can be
bounded by M = max{|a_1|, |a_2|, . . . , |a_p|, (1 + |c|)}
We can now define R_Cauchy as the set of all equivalence classes of Cauchy
sequences in Q: R_Cauchy := < a_n >^Q_{n∈N} / ∼
Then we can check for every newly defined set X of reals that:
Cantor used the Hebrew letter aleph to name the different levels of
infinity. The cardinality of the set of natural numbers is by definition called
aleph-null or aleph-nought, notation ℵ0. The ‘next levels’ of infinity are called
ℵ1, ℵ2, . . .. Since the cardinality of the set of reals was unknown, Cantor
denoted it by c. If we assume the continuum hypothesis (see section 3.7), which
says there is no level of infinity between the cardinality of N and that of R, the
cardinality of the set of reals can also be denoted by aleph-one, notation ℵ1.
In the rest of this section we will present some of the results of the research
of infinite sets.
i = 0: if 0 ≠ j then g(0) = a ∉ W and g(j) ∈ W, so g(0) ≠ g(j).
i = k + 1: assume k + 1 ≠ j, then we can prove g(k + 1) ≠ g(j) by
induction on j:
j = 0: g(0) = a ∉ W and g(k + 1) ∈ W, so g(k + 1) ≠ g(0).
j = l + 1: we know k + 1 ≠ j = l + 1, so k ≠ l. By the induction
hypothesis g(k) ≠ g(l). Since f is a bijection we also have that
f(g(k)) ≠ f(g(l)), i.e. g(k + 1) ≠ g(l + 1), or g(i) ≠ g(j).
Definition of countable:
A set V is countable, also called denumerable := V is finite or V ∼ N
Cantor then proved that N, Z and Q all have the same cardinality and
also called these sets countably infinite.
Theorem: Q is countable
Proof: We give a bijection from N to Q, by listing all elements of Q. Consider
a table with all fractions a/b (a ∈ N, b ∈ N+), with fraction a/b on the ath
row and the bth column. If we list all elements row by row, we would not
obtain a correspondence between N and Q, since the list would never get
to the second row. By listing the elements along the diagonals (south-west to
north-east), starting from the north-west corner, we obtain a correspondence
between N and Q. Because 2/2 = 1/1, etc., we skip an element when it
would cause a repetition. We can also give a bijection from Q to an infinite
subset of N, which is equivalent to N: for each fraction a/b ∈ Q with a and b
relatively prime, let f(<a, b>) := ½(a + b)(a + b + 1) + b.
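The diagonal listing used in this proof can be sketched in Python. This is only an illustration under stated assumptions: the names enumerate_rationals and cantor_pair are hypothetical, only the non-negative fractions a/b of the table are listed, and the pairing function is assumed to be the standard Cantor pairing, whose last term is b.

```python
from fractions import Fraction
from itertools import count, islice


def enumerate_rationals():
    """Yield each non-negative rational a/b once, walking the table diagonal by diagonal."""
    seen = set()
    for d in count(1):                 # a + b = d is constant along one diagonal
        for b in range(1, d + 1):      # south-west (b = 1) to north-east (b = d)
            a = d - b
            q = Fraction(a, b)         # Fraction reduces, so 2/2 and 1/1 coincide
            if q not in seen:          # skip an element that would repeat
                seen.add(q)
                yield q


def cantor_pair(a, b):
    """Assumed form of the pairing: (a + b)(a + b + 1)/2 + b."""
    return (a + b) * (a + b + 1) // 2 + b


first_ten = list(islice(enumerate_rationals(), 10))
```

Listing row by row would never leave the first row; the diagonal walk reaches the entry in row a and column b after finitely many steps.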
Theorem: R is uncountable
Proof: Suppose there is a bijection f between N and R. We contradict this
by finding an x in R that is not paired with anything in N. We construct
this x by taking the first fractional digit of x arbitrarily but never 0 or 9 or
the first fractional digit of f(1), the second fractional digit of x also different
from 0, 9, and the second fractional digit of f(2), etc. Continuing this way
down the diagonal of the table of digits, we obtain all digits of x. x is not
f(n) for any n because the nth fractional digit of x differs from the nth
fractional digit of f(n).
Note that we avoid the problem of certain numbers such as 2.3999 . . . and
2.4000 . . . being equal by never selecting a 9 or a 0. Similarly, we can use
this diagonalization method to show that N ≁ {0, 1}^N.
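The diagonal construction can be tried out on a finite table of digits. The sketch below is an assumption-laden illustration, not the author's formulation: the function name diagonal_escape is hypothetical, each row is given as a string of fractional digits, and rows are indexed from 0.

```python
def diagonal_escape(rows):
    """Return fractional digits of an x that differs from rows[n] at position n.

    rows[n] holds the fractional digits of the n-th listed number and must
    contain at least n + 1 digits.
    """
    digits = []
    for n, row in enumerate(rows):
        d = row[n]
        # Pick a digit different from the diagonal digit, never 0 or 9,
        # so that the 2.3999... = 2.4000... ambiguity cannot arise.
        digits.append("1" if d != "1" else "2")
    return "".join(digits)


table = ["1415", "7182", "4142", "3333"]
x_digits = diagonal_escape(table)
```

For an infinite list the same rule is applied to every row, so x differs from every listed number in at least one digit.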
Theorem: N ≁ [0, 1]
Proof of Poincaré (see [17]) We show there is no bijection f : N → [0, 1],
in particular (∀f : (f : N → [0, 1]) : f is not surjective). We do this
by constructing for every function f : N → [0, 1] a y ∈ [0, 1] such that
(∀n : n ∈ N : f(n) ≠ y). We construct this y by means of a chain of
segments (see paragraph 3.5.2).
Let f : N → [0, 1]. Let Sn be an infinite chain of segments such that
1) (∀i : i ∈ N : f(i) ∉ Si)
2) (∀i : i ∈ N : Si+1 ⊆ Si)
3) (∀i : i ∈ N : | Si | = 3^{−i−1}),
with | Si | being the length of segment Si.
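One way such a chain can be built is to cut the current segment into three closed thirds and keep a third that does not contain f(i); a point lies in at most two adjacent thirds, so such a third always exists. The following sketch assumes this thirds rule (the source only states the three conditions), uses the hypothetical name avoiding_chain, and works with exact fractions.

```python
from fractions import Fraction


def avoiding_chain(f, steps):
    """Return segments S_0, ..., S_{steps-1} with f(i) not in S_i,
    S_{i+1} inside S_i, and |S_i| = 3**(-i-1)."""
    lo, hi = Fraction(0), Fraction(1)
    chain = []
    for i in range(steps):
        third = (hi - lo) / 3
        thirds = [(lo + j * third, lo + (j + 1) * third) for j in range(3)]
        # Keep the first closed third that misses f(i).
        lo, hi = next((a, b) for a, b in thirds if not (a <= f(i) <= b))
        chain.append((lo, hi))
    return chain


sample = lambda i: Fraction(1, i + 1)   # an arbitrary example function N -> [0, 1]
chain = avoiding_chain(sample, 6)
```

Any point common to all segments of the infinite chain is a y that differs from every f(n).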
Theorem of Cantor-Bernstein: V ≤1 W ∧ W ≤1 V → V ∼ W
Proof: Assume V ≤1 W and W ≤1 V. Then there are injections f : V → W
and g : W → V. Since g is an injection with Dom(g) = W, it is a bijection
from W onto Ran(g), so Ran(g) ∼ W. Since Ran(g) ⊆ V and g ◦ f is an
injection from V to Ran(g), we have V ≤1 Ran(g). And since for all W and V,
W ⊆ V ∧ V ≤1 W → V ∼ W (see the lemma below), we have Ran(g) ∼ V.
By the transitivity of ∼ it follows that V ∼ W.
Lemma: W ⊆ V ∧ V ≤1 W → V ∼ W
Proof: Suppose W ⊆ V and V ≤1 W . There is an injection h : V → W . Let
A0 := V − W , and (∀n : n ∈ N : An+1 := h(An )). We now give the desired
bijection k : V → W .
• k(a) := a if a ∉ ⋃_n A_n
• k(a) := h(a) if a ∈ ⋃_n A_n
• k is injective:
Suppose a ≠ b, then k(a) ≠ k(b) by using a case analysis:
a ∉ ⋃_n A_n ∧ b ∉ ⋃_n A_n, a ∉ ⋃_n A_n ∧ b ∈ ⋃_n A_n,
a ∈ ⋃_n A_n ∧ b ∉ ⋃_n A_n, a ∈ ⋃_n A_n ∧ b ∈ ⋃_n A_n.
For all cases, it follows that k(a) ≠ k(b)
by the definition of k and the injectivity of h.
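The construction of k can be followed on a concrete instance. The choice below is an assumption made purely for illustration: V = N, W = N − {0} and the injection h(n) = 2n + 1, so that A_0 = {0} and the union of the A_n is {0, 1, 3, 7, 15, . . .}.

```python
def h(n):
    """A sample injection from V = N into W = N - {0}."""
    return 2 * n + 1


def in_union(a):
    """True if a lies in the union of A_0 = {0}, A_1 = h(A_0), A_2 = h(A_1), ..."""
    x = 0
    while x < a:        # the orbit 0, 1, 3, 7, ... is strictly increasing
        x = h(x)
    return x == a


def k(a):
    """The map from the lemma: shift along the orbit, fix everything else."""
    return h(a) if in_union(a) else a


values = [k(a) for a in range(200)]
```

On this instance k sends 0 to 1, 1 to 3, 3 to 7 and so on, and leaves every other number in place, which gives a bijection from N onto N − {0}.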
Example: We prove that (a, b) ∼ [0, 1] for all a, b ∈ R with a < b by using
the theorem of Cantor-Bernstein. We first prove that (0, 1) ∼ [0, 1] and
subsequently that (0, 1) ∼ (a, b). Then, by the transitivity of ∼ we can
conclude that (a, b) ∼ [0, 1].
Proof of (0, 1) ∼ [0, 1]: The identity function id(0,1) : (0, 1) → [0, 1]
is an injection from (0, 1) to [0, 1], so (0, 1) ≤1 [0, 1]. The function
f(x) = ⅓(x + 1) is an injection from [0, 1] to (0, 1), so [0, 1] ≤1 (0, 1).
By the theorem of Cantor-Bernstein we now know that (0, 1) ∼ [0, 1].
Theorem: V is infinite → N ≤1 V
Proof: V is infinite and thus not empty. We take one element x0 ∈ V . Next,
we take an element x1 ∈ V − {x0 }. We can repeat this infinitely (i.e. for all
n we can select an x ∈ V − {x0 , . . . , xn }), if we assume that it is possible
to always select an element from any non-empty set (see the axiom of choice
below). In this way we get a countable subset of V , namely {x0 , x1 , x2 , . . .}.
The only assumption we have made here is the so-called axiom of choice.
Theorem: R2 ∼ R ∼ (0, 1)
Proof: We can say that R ∼ (0, 1) if there is a bijection between (0, 1)
and R. Indeed, there exists a bijection f : (0, 1) → R, defined as f(x) =
tan((π/2)(2x − 1)). Thus: R ∼ (0, 1). If we consider a pair of real numbers
between 0 and 1, then we can map this pair to a single number
r ∈ (0, 1) by alternately taking the next digit of each of the two numbers.
For example, we map (0.76584 . . . , 0.13275 . . .) uniquely to 0.71635 . . ..
Thus (0, 1)^2 ∼ (0, 1), and because R ∼ (0, 1) also R^2 ∼ R. Since ∼ is
transitive, we know that R^2 ∼ R ∼ (0, 1).
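Both maps in this proof are easy to try out. The sketch below is illustrative only: the names to_real and interleave are hypothetical, digits are passed as strings, and only finitely many digits are processed.

```python
import math


def to_real(x):
    """The map f(x) = tan((pi/2)(2x - 1)) from (0, 1) to R."""
    return math.tan(math.pi / 2 * (2 * x - 1))


def interleave(x_digits, y_digits):
    """Alternately take the next fractional digit of each of the two numbers."""
    return "".join(a + b for a, b in zip(x_digits, y_digits))


merged = interleave("76584", "13275")
```

With the digits of the example in the text, the first five digits of the merged expansion are 71635, in agreement with the value given there.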
Corollary: P(N) ∼ R
Proof: This directly follows from P(N) ∼ (0, 1) and (0, 1) ∼ R, and the
transitivity of ∼.
After showing that the real numbers cannot be put into one-to-one corre-
spondence with the natural numbers (see section 3.5), Cantor hypothesized
in 1877 that each infinite subset of R is either denumerable or equivalent
to the continuum. This hypothesis was first published in 1878 in [13] and
later became known as:
This hypothesis (as given in [17, page 128]) is also known in many other
forms, of which we will mention and explain the most important. We can
immediately see that the following version of CH is equivalent to the given
definition: ‘any set of real numbers is either finite, countably infinite or has
the same cardinality as the entire set of reals’. This means that ‘the num-
ber of real numbers is the next level of infinity above the number of natural
numbers’ (see also [30, page 197]).
7
Actually in this formulation we have identified the cardinalities ℵ0 and ℵ1 with the
sets that have these cardinalities.
Some of the theory that is needed in the remaining part of this section, for
the generalized continuum hypothesis, will be introduced in later chapters.
If you are not familiar with the notations that are used, you might want to
skip the remaining part of this section and get back to it later.
- Georg Cantor
We can see that cardinality abstracts from the order and nature of the
elements, and for finite sets the cardinal number can be identified with the
ordinary ‘number of elements’. Therefore we identify the cardinal number of
a finite set of n elements with the natural number n. We denote the smallest
infinite (or transfinite) cardinal number by ℵ0. As we have already seen
on page 52, this is the cardinal number of N or any denumerably infinite set.
Cantor denoted the ‘next’ levels of infinity by ℵ1, ℵ2, . . ..
The next question was how to pass from the abstract notion of cardinal
numbers to real cardinal numbers, i.e. one wanted to regard cardinal numbers
as objects of the mathematical system. It turned out to be quite a problem
to define the cardinal V of a set V as an object of set theory. In naive set
theory, as well as in Quine’s ‘New Foundations’ (see section 7.3), the defini-
tion of the cardinal V of V poses no problem: V can be defined as the set
of all sets equivalent to V . But this definition (first given by Frege, see page
3.6) of cardinal numbers as given in section 3.6 can lead to a paradox that
was first found by Cantor.
Cantor’s paradox: The set of all sets is its own power set. Therefore, the
cardinality of the set of all sets must be bigger than itself.
In axiomatic set theory however (e.g. in ZF, see section 5.3), without the
unrestricted comprehension axiom, there is no set which contains all sets
equivalent to V . With this paradox the need arose to find a new definition of
cardinals in a context without the unrestricted comprehension axiom, such
that traditional paradoxes could no longer be derived.
For each set V we can prove (see [17, section 2.10]) that there exists
exactly one cardinal number α satisfying V ∼ α (proof uses AC). We call
this unique α the cardinality or cardinal number of the set V , and is also
denoted by V .
In other words, with the axiom of choice we can develop the theory of
ordinals in the von Neumann way and define V to be the least ordinal α equiv-
alent to V . The existence of such an α is guaranteed by the well-ordering
theorem. If we have the axiom of foundation among our axioms, even if the
axiom of choice is absent we can define V as the set of all sets W of least
rank among those equivalent with V (see [1]). In the absence of the axioms
of choice and foundation the operation V is undefinable (see [1]).
Here we consider sets with a total ordering (see page 25). Recall that in
addition for a well-ordered set, each non-empty subset also has a first mem-
ber in the given ordering (see also page 3.2). In the case of ordered sets, the
concept of equivalence is now replaced by the sharper concept of similarity.
We consider two ordered sets V and W similar , notation V W , if there is
a bijection between V and W that retains all order relations. Note that we
have already seen this relation with the concept of isomorphism (‘is isomor-
phic to’, see page 31), and note that is an equivalence relation. Instead
of saying two sets are similar, we also can say they are of the same order type.
The smallest infinite ordinal number is called ω. This is the ordinal num-
ber of the sequence {0, 1, 2, 3, . . .}, which can be seen as N or as the sequence
of finite cardinal numbers in their ‘natural’ order. We introduce some other
transfinite ordinals by example (from [10, page 66]).
Example:
If we call the set ∅ ‘0’, the next set ‘1’, etc., then consider the union
of all the sets {0, 1, 2, . . . }. This is another ordinal, called ω, and it is the
first non-finite ordinal. It has a successor: ω ∪ {ω}, called ω + 1. More
ordinals can be obtained by continuing this succession, and taking the
union of all these ordinals yields an ordinal we call ω ∗ 2, etc. The natural
numbers in reverse order are denoted ∗ω.
V1 = {2, 3, 4, . . . , 1} ; V2 = {3, 4, 5, . . . , 1, 2}
V3 = {1, 3, 5, . . . , 2, 4, 6, . . .} ; V4 = {. . . , 3, 2, 1}
N = ω ; V1 = ω + 1 ; V2 = ω + 2 ; V3 = ω + ω = ω ∗ 2
V4 = ∗ω ; V5 = ω +∗ω ; V6 = ω ∗ 10
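For the finite ordinals the succession in this example can be carried out literally with nested sets. The sketch assumes the von Neumann reading of the successor, α ∪ {α}, and uses the hypothetical names successor and ordinal; ω itself, being infinite, is of course out of reach of such a program.

```python
def successor(alpha):
    """The next ordinal: alpha together with alpha itself as a new element."""
    return alpha | frozenset([alpha])


def ordinal(n):
    """The n-th finite ordinal: 0 is the empty set, n + 1 is successor(n)."""
    alpha = frozenset()
    for _ in range(n):
        alpha = successor(alpha)
    return alpha


four = ordinal(4)
```

Here ordinal(4) has exactly the four elements ordinal(0), . . . , ordinal(3), and for these finite ordinals membership and proper inclusion coincide.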
The Burali-Forti Paradox: The set of all ordinal numbers, taken in their
natural order, forms a well-ordered series, and therefore also has an ordinal
number Ω. But the ordinal number of any subset of the set of all ordinals
exceeds every number of that subset, and therefore Ω exceeds any ordinal
number whatsoever.
2) (∀β :: β ∈ α ↔ β ⊂ α)
This means that ordinals give us a way of ‘counting’ any set, even if it is
not finite. The particular significance of the well-ordering theorem lies in the
possibility that we can apply the principle of mathematical induction (which
is well known for denumerable sets, see section 3.4.3) to any arbitrary well-
ordered set. Ordinal numbers form the basis of transfinite induction which
is a generalization of the principle of induction.
• Two finite and ordered sets have the same order type if and only if they
have the same cardinal number
• Cantor’s theorem : the cardinality of any set is lower than the cardi-
nality of the set of all its subsets (i.e. there is no highest aleph)
• If two sets have the same ordinal number, they have the same cardinal
number, but not necessarily vice versa
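Cantor's theorem from the list above can be checked exhaustively for a small finite set by the diagonal argument: for any map f from S to its subsets, the set of all x with x ∉ f(x) is not a value of f. The function names in this sketch are hypothetical and the check only covers the chosen finite S.

```python
from itertools import combinations, product


def powerset(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal_set(f, s):
    """The subset of s consisting of the x that are not members of f(x)."""
    return frozenset(x for x in s if x not in f[x])


def no_surjection(s):
    """Try every map f : s -> P(s) and confirm the diagonal set is never hit."""
    items = sorted(s)
    for images in product(powerset(s), repeat=len(items)):
        f = dict(zip(items, images))
        if diagonal_set(f, s) in images:
            return False
    return True


result = no_surjection({0, 1, 2})
```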
Since then, Peano strived for rigor, for an abstract mathematics. He came
to the conclusion that mathematics must be constructed, independently of
intuition or common sense, in a way that absolutely guarantees the validity
of its theorems.
1) 0 ∈ N
(zero is a natural number)
2) a ∈ N → a+ ∈ N
(the immediate successor of any number is a number)
3) 0 ∈ S ∧ (∀x :: (x ∈ S → x+ ∈ S)) → N ⊂ S
(if a set S contains zero and, whenever it contains a number x, also contains
the immediate successor x+ of that number, then S includes the whole
of N)
4) a, b ∈ N ∧ a+ = b+ → a = b
(no two different numbers have the same immediate successor)
5) a ∈ N → a+ ≠ 0
(zero is not the immediate successor of a number)
Axiom three has the function to formalize the principle known as mathe-
matical induction. We can show that in ZF (see section 5.3) we can derive
the five axioms of Peano. For more information on the Peano axioms, I refer
to [31, chapter 5], [49, page 146-147] and [64, appendix A].
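The axioms can be mirrored by a small data type in which a number is either zero or the successor of a number. This is a sketch under assumptions, not Peano's own formalism: the class and function names are hypothetical, and addition is added only to show a definition by recursion on the successor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zero:
    """Axiom 1: zero is a natural number."""


@dataclass(frozen=True)
class Succ:
    """Axiom 2: the immediate successor of a number is a number."""
    pred: object


def from_int(n):
    x = Zero()
    for _ in range(n):
        x = Succ(x)
    return x


def to_int(x):
    n = 0
    while isinstance(x, Succ):
        x, n = x.pred, n + 1
    return n


def add(a, b):
    """a + 0 = a and a + (b+) = (a + b)+, unfolded by moving one successor
    from b to a per step."""
    while isinstance(b, Succ):
        a, b = Succ(a), b.pred
    return a


five = add(from_int(2), from_int(3))
```

Structural equality of the two classes reflects axioms 4 and 5: two successors are equal only if their predecessors are, and no successor equals zero.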
Peano then showed how rationals and reals can be formally obtained from
naturals, and further considered elementary analysis and geometry. In later
years, Peano turned away from the foundations of mathematics and devoted
most of his time to his new international auxiliary language Interlingua. He
invented this language (see [49, page 148-150]) in an attempt to reduce the
grammatical structure of languages and create a universal language. His
mathematical work was to have a profound influence on mathematical
thought, but his language Interlingua received little response.
Begriffsschrift
do not fall under the patterns of traditional logic (also called syllogisms). Ac-
tually this is another kind of inference that contains a conditional expression
of the form:
if B then A
B
Therefore, A.
Frege adopted this new rule in the system of logic of his Begriffsschrift.
With arbitrary expressions for A and B, the rule became later known as
modus ponens. A logic that evaluates these sorts of expressions is called a
propositional logic.
1 x → (y → x)
3 (x → (y → z)) → (y → (x → z))
4 (x → y) → (¬y → ¬x)
5 ¬¬x → x
6 x → ¬¬x
7 (x = y) → (F (x) → F (y))
8 x=x
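The purely propositional axioms in this list (1, 3, 4, 5 and 6 as printed) can be checked to be tautologies by a truth table. The sketch below is a modern semantic check and not Frege's own method; the names implies, AXIOMS and is_tautology are hypothetical, and the axioms about identity (7 and 8) are left out.

```python
from itertools import product


def implies(p, q):
    """Material implication p -> q."""
    return (not p) or q


AXIOMS = {
    1: lambda x, y, z: implies(x, implies(y, x)),
    3: lambda x, y, z: implies(implies(x, implies(y, z)),
                               implies(y, implies(x, z))),
    4: lambda x, y, z: implies(implies(x, y), implies(not y, not x)),
    5: lambda x, y, z: implies(not (not x), x),
    6: lambda x, y, z: implies(x, not (not x)),
}


def is_tautology(formula):
    """True if the formula holds under all 8 assignments to x, y, z."""
    return all(formula(*values) for values in product([False, True], repeat=3))


checked = {number: is_tautology(formula) for number, formula in AXIOMS.items()}
```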
Throughout his work Frege developed the main thesis of logicism, that
mathematics is reducible to logic, and he was the first to do so. To that end,
he had to do more than develop a new logical symbolism. His next book, ‘Die Grundlagen
der Arithmetik’ (1884), was devoted to the ‘foundations of arithmetic’. In
this book, Frege treated the foundations of arithmetic, based on the concept
of (cardinal) numbers. He put forward the logicist philosophy that arithmetic
could be founded upon logic alone, and he discussed work of others in detail
(see [49, 184-185]). In [31, page 183] we learn more about Frege’s philosophy.
In the introduction of his book Frege announced his three guiding principles:
1) Always to separate sharply the psychological from the logical, the sub-
jective from the objective
2) Never to ask for the meaning of a word in isolation, but only in the
context of a proposition
In his book he presented his own theory of numbers, and wanted to show
that all the truths in arithmetic are derivable from logical laws and defini-
tions alone. He did this by sketching the proof, but not giving the official
Begriffsschrift proofs of the truths of arithmetic. Before Frege could do that
he needed a new version of Begriffsschrift, to accommodate the new
requirements that his formalization of the concept of numbers had, but also to fill
in pieces that were simply missing.
In his next three papers ‘Function and Concept’, ‘On Sense and Meaning’,
and ‘On Concept and Object’ (1892), he introduced all modifications that he
was to make to his language, Begriffsschrift, and his logical system. During
that period he also completed his definitions of the natural numbers and some
of the proofs of simple truths of arithmetic from these definitions and logical
laws. His new logical calculus included a symbolic representation of the truth
value of any given proposition, which provided a shorter notation for many
Begriffsschrift propositions. The calculus also had several other new logical
and arithmetical symbols, one of the most important of them being a notation
for what Frege called the ‘course-of-values’ of a propositional function. The
course-of-values of a propositional function ϕ , denoted by Frege as ε̆ϕ(ε),
denoted the truth value for all possible values of the argument (here ε). We
denote it as cov and define equal course-of-values by cov(f ) = cov(g) ↔ (∀a ::
f(a) = g(a)). In 1893, Frege published the first volume of his ‘Grundgesetze
der Arithmetik’, the ‘Basic Laws of Arithmetic’. It set out the new version of
logic and began the proofs that were to make the project successful. In the
second part Frege wanted to define the natural numbers and some basic laws
governing them and, in the third part, he would define the real numbers and
lay the foundations for expressing analysis in terms of logic. In 1902, when
volume 2 was in press, he received a now famous letter from the English
mathematician and logician Russell (see chapter 5), who pointed out, with
great modesty, that a contradiction could be derived in Frege’s system (see section
5.1). This contradiction would later be named after Russell and become
known as ‘Russell’s paradox’.
After many letters between the two (see for example [93, pages 124-128]),
Frege modified one of his axioms and explained in an appendix to the book
that this was done to restore the consistency of the system. However with
this modified axiom, many of the theorems of volume 1 do not go through
and Frege must have known this. He probably never realized that even with
the modified axiom the system is inconsistent since this was not shown until
after Frege’s death in 1925, by Leśniewski (see [85]).
In this text I have made extensive use of the excellent books [98] and [97]
about Frege that contain many more references about Frege and his work,
and chapter 4.5 from [31] and chapter 6, section 4 from [49].
Chapter 5
Russell
Russell’s private life, affairs, imprisonment, his social and political cam-
paigns and advocacy of both pacifism and nuclear disarmament are certainly
interesting, but we will not discuss these subjects here (for more information
and references on Russell’s life and work, see [62], [80] and [31, chapter 6, 7,
11 and sections 8.2, 8.3, 8.4, 8.8.3, 8.9.2, 10.1, 10.2.1]). I quote the following
assessment from [73]: “Bertrand Russell had one of the most widely varied
and persistently influential intellects of the 20th century. During most of his
active life, a span of three generations, Russell had at any time more than
40 books in print ranging over philosophy, mathematics, science, ethics, so-
ciology, education, history, religion, politics and polemic. The extent of his
influence resulted partly from his amazing efficiency in applying his intellect
(he normally wrote at the rate of 3,000 largely unaltered words a day) and
partly from the deep humanitarian feeling that was the mainspring of his
actions. This feeling expressed itself consistently at the frontier of social change
through what he himself would have called a liberal anarchistic, left-wing,
and skeptical atheist temperament.”
Russell discovered the paradox which bears his name in 1901, while
working on his ‘Principles of Mathematics’ (1903). The paradox and the
closely related vicious circle principle are discussed in section 5.1. Russell’s
own response to the paradox came with the introduction of types (see chap-
ter 7). Using the vicious circle principle also adopted by Henri Poincaré,
together with Russell’s so-called ‘no-class’ theory of classes, Russell was then
able to explain why the unrestricted comprehension axiom (see section 2.1)
fails: propositional functions, such as ‘x is a set’, should not be applied to
themselves since self-application would involve a vicious circle. On this view,
it follows that it is possible to refer to a collection of objects for which a
given condition (or predicate) holds only if they are all at the same level or
‘type’.
Of equal significance during this same period was Russell’s defense of logi-
cism, the theory that mathematics was in some important sense reducible to
logic. First defended in his Principles, and later in more detail in ‘Principia
Mathematica’, Russell’s logicism consisted of two main theses. The first
is that all mathematical truths can be translated into logical truths or, in
other words, that the vocabulary of mathematics constitutes a proper subset
of that of logic. The second is that all mathematical proofs can be recast as
logical proofs or, in other words, that the theorems of mathematics consti-
tute a proper subset of those of logic.
Like Gottlob Frege’s, Russell’s basic idea for defending logicism was that
numbers may be identified with sets of sets and that number-theoretic state-
ments may be explained in terms of quantifiers and identity. It followed
that number-theoretic operations could be explained in terms of set-theoretic
operations such as intersection, union, and the like. In ‘Principia Mathema-
tica’ Whitehead and Russell were able to provide detailed derivations of many
major theorems in set theory, finite and transfinite arithmetic, and elemen-
tary measure theory. A fourth volume on geometry was planned but never
completed.
- Russell, in [78]
Paradoxes have been known for a long time, but in particular with the
introduction of more formal systems at the end of the 19th century paradoxes
became more influential on the foundations of mathematics. Before we de-
scribe the most famous paradox of Russell, we first define the notion of a
paradox.
In [86], three ‘paradox threats’ are identified: when systems are complex,
formal or designed for computers, there often is not enough intuition to notice
inconsistencies. With the previously described formalizations, the systems
of Cantor (see chapter 2), Peano (see section 4.1), Frege (see section 4.2),
not to mention that of Russell himself, were at risk. And indeed, in 1902 Russell
discovered a paradox in Frege’s ‘Grundgesetze der Arithmetik’. The paradox
turned out to lie at the foundations of mathematics, since it could be formulated in
all the systems mentioned above. We first formulate the paradox in Cantor’s
set theory:
Russell in 1901 studied Cantor’s work [31, section 6.6.1] and after noting
that some sets belonged to themselves while the rest did not do so, Russell
showed that the set of all sets which do not belong to themselves belongs to
itself if and only if it does not do so - and, by repetition of the argument,
vice versa also. Russell also expressed this paradox in terms of predicates,
and as such first presented his discovery in a letter to Frege (see [93, page
124] and see also the quote on page 78).
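The predicate form of the paradox has a direct computational analogue. In the sketch below, which is only an analogy and not a formal derivation, a ‘predicate’ is a Python function returning a truth value, and russell(p) holds exactly when p does not hold of itself. Asking whether russell holds of itself never settles on an answer; in CPython the endless self-reference surfaces as a RecursionError. The function names are hypothetical.

```python
def russell(p):
    """Holds of a predicate p exactly when p does not hold of itself."""
    return not p(p)


def self_application_settles():
    """Report whether evaluating russell(russell) produces a truth value."""
    try:
        russell(russell)
    except RecursionError:
        # russell(russell) = not russell(russell) = ... never terminates.
        return False
    return True


always = lambda p: True    # a predicate that holds of everything
never = lambda p: False    # a predicate that holds of nothing
settles = self_application_settles()
```

Ordinary predicates cause no trouble: russell(always) is false and russell(never) is true. Only the self-application has no consistent value.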
Since Peano’s system was based on the set theory of Cantor, Peano’s
work also contained the paradox. In Frege’s work (Grundgesetze der Arithmetik)
self-application was not possible, so R ∈ R was not allowed, but the para-
dox could still be expressed by using Frege’s notion (see page 77) of the
course-of-values of a function. If we define equal course-of-values cov by
cov(f ) = cov(g) ↔ (∀a :: f (a) = g(a)), we can derive the paradox in Frege’s
work as follows (see also [86, page 7] for a slightly different proof):
¬f (K)
≡ {def. f}
≡ {elim.¬¬}
≡ {instantiate ϕ with f }
cov(f ) = K → f (K)
≡ {def. K, elim. →}
f (K)
The paradox had a big influence, since it could be formulated in all
systems, and in classical logic every statement is entailed by a contradiction.
After discovering his famous paradox, Russell traced the fallacy back to
what he called the ‘vicious circle principle’. The ‘vicious circle’ that his prin-
ciple is named after, arises from the assumption that a set of objects may
contain members which can only be defined by means of the set as a whole.
Therefore, Russell said that statements are illegitimate and meaningless
if they contain a set of objects such that it will contain members which pre-
suppose this (total or whole) set of objects. That means a statement is only
legitimate if all propositions it contains refer to already defined sets.
In a sense those impredicative definitions are thus circular, and were con-
sidered the cause of antinomies. For more information about impredicativity,
see [57, section 15.3].
Vicious circle fallacies are arguments that are condemned by the vicious
circle principle. Such arguments may not necessarily lead to contradictions
(since fallacious arguments can lead to true conclusions).
set, the nature of the apparent element remains the same. The ‘nature’ of the elements
can be seen as all the members of that element (or in case the element is an individual,
the nature of the apparent element can be seen as that individual). This leads us to the
following axiom:
(∀X :: (∀x : x ∈ X : x = a → (∀x′ : x′ ∈ X ∧ x′ ≠ x : x′ = b(x′) → a ∈ X))). Clearly
this does not avoid the paradox of Russell. We consider a set X := R ≡ {x | x ∉ x} and
an element x ∈ R, i.e. we have x ∉ x. Despite the fact that the set X is ‘too large’, the
axiom does not prohibit the existence of the set X. The axiom tells us x = a → (∀x′ :
x′ ∈ R ∧ x′ ≠ x : x′ = b(x′) → a ∈ R). In other words, we can change each element in R
except x and the nature of x should not depend on it. The only thing we know about x is
that x ∉ x and x ∈ R. So to obtain a contradiction we have to show that x ∈ x ∨ x ∉ R.
Now we can change all x′ into any value b(x′), but still we will have x ∉ x and x ∈ R. So
unfortunately this most ‘direct’ attempt to solve the paradox fails.
2
Russell formulated it originally as ‘Whatever involves all of a collection must not be
one of the collection’. Or, as formulated in [49, page 113]: ‘If, provided a certain collection
had a total, it would have members only definable in terms of that total, then the said
collection has no total’. Another formulation of [87] says ‘No entity can be defined in
terms of a totality of which it is itself a possible member’.
4 The liar’s paradox: We quote from [49, page 127]: “If a man says ‘I
am lying’, his utterance is self-contradictory, and it cannot be either
true or false. The oldest form of this particular paradox, in the words
of Principia Mathematica, is that of Epimenides the Cretan, ‘who said
that all Cretans were liars, and all other statements made by Cretans
were certainly lies’.”.
7 Berry’s paradox: “The least integer not nameable in fewer than nine-
teen syllables” is itself a name that contains only eighteen syllables.
The first three paradoxes are logical paradoxes that can be formulated
within Cantor’s set theory. The remaining five are mainly paradoxes of nam-
ing, they are of a semantic kind. All these paradoxes have stimulated funda-
mental research, and especially Russell’s paradox that revealed the vicious
circle principle and first showed the need for a theory of types or other re-
striction of the power of the comprehension axiom.
that every nonempty set of real numbers having an upperbound has a least
upperbound”.
Other attempts towards a solution for the paradoxes of set theory focus on
the foundations of logic. Luitzen Brouwer and the intuitionists took this
approach and tried to prevent the paradoxes by denying the principle of the
excluded middle (which states that any mathematical statement is either
true or false). Brouwer first attacked the logical foundations of mathematics
in his doctoral thesis in 1907; this formed the beginning of the Intuitionist
School. The intuitionists had the basic idea that one cannot assert the exis-
tence of a mathematical object unless one can also indicate how to go about
constructing it.
Logicists contend that all of mathematics can be deduced from pure logic,
without the use of any specifically mathematical concepts, such as number or
set. The first ideas date back to Leibniz (1666) and the actual reduction of
mathematics to logic was started by Dedekind (1888) and Frege (1884-1903)
and later by Peano, and Whitehead and Russell (in Principia Mathematica
1910-1913).
Zermelo noted that the sets involved in a derivation of the paradoxes are
very large3 (for Cantor’s paradox it is the set of all sets (see section 3.8.1),
for Russell’s paradox it is the set of all sets which are not members of them-
selves (see section 3.8.2), and for the Burali-Forti paradox (see section 3.8.2)
it is the set of all well-orderings). Therefore he wanted to restrict the size of
sets, and he changed the (naive) comprehension principle into his separation
axiom, such that the paradox could no longer be derived:
There are also certain limitations on the property ϕ (i.e. it should be de-
finite) that we will mention later in section 8.5. We show that the standard
derivation of Russell’s paradox cannot be applied when the naive compre-
hension axiom is replaced by the separation axiom.
Let R = {x | x ∈ Z ∧ x ∉ x}
R ∈ R ↔ R ∈ Z ∧ R ∉ R
→ R ∉ R, contradiction.
R ∉ R ↔ R ∉ Z ∨ R ∈ R
← R ∉ Z
3
The term proper class is sometimes used to refer to these ‘excessively large’ sets; all
other sets are then referred to as improper classes. This means all sets are classes but not
every class is a set. A class that is not a set is called a proper class.
4
See section 8.5 for the definition of the concept of definiteness.
However, this fact alone does not guarantee that there does not exist a
paradox, as claimed in some articles, but merely that the separation axiom
does not permit the construction of paradoxical sets with elements defined
in terms of the sets themselves. But until consistency is proved, there might
be other less obvious ways to construct a paradox.
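The effect of separation can be seen on hereditarily finite sets, modelled here as nested frozensets. The sketch is a finite illustration only and rests on assumptions: the names separate and russell_subset are hypothetical, and frozensets are well-founded, so no set in this model is a member of itself.

```python
def separate(z, phi):
    """Separation: the subset of an already given set z whose elements satisfy phi."""
    return frozenset(x for x in z if phi(x))


def russell_subset(z):
    """R = {x | x in Z and x not in x}, formed inside a given set Z."""
    return separate(z, lambda x: x not in x)


empty = frozenset()
one = frozenset([empty])
z = frozenset([empty, one])
r = russell_subset(z)
```

In agreement with the derivation, R exists as a subset of Z but is not an element of Z, so no contradiction arises.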
We now give all of the ZF axioms that constitute set theory. The first
seven axioms are those that were originally formulated by Zermelo. Axioms
8 and 9 were later added by Fraenkel and von Neumann respectively. The
axioms 1 through 8 are the original set of the Zermelo-Fraenkel axioms.
3. Separation axiom:
(∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)), ϕ is definite and does not contain y.
For every set z there exists a set y whose elements are exactly those of
z having the property ϕ.
4. Pairing axiom:
(∀a, b :: (∃y∀x :: x ∈ y ↔ x = a ∨ x = b))
Given two sets a and b there exists a set whose elements are exactly a
and b.
7. Axiom of infinity:
(∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : {a} ∈ z))
There exists a successor set.
Theorem: (from [49, chapter 11]) The domain B itself (see page 92) is not
a set.
Proof: Suppose V is any given set. Then5 , V has a subset W that consists of
those elements of V that are not members of themselves. But then W is not
an element of itself (because in that case we would have W ∈ W , while W
5
Since the property x ∉ x is definite. See section 8.5 for the definition of the concept
of definiteness.
The axioms are not minimal. For example, as we have already seen in
section 2.2⁶, the axiom of the empty set can be deduced from the separation
axiom. We also have empty set axiom + substitution axiom → separation
axiom. We have also seen in section 2.2 how we can define basic operations
with the extensionality and separation axioms. The pairing, sum and
powerset axioms, together with the extensionality axiom, ensure uniqueness of
the pairs, sums and powersets of sets. With these axioms alone we can
already create an infinite number of sets. However, each set constructed
6
The existence of the empty set in section 2.2 was actually derived from the compre-
hension principle but the result can similarly be obtained from the separation axiom.
5.3. ZERMELO FRAENKEL 97
Definition of epsilon-minimal:
An element b ∈ a is epsilon-minimal in a := b ∩ a = ∅
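The definition can be tried out on small hereditarily finite sets. The following is a minimal sketch, assuming that sets are modelled as Python frozensets; the function names are our own and purely illustrative.

```python
def is_epsilon_minimal(b, a):
    """True if b is an element of a and b shares no element with a (b ∩ a = ∅)."""
    return b in a and not (b & a)


def epsilon_minimal_elements(a):
    """All epsilon-minimal elements of the set a."""
    return {b for b in a if is_epsilon_minimal(b, a)}


empty = frozenset()               # ∅
one = frozenset({empty})          # {∅}
two = frozenset({empty, one})     # {∅, {∅}}
a = frozenset({one, two})         # {{∅}, {∅, {∅}}}

# {∅} is epsilon-minimal in a; {∅, {∅}} is not, because it shares {∅} with a.
print(epsilon_minimal_elements(a) == {one})
```

In this model every nonempty set has an epsilon-minimal element, because frozensets can only be built from sets that already exist.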
The origin of the axiom of choice was Cantor’s recognition of the impor-
tance of being able to well-order arbitrary sets; i.e., to define an ordering
relation for a given set such that each nonempty subset has a least element.
The virtue of a well-ordering for a set is that it offers a means of proving
that a property holds for each of its elements by a process (transfinite in-
duction) similar to mathematical induction. Zermelo (1904) gave the first
proof that any set can be well-ordered. His proof employed a set-theoretic
principle that he called the axiom of choice, which, shortly thereafter, was
shown to be equivalent to the so-called well-ordering theorem. One form of
this principle is stated in terms of choice functions: a choice function for a set
x ‘chooses’ an element from each nonempty member of x. The axiom then
reads: if x is a nonempty set the elements of which are nonempty sets, then
there exists a function f with domain x such that for each member a of x,
f (a) ∈ a. For a more detailed
discussion of the axiom of choice we refer to [17, section 2.9].
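For a finite family of sets a choice function can simply be written down, and no axiom is needed. The sketch below illustrates only this finite case, under the assumption that the members are frozensets of comparable values, so that ‘take the least element’ is an explicit rule; the name choice_function is our own.

```python
def choice_function(family):
    """Return a dict f with f[a] ∈ a for every member a of the family."""
    f = {}
    for a in family:
        if not a:
            raise ValueError("every member of the family must be nonempty")
        f[a] = min(a)  # an explicit selection rule; any fixed rule would do
    return f


family = {frozenset({3, 1, 2}), frozenset({7}), frozenset({5, 4})}
f = choice_function(family)
print(all(f[a] in a for a in family))
```

The axiom of choice becomes a real assumption only where no such explicit rule is available, as can happen for infinite families.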
Intuitively, the axiom asserts the possibility of making a simultaneous choice
of an element in every nonempty member of any set; this guarantee accounts
for its name. The assumption is significant only when the set has infinitely
many members. Zermelo was the first to state the axiom explicitly, although
it had been used earlier, essentially unnoticed. It soon became the subject
of vigorous controversy because of its nonconstructive nature. There are a few
mathematicians who feel that the use of the axiom of choice is improper, but
to the vast majority it, or an equivalent assertion, has become an indispens-
able and commonplace tool. For this discussion of the axiom of choice we
have used [63], [77] and [11].
A discussion of the Generalized Continuum Hypothesis can be found in
section 3.7.
Chapter 6
Hilbert
100 CHAPTER 6. HILBERT
were intended to lead to a proof theory. Although in 1931 Kurt Gödel
showed this goal to be unattainable (see chapter 8), the work Hilbert had
done on the foundations of mathematics nevertheless remained influential in
the development of logic. Hilbert’s work on integral equations in about 1909
(see [45]) led to research in functional analysis and established the basis for
his work on infinite-dimensional space, later called Hilbert space (see [22,
page 232]). When Hilbert was made an honorary citizen of Königsberg he
gave an address which ended with six famous words, showing his enthusiasm
for mathematics and optimism for solving mathematical problems: “There
are absolutely no unsolvable problems. Instead of the foolish ignorabimus
[Latin for ‘we shall not know’], our answer is on the contrary: Wir müssen wissen,
Wir werden wissen” [We must know, We shall know].
6.1. HILBERT’S PROOF THEORY 101
Definition of an axiom:
A proposition that is regarded as true without proof
An axiom that does not contain any variables is also called an axiom
statement, an axiom with free variables is called an axiom scheme and each
free variable is to be quantified over all well-formed formulas.
Of the systems that Hilbert’s proof theory applies to, we here consider
those susceptible to Gödel’s incompleteness theorem (that will be presented
in chapter 8).
The recursive definition over the given alphabet gives us the set of ex-
pressions. The variables enable us to form predicates. The set of axioms and
derivation rules let us prove or refute sentences. Ideally, we want the
provable sentences to coincide with the sentences we intuitively consider
true (P = T ) and the refutable sentences to coincide with those we consider
false. We call a system with this property correct. We now give an example
of a definition of a simple axiomatic system.
• ϕ is a well-formed formula if it
R0 (c, d) ∀x(ϕ)
true false
¬false ¬true
true false
true ∧ ϕ false ∧ ϕ
ϕ false
true ∨ ϕ ϕ ∨ true
true ϕ
3. The provable sentences P are those that can be rewritten to true by the
derivation rules. For example, ¬ false ∧ R0 (false, true) → true ∧ R0 (false, true)
→ true ∧ true → true.
4. The refutable sentences R are those that can be rewritten to false by the
derivation rules. For example, ∀y (false ∨ y) ∧ true → false ∧ true → false.
6. For each such predicate we can replace the free variable by a formula
that is represented³ by a natural number, and obtain a proposition.
2
Sometimes it is also said that an axiomatic system A1 gives rise to a language LA
3
An example of such a bijective function between a predicate and a set of natural
numbers will be given in section 8.2.
Example:
A1 ⊬ ∀x)x¬ (since the formula is not well-formed, i.e. it cannot be built
from the syntax definition)
A1 ⊬ ∀y (false ∨ y) ∧ true (since it does not follow from the derivation
rules, i.e. it is a refutable sentence)
So now we can say that Hilbert was looking for an axiomatic system for
which logic can be a model. Hilbert proposed such an axiomatic system to
have the properties of consistency, completeness and decidability. We will
now introduce these concepts, along with some other properties of axiomatic
systems. Since the properties of an axiomatic system A give rise to corre-
sponding properties in the language LA , we here distinguish in each definition
between the property of a language and of an axiomatic system.
Definition of decidability:
A language L is decidable := (∀ϕ :: (ϕ ∈ P ∨ ϕ ∈ R)).
An axiomatic system A is decidable := (∀ϕ :: there is an algorithm that
decides in a finite number of steps whether (or not) A ⊢ ϕ) (see also [49, page
270])
Definition of consistency:
A language L is consistent := ¬(∃s : s ∈ S : s ∈ P ∧ s ∈ R), i.e. P ∩ R = ∅
or no sentence is both provable and refutable in L.
An axiomatic system A is consistent := ¬(∃ϕ :: A ⊢ ϕ ∧ A ⊢ ¬ϕ) (i.e. it is
not possible for any formula ϕ, to derive both ϕ and ¬ϕ) (see also [49, page
240])
Definition of completeness:
A language L is complete for a model M := (∀ϕ :: M |= ϕ → ϕ ∈ P).
An axiomatic system A is complete for model M :=
(∀ϕ :: M |= ϕ → A ⊢ ϕ) (i.e. all true statements in the model are
derivable/provable)
Definition of soundness:
A language L is sound for a model M := (∀ϕ :: ϕ ∈ P → M |= ϕ).
An axiomatic system A is a sound axiomatization for a model M :=
(∀ϕ :: A ⊢ ϕ → M |= ϕ) (i.e. if a statement ϕ is derivable/provable, it is
true in the model)
Definition of correctness:
A language L is correct for a model M := P ⊆ T ∧ R ∩ T = ∅ (i.e. every
provable sentence is true and every refutable sentence is false (not true)).
An axiomatic system A is correct for a model M := A is sound for M and
A is complete for M
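The language-level definitions above are plain statements about the sets S (sentences), P (provable), R (refutable) and T (true in the model), so they can be checked mechanically on a finite toy example. The following sketch assumes these four sets are given as Python sets; the sentence names s1 to s4 and the function names are purely illustrative.

```python
def is_consistent(P, R):
    """No sentence is both provable and refutable: P ∩ R = ∅."""
    return not (P & R)


def is_decidable(S, P, R):
    """Every sentence is provable or refutable."""
    return S <= (P | R)


def is_sound(P, T):
    """Every provable sentence is true in the model."""
    return P <= T


def is_complete(P, T):
    """Every sentence true in the model is provable."""
    return T <= P


def is_correct(P, R, T):
    """P ⊆ T and R ∩ T = ∅."""
    return P <= T and not (R & T)


S = {"s1", "s2", "s3", "s4"}   # all sentences
T = {"s1", "s2"}               # sentences true in the model
P = {"s1"}                     # provable sentences
R = {"s3"}                     # refutable sentences

print(is_consistent(P, R), is_sound(P, T), is_correct(P, R, T))
print(is_decidable(S, P, R), is_complete(P, T))
```

In this toy language s2 is true but not provable, so the language is consistent, sound and correct, yet neither complete nor decidable.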
(this was done by Turing, see section 9.1). Also, Hilbert asked that an
algorithm be devised. Thus he apparently assumed such an algorithm
exists, but now we know that this problem is algorithmically unsolvable.
In 1970, the young Russian Yuri Matijasevic, building on the
work of Martin Davis, Hilary Putnam and Julia Robinson, showed that
no algorithm exists for testing whether a polynomial has integral roots.
At the end of his article, Hilbert says that he does not believe mathema-
tics will, like other sciences, split into separate branches whose connection
becomes ever more loose, but that the organic unity of mathematics is in-
herent in the nature of this science, for mathematics is the foundation of all
exact knowledge of natural phenomena. For a more detailed assessment of
Hilbert’s view, see [49, section 12.4] and [31, section 4.7].
Chapter 7
Types
In section 4.1 we saw that with the postulates he presented, Peano stated
and organized the fundamental laws of number theory, the core of mathema-
tics. If statements satisfying these conditions could be derived in this logic,
it would show that (at least part of) mathematics was founded in pure logic.
As we have seen in section 4.2, Frege adhered to the goal of logicism that
all of mathematics could be derived from logic alone. But unfortunately the
language that he created was inconsistent, as we have learned from Russell’s
paradox in section 5.1. In his 1908 paper, ‘Mathematical Logic as Based on
the Theory of Types’, Russell laid out a theory to eliminate the paradoxes.
With Principia Mathematica, Bertrand Russell and his teacher, the mathe-
matician Alfred Whitehead, presented this theory to prevent the paradoxes
while at the same time allowing many of the operations Frege considered de-
sirable. The theory of types basically says that all sets and other entities have
114 CHAPTER 7. TYPES
a logical ‘type’, these types can be ordered and sets are always constructed
from specified members with lower types. We will look at the theory of types
in more detail in section 7.2.
Principia Mathematica (sometimes also called ‘the Principia’) consisted of
three volumes and was named after the ‘Philosophiae naturalis principia
mathematica’ of the English physicist Isaac Newton. But unlike Newton’s book it
dealt not with the application of mathematical techniques to physics, but to
logic and mathematics itself. With their mathematical treatment of the prin-
ciples of the mathematicians, Russell and Whitehead intended to summarize
the recent work in logic as well as to give a revolutionary and systematic
development of mathematical logic and derive basic mathematical principles
from the principles of logic alone.
Their collaboration began in 1903 when Whitehead and Russell were both
in the initial stages of preparing second volumes to earlier books on related
topics: Whitehead’s 1898 ‘A Treatise on Universal Algebra’ and Russell’s
1903 ‘The Principles of Mathematics’. Their work overlapped considerably
and they began collaborating on what would become ‘Principia Mathema-
tica’. The approach of Russell and Whitehead was essentially that of Frege,
to define mathematical entities (like numbers) in pure logic and then derive
their fundamental properties. Indeed, their definition of natural numbers was
basically the same as the one of Frege, but unlike him, they opted to avoid
the philosophical aspects and justifications. Although ‘Principia’ was largely
successful, there was still criticism of the axiom of infinity and the axiom of
reducibility: they were considered too ad hoc to be justified
philosophically. In 1919 Russell set out the philosophy behind his
work in his ‘Introduction to Mathematical Philosophy’, which was accessible
to a broad audience and has therefore been the main source through which
Russell’s logicist view of mathematics has become known.
ble, or whether there are concepts expressible in natural languages but not
in this logical notation. This is somewhat odd, given the well-known list of
problems posed by Hilbert in 1900 that came to animate 20th-century logic,
especially German logic. The Principia is a work of confidence and mastery
and not of open problems and possible difficulties and shortcomings; it is a
work closer to the naive progressive elements of the Jahrhundertwende than
to the agonizing fin de siecle.” We would like to add that with the very
formal and accurate build-up of mathematics, Russell and Whitehead not only
managed to avoid the paradoxes but also created one of the most impressive
and complicated works of all time, one that is, next to Aristotle’s Organon,
considered to be the most influential book on logic ever written.
[..] in the simple theory of types it is well known that the indi-
viduals may be dispensed with if classes and relations of all types
are retained; or one may abandon also classes and relations of the
lowest type, retaining only those of higher type. In fact any finite
number of levels at the bottom of the hierarchy of types may be
deleted. But this is no reduction in the variety of entities, because
the truncated theory of types, by appropriate deletions of entities
The nearly 2,000-page Principia Mathematica starts with a short preface
that explains what it wants to demonstrate, namely that pure mathematics
can be based on logic alone and requires no other primitive notions. Russell
classifies statements that involve logical constants only (such as the laws of
reciprocity, see page 18 of Principia Mathematica) as pure mathematics, and
other mathematical assertions that also refer to non-logical contents (such as
the statement that (perceptual) space is three-dimensional) as part of applied
mathematics. The belief was then expressed that pure mathematics was suf-
ficient to include all traditional mathematics. Then, after an introduction,
the first volume introduces a symbolic logic that is based on a small set
of axioms, and then lays out the propositional and predicate calculi. Built
upon these, Whitehead and Russell define types, sets, relations and their
properties, and basic operations on sets. The second volume continues with
a purely logical theory of cardinal and ordinal arithmetic. This allowed them
to introduce basic arithmetic, including addition, multiplication and expo-
nentiation of both finite cardinals and of relations.
The volume ends with a general theory of simply ordered sets (series) which
is followed by a logical basis for fundamental mathematical analysis, including
such subjects as convergent sequences, continuity, limits and derivatives.
The third volume was meant to prepare the ground for the fourth and con-
cluding volume on geometry (which was never completed), and contained a
theory of numbers that was called ‘measurement’. It starts with a theory of
well-ordered sets, finite, infinite and continuous series, the negative integers,
ratios and the real numbers, and finally vectors, coordinates and basic geo-
metric notions such as angles.
More details about the organization of Principia Mathematica and a critical
assessment of its work can be found in [31, chapter 7, and specifically section
7.8].
7.1. RUSSELL AND WHITEHEAD’S PRINCIPIA MATHEMATICA 117
Russell and Whitehead opted for the more modern notation of Peano
instead of Frege’s Begriffsschrift. Unlike Frege, Russell and Whitehead treated
functions as first-class citizens. A good introduction to the logical calculus
and the specific notation that was used in Principia Mathematica can be
found in [49, section 3.2 and 3.3] and [31, sections 7.2, 7.3, 7.7 and 7.8].
What is a type?
A type is the range of significance of a propositional function, that is, the
collection of arguments for which the said function is significant and has val-
ues.
The type of a variable in a proposition is fixed by all the values the func-
tion is concerned with, i.e. by the totality over which the variable ranges.
This division of objects into types (the type of an object can be seen as a
property of that object) is necessary to conform to the vicious circle principle,
i.e. to make sure that ‘whatever contains an apparent variable must not be
a possible value of that variable’. This can be established by making sure
that whatever contains an apparent variable is of a different and higher type
than the possible values of that variable. This linear order of types prevents
vicious circles, since
the variables contained in an object determine the type of that object.
7.3 Quine
Just as the introduction of the irrational numbers . . . is a conve-
nient myth [which] simplifies the laws of arithmetic . . . so physical
objects are postulated entities which round out and simplify our
account of the flux of existence . . . The conceptional scheme of
physical objects is [likewise] a convenient myth, simpler than the
literal truth and yet containing that literal truth as a scattered part
With the NFC axiom the paradox is obviously prevented, since the
sentence ϕ ≡ x ∉ x is not stratified.
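That x ∉ x is not stratified can be checked mechanically. The sketch below assumes the usual definition of stratification: integer levels can be assigned to the variables such that every atomic formula u ∈ v has level(v) = level(u) + 1 and every u = v has equal levels. Negation and the other connectives do not matter, so a formula is represented only by its list of atomic subformulas; this encoding and the name is_stratified are our own.

```python
from collections import defaultdict, deque


def is_stratified(atoms):
    """atoms: iterable of (left, op, right) with op either "in" or "eq"."""
    graph = defaultdict(list)
    for left, op, right in atoms:
        if op not in ("in", "eq"):
            raise ValueError("unknown relation: " + op)
        d = 1 if op == "in" else 0
        graph[left].append((right, d))    # level(right) = level(left) + d
        graph[right].append((left, -d))   # and the same constraint read backwards
    level = {}
    for start in list(graph):
        if start in level:
            continue
        level[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, d in graph[u]:
                if v not in level:
                    level[v] = level[u] + d
                    queue.append(v)
                elif level[v] != level[u] + d:
                    return False          # two constraints contradict each other
    return True


print(is_stratified([("x", "in", "x")]))                    # the atom of x ∉ x
print(is_stratified([("x", "in", "y"), ("y", "in", "z")]))
```

The first call fails because x would need a level one higher than its own; the second succeeds with the levels 0, 1 and 2.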
We quote from [86, page 3]: “NF is weak for mathematical induction and
the axiom of choice is not compatible with NF. We cannot prove Peano’s
axiom[s] in it, unless we assume the existence of a class with m + 1 ele-
ments. Also, NF is said to lack motivation because its axiom of compre-
hension is justified only on technical grounds and one’s mental image of set
theory does not lead to such an axiom. To overcome some of the difficulties,
Quine adopted similar measures to NBG (Neumann-Bernays-Gödel, see section
8.5) set theory[, and developed another non-iterative set theory called
ML (Mathematical Logic), first presented in [70]]. Like NBG, ML contains
a bifurcation of classes into elements and non-elements. Sets can enjoy the
property of being full objects whereas classes cannot. ML was obtained from
NF by replacing (NFC) by two axioms, one for class existence and one for
elementhood. The rule of class existence provides [. . . ] the existence of the
classes of all elements satisfying any condition ϕ, stratified or not. The rule
of elementhood is such as to provide the elementhood of just those classes
which exist for NF. Therefore, the two axioms of comprehension for ML [are]:
Comprehension by a set: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x) is stratified
with set variables only in which y does not occur free.
Impredicative comprehension by a class: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x)
is any formula in which y does not occur free.
ML was liked both for the manipulative convenience we regain in it and
the symmetrical universe it furnishes. It was however proved subject to the
Burali-Forti paradox”.
For more information, we refer to [70], [71], [72] and the website
http://diamond.boisestate.edu/∼holmes/holmes/nf.html.
Chapter 8
Gödel
124 CHAPTER 8. GÖDEL
already have seen in section 6.1 that this means that there is at least
one formula ϕ of arithmetic that is not provable. So we can express A ≡
(∃y :: (∀x :: ¬prov (x is a proof of y))). A system is incomplete if
there is a true statement that is not provable. Thus we can represent
the conclusion of the conditional statement by G.
8. We can now formally prove A → G (see section 8.2 for the proof). This
means that if A is provable, we know (by modus ponens or the rule of
detachment) that G is provable. But we already saw that (unless S
is inconsistent), G is not provable; thus if S is consistent, A is not
provable! That means if arithmetic is consistent its consistency cannot
be established by metamathematical reasoning within the formalism
of arithmetic (this is Gödel’s theorem 11, see [93, page 614]). Or, as
expressed in [31, page 510], ‘any set S of consistent formulae of P M
cannot include the formula F asserting its consistency’.
8.2. FORMALLY: GÖDEL’S INCOMPLETENESS THEOREMS 127
Note that in this proof we have not defined the set T by a model but
determined the truth of G by a metamathematical argument just as we have
seen in step 5 of section 8.1, that is nevertheless commonly accepted by all
mathematicians. Note also that the proposition G corresponds to the propo-
2
This assumption is for technical reasons that make the proof simpler; Gödel’s
original numbering did not have this restriction.
With the diagonal lemma we can also prove the first theorem as follows:
Since P∗ is expressible in L, by the diagonal lemma, there is a Gödel sentence
G for P. A Gödel sentence for P is a sentence which is (by the definition
of a Gödel sentence) true if and only if it is not provable in L. So for any
correct system L, a Gödel sentence for P is a sentence which is true but not
provable in L.
Now that we have seen both theorems in a general form, we will consider particular
mathematical languages, starting with first order arithmetic, which we can
build on in section 8.3 to prove the incompleteness of systems based on
Peano’s arithmetic and other systems.
8.4 Consequences
I had a lot of conversations with him [Gödel] and a lot of dis-
agreements. Like most others, I was hard to convince about the
incompleteness theorem. There was at the time a tendency, which
I shared, to think that it was special to a certain type of formali-
zation of logic and that a radical reformalization might have the
effect that the Gödel argument did not apply. I persisted in that
longer than I should have, and he was always trying to convince
me otherwise.
There’s no sense in being precise when you don’t even know what
you’re talking about.
When Cantor introduced his set theory, he gave the informal definition
(see page 16) of a set being ‘any comprehension into a whole M of definite
and separate objects m of our intuition or thought’. After Hilbert proposed
his proof theory, set theory was given a more rigorous basis, and axiomatic
theories for Cantor’s sets were proposed. Cantor’s definition was replaced by
the principle of comprehension (see page 16), which was adopted by Frege
and Russell. Based on this principle a first formal theory of sets, called ‘ideal
calculus’ was developed (not treated in detail here, see for example [36]). The
antinomies of Burali-Forti and Russell however showed that this theory was
inconsistent, and one way to restore consistency was to incorporate in the
system a theory of types, as was done by Russell. At the same time, intu-
itionists tried to do mathematics without Cantor’s set theory at all. Others
tried to overcome the inconsistencies by making Cantor’s set theory more
rigidly axiomatic, and the most successful axiomatization of set theory was
presented by Zermelo in 1908.
The task for him was to axiomatize set theory in such
a way that all contradictions are excluded while the system remains sufficiently
wide for all that is valuable in this theory to be preserved. As we have seen in section
5.3, Zermelo postulated a domain of abstract objects (sets) and elements of
this domain, defined the primitive notions of ‘equality’ and ‘is element of’
relation, and introduced 7 axioms. The comprehension axiom was replaced
by the weaker separation axiom, which only allows new sets to be created
from existing sets and with definite predicates. Before we describe why
the Hungarian mathematician von Neumann opposed this solution and came
up with his own solution to the paradoxes, we will look at this separation axiom
Separation axiom:
(∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)), ϕ is definite and does not contain y. For
every set z there exists a set y whose elements are exactly those of z having
the property ϕ.
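On finite sets the separation axiom behaves like a filter: from an existing set z it only ever carves out a subset. The sketch below assumes sets are modelled as Python frozensets; the names separate and not_member_of_itself are our own. It also replays the Russell-style argument that every set V has a subset W that is not an element of V.

```python
def separate(z, phi):
    """Separation: the subset of z whose elements have the property phi."""
    return frozenset(x for x in z if phi(x))


def not_member_of_itself(x):
    """The definite property x ∉ x."""
    return x not in x


empty = frozenset()                          # ∅
V = frozenset({empty, frozenset({empty})})   # V = {∅, {∅}}
W = separate(V, not_member_of_itself)        # W = {x ∈ V : x ∉ x}

print(W <= V)    # W is a subset of V
print(W in V)    # but W is not an element of V
```

Since a frozenset can never contain itself, W here coincides with V; the point is only that W lies outside V, so no set in this model contains every set.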
In [83], the Norwegian mathematician Skolem pointed out that the defi-
nition of ‘definiteness’ was rather vague and he made precise the formulation
of ‘by the basic laws of logic’. Fraenkel used Skolem’s idea to formulate
the separation axiom in a new way (for details, see [49, page 290, 291]). In
1922 Fraenkel proposed the introduction of another axiom that allows the
existence of larger cardinal numbers than hitherto possible. The foundation
axiom of von Neumann makes the occurrence of so-called extraordinary sets
impossible. A set is extraordinary if there is an infinite sequence of sets
V1 , V2 , V3 , . . . such that V2 ∈ V1 , V3 ∈ V2 , etc. Von Neumann’s subsequent interest in set
theory led to the second major axiomatization of set theory in the 1920s.
His formulation differed considerably from that of Zermelo and Fraenkel (see
section 5.3) because the notion of function, rather than that of set, was taken
as primitive. In a series of papers beginning in 1937, however, the Swiss
logician Paul Bernays, a collaborator of the formalist David Hilbert,
modified the von Neumann approach in a way that put it in much closer contact
with Zermelo and Fraenkel. In 1940, the Czech-born Kurt Gödel, known for
his incompleteness proof (see chapter 8), further simplified the theory. This
version is known as the Neumann-Bernays-Gödel (NBG) axioms.
3
We quote: “since a definite property is one that is decidable by the basic relations of
the domain B [of sets, the abstract objects postulated by Zermelo], no such property as
that of being definable in a finite number of words can be used in the definition of a set,
and the semantic paradoxes are thus also excluded”.
8.5. NEUMANN-BERNAYS-GÖDEL AXIOMS 137
Example: ‘for all x, A(x)’ stands for ‘for all X, if X is a set, then A(X)’;
i.e. the condition holds for all sets. Intuitively, sets are intended to be those
classes that are adequate for mathematics, and proper classes are thought
of as those collections that are ‘so big’ that, if they were permitted to be
sets, contradictions would follow. In the Neumann-Bernays-Gödel axioms,
the classical paradoxes are avoided. This can be proven by showing in each
case that the collection on which the paradox is based is a proper class, i.e. is
not a set.
The axioms 1, 3, 9 and 10 are different from ZF. The third axiom (scheme)
is presented in a form to facilitate a comparison with the third axiom (scheme)
of ZF. In a detailed development of NBG, however, there appears, instead,
a list of seven axioms (not schemes) stating that for each of certain conditions
there exists a corresponding class of all those sets satisfying the condition. From
this finite set of axioms, each instance of the above scheme can be obtained
as a theorem. When obtained in this way, the third axiom scheme of NBG
is called the class existence theorem.
Theorem: Any theorem of NBG that speaks only about sets is a theorem
of ZF
Note that the fact that NBG avoids the classical paradoxes and that
there is no apparent way to derive any one of them in ZF does not settle the
question of the consistency of either theory. All we know from this theorem
is that either both axiom systems are consistent, or both are inconsistent.
Chapter 9
142 CHAPTER 9. CHURCH AND TURING
systems) about numbers, indubitably true, which could not be proved from
finitely many rules. But the decidability of mathematical statements was
not settled by Gödel’s theorem because it needs a formal definition of
(algorithmic) method in the formulation of the problem (or a definition of the
notion of algorithm in the definition of decidability in section 6.1). To this end
Turing introduced a machine that was later to be called the Turing machine,
an idealized mathematical model that reduces the logical structure of any
computing device to its essentials. By extrapolating the essential features of
information processing, Turing was instrumental in the development of the
modern digital computer. His model served as a basis for all subsequent
digital computers, which share his basic scheme of an input/output device (tape
and head), memory (tape) and central processing unit (head and transition
function).
The Turing Machine model uses an infinite tape as its unlimited memory,
and has a tape head that can read and write symbols (of a set Γ) and move
around a tape (to the L(eft) or R(ight)). We here assume the tape is right-
infinite; this means the tape continues infinitely to the right side but it has
a left-most position. Initially the tape contains an input string of symbols
from an input alphabet Σ and is blank (i.e. filled with a special blank symbol
⊔) everywhere else. The Turing Machine is in a state q of a set of states Q,
and starts in an initial state q0 . It uses a transition function δ that deter-
mines how it gets from one configuration (that is the current state, the tape
contents and the head location) to the next. This transition can consist of
writing a new symbol of the tape alphabet Γ to the tape and moving the tape
head either Left or Right, and depends on the current state and the current
symbol on tape. This computation (i.e. sequence of transitions) continues
until the Turing Machine enters either the (final) state qaccept or the (final)
state qreject . We can define a Turing Machine (sometimes called determin-
istic, since each transition is determined uniquely given the configuration)
formally as a septuple:
After defining the Turing Machine, Turing made his famous proposal
(known as Turing’s thesis, see also section 9.3) for the concept of ‘com-
putability by a Turing machine’. The proposal says that whenever there
is an effective method for obtaining the values of a mathematical function
(i.e. it is intuitively or effectively computable), the function can be computed
by a Turing Machine. The converse claim is trivial, and if the thesis is correct
we can reduce problems of the (non-)existence of effective methods to problems
of the (non-)existence of Turing Machines. We quote one of Turing’s
formulations from [90]:
3 Ck is an accepting configuration.
f (x, y) = x + y if x ≥ y
f (x, y) = 0 if x < y
To add the two numbers a and b, we only have to remove the separating
0, so addition amounts to the concatenation of two strings. The following
Turing Machine, called Adder, adds a and b and is relatively simple to
construct:
Adder = (Q, Σ, Γ, δ, q0 , qA , qR ), with
Q = {q0 , q1 , . . . , q4 }
Σ = {0, 1}
Γ = {0, 1, ⊔}
q0 = {q0 }
qA = {q4 }
qR = {}
δ(q0 , 1) = (q0 , 1, R)
9.1. TURING AND TURING MACHINE 147
δ(q0 , 0) = (q1 , 1, R)
δ(q1 , 1) = (q1 , 1, R)
δ(q2 , 1) = (q3 , 0, L)
δ(q3 , 1) = (q3 , 1, L)
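The Adder can be tried out with a small simulator. The following is a sketch under stated assumptions, not the construction from the text: the tape is modelled as infinite in both directions (a dictionary from positions to symbols) instead of right-infinite, the character ‘_’ stands in for the blank symbol, and the two transitions on the blank symbol are assumptions added only so that this particular machine can turn around and halt. The names run_tm, read_tape and ADDER are our own.

```python
BLANK = "_"  # stand-in for the blank tape symbol


def read_tape(tape):
    """Return the non-blank portion of the tape as a string."""
    cells = [i for i, s in tape.items() if s != BLANK]
    if not cells:
        return ""
    return "".join(tape.get(i, BLANK) for i in range(min(cells), max(cells) + 1))


def run_tm(delta, tape_input, start, accept, reject=frozenset(), max_steps=10_000):
    """Run a deterministic Turing Machine; return (accepted, tape contents)."""
    tape = {i: s for i, s in enumerate(tape_input)}
    state, head = start, 0
    for _ in range(max_steps):
        if state in accept:
            return True, read_tape(tape)
        if state in reject:
            return False, read_tape(tape)
        symbol = tape.get(head, BLANK)
        if (state, symbol) not in delta:
            return False, read_tape(tape)   # no applicable transition: halt without accepting
        state, write, move = delta[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    raise RuntimeError("step limit reached")


ADDER = {
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "0"): ("q1", "1", "R"),        # overwrite the separating 0
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", BLANK): ("q2", BLANK, "L"),    # assumption: turn around at the right end
    ("q2", "1"): ("q3", "0", "L"),        # replace the last 1 by 0
    ("q3", "1"): ("q3", "1", "L"),
    ("q3", BLANK): ("q4", BLANK, "R"),    # assumption: accept once back at the left end
}

print(run_tm(ADDER, "111011", "q0", {"q4"}))   # a = 3, b = 2
```

With the input 111011 the sketch halts in q4 with 111110 on the tape, i.e. five ‘1’s followed by a ‘0’.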
Comparison
To compare two numbers a and b, we again assume they are written in the
notation that we used before and separated by a single ‘0’. We will construct
a Turing Machine that halts in an accepting state if a ≥ b and in a rejecting
state if a < b. To this end we match each ‘1’ on the left of the dividing
‘0’ with a ‘1’ on the right. We can do this by starting at the leftmost ‘1’
(of the number a) and alternately checking off the leftmost symbols of the
numbers a and b by replacing them with the symbols ‘x’ and ‘y’ respectively.
The matching will stop when one of the two sequences of ‘1’s is completely
checked off. If a < b then the right sequence will still contain ‘1’s, and
if a ≥ b either the left sequence contains ‘1’s or neither sequence contains
‘1’s. In the first case, we still find a ‘1’ on the right when all ‘1’s on the left
have been replaced. We use this to get into the state q5 . In the second case,
if a ≥ b, when we attempt to match another ‘1’, we encounter a blank at
the right of the working space, which can be used as a signal to enter the
accepting state. If we work this out in detail, we get the following Turing
Machine called Comparer :=
(Q, Σ, Γ, δ, q0 , qA , qR ), with:
Q = {q0 , q1 , q2 , q3 , q4 , q5 , q6 , q7 }
Σ = {0, 1}
Γ = {0, 1, x, y, ⊔}
q0 = {q0 }
qA = {q5 }
qR = {q7 }
δ(q0 , 1) = (q1 , x, R)
δ(q1 , 1) = (q1 , 1, R)
δ(q1 , 0) = (q2 , 0, R)
δ(q2 , y) = (q2 , y, R)
δ(q2 , 1) = (q3 , y, L)
This set replaces the leftmost ‘1’ of a with ‘x’, then causes the read-write
head to travel right to the first ‘1’ of b and replace it with the symbol ‘y’.
When the dividing ‘0’ is passed, the machine enters state q2 , indicating that it
is now dealing with the number b. When the symbol ‘y’ has been written, the
machine enters a state q3 , indicating that a ‘1’ of b has been successfully
paired with a ‘1’ of a. The next group of transitions reverses the direction
and repositions the read-write head over the leftmost ‘1’ of a, and returns
control to the initial state:
δ(q3 , y) = (q3 , y, L)
δ(q3 , 0) = (q4 , 0, L)
δ(q4 , 1) = (q4 , 1, L)
δ(q4 , x) = (q0 , x, R)
The rewriting continues this way when the input is a string 1^a 0 1^b, stopping
only when on one side no more ‘1’s can be replaced. In that case either the
left side will not contain any more ‘1’s (a ≤ b), or the right side has run out of
‘1’s (a > b). In case the left side does not contain any more ‘1’s, the transition
δ(q4 , x) = (q0 , x, R) will leave the read-write head on a ‘0’ instead of a ‘1’.
δ(q0 , 0) = (q5 , x, L) (a ≤ b)
In the first case we still have to check whether the right side has any ‘1’s left,
to determine whether a = b. This is done in the state q5 .
δ(q5 , x) = (q5 , x, R)
δ(q5 , 0) = (q5 , 0, R)
δ(q5 , y) = (q5 , y, R)
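The transitions listed so far can be traced mechanically. The following Python sketch is only an illustration and not part of the construction itself: it stores exactly the transitions quoted in the text in a table, writes the input as a ‘1’s, a dividing ‘0’ and b ‘1’s, and runs until no quoted transition applies. The names DELTA, BLANK and run, the use of ‘_’ for the blank, and the step limit are assumptions made for this example. Transitions that are not quoted in the text (for instance those leading into the final states) are deliberately left out, so the sketch only reports the state and the symbol at which the quoted rules stop.

```python
BLANK = "_"  # assumed stand-in for the blank tape symbol

# Only the transitions quoted in the text:
# (state, symbol read) -> (next state, symbol written, head move)
DELTA = {
    ("q0", "1"): ("q1", "x", "R"),
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", "0"): ("q2", "0", "R"),
    ("q2", "y"): ("q2", "y", "R"),
    ("q2", "1"): ("q3", "y", "L"),
    ("q3", "y"): ("q3", "y", "L"),
    ("q3", "0"): ("q4", "0", "L"),
    ("q4", "1"): ("q4", "1", "L"),
    ("q4", "x"): ("q0", "x", "R"),
    ("q0", "0"): ("q5", "x", "L"),
    ("q5", "x"): ("q5", "x", "R"),
    ("q5", "0"): ("q5", "0", "R"),
    ("q5", "y"): ("q5", "y", "R"),
}


def run(a, b, max_steps=10_000):
    """Run the quoted transitions on the input 1^a 0 1^b.

    Returns (state, symbol under the head) at the point where no quoted
    transition applies any more.
    """
    tape = dict(enumerate("1" * a + "0" + "1" * b))  # sparse tape
    state, head = "q0", 0
    for _ in range(max_steps):
        symbol = tape.get(head, BLANK)
        if (state, symbol) not in DELTA:
            return state, symbol
        state, write, move = DELTA[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    raise RuntimeError("step limit reached")


for a, b in [(2, 1), (1, 1), (1, 2)]:
    print(a, b, run(a, b))
```

With these rules, run(2, 1) stops in q2 on a blank (the right side has run out of ‘1’s), while run(1, 1) and run(1, 2) both stop in q5, on a blank and on a ‘1’ respectively. This is the distinction that the scan in state q5 is meant to make.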
We now have to put together the Turing Machines Adder and Comparer
to obtain the desired Turing Machine that computes the given function. We
can do this by starting with the input a and b in the previously described
notation and starting position, and using Comparer to determine whether
or not a ≥ b. We index all states with a C, i.e. the last transition will be
δ(qC,0 , 0) = (qC,5 , x, L) or δ(qC,2 , ⊔) = (qC,6 , ⊔, L), with ⊔ the blank symbol. In the first case (a ≥ b),
the Comparer should send a ‘start signal’ to the Adder, to give a + b as out-
put. In the second case (a < b), the Comparer should send a ‘start signal’
to a Turing Machine, (called Eraser) that simply replaces all ‘1’s by ‘0’s to
output the value 0 in the desired format.
We show how we can let the Comparer send these ‘start signals’. We first
index all states of the Adder by A and of the Eraser by E. Now in case of
a ≥ b, Comparer ends in state qC,5 , and we can add a transition δ(qC,5 , ∗) =
δ(qA,0 , ∗). The star ‘∗’ stands for any possible symbol, so actually this tran-
sition is a shorthand notation for a set of transitions. Similarly, we can let
δ(qC,7 , ∗) = δ(qE,0 , ∗) bring the Eraser into its initial state. The Adder and the
Eraser will then give the desired output, because their behavior on the
input does not change as a result of the renaming of the states
(to be exact: the state in which Comparer terminates is suitable as an
initial position for Adder or Eraser). The only thing we have not taken care
of is that when the Comparer enters a final state, it does not have the initial
representation of the numbers a and b on tape, but has replaced the ‘1’s by
‘x’s and ‘y’s. We can easily (it is just some extra work, you can try it as an
exercise if you want) fix this by letting Comparer, as the last action before
entering a final state, replace all ‘x’s and ‘y’s by ‘1’s. The result is a Turing
Machine that combines Comparer, Adder and Eraser to compute the function f.
In a similar way we can, for example, multiply two numbers
a and b. We can also translate macro-instructions like ‘if p then qj else
qk ’ (meaning that when we read ‘p’ on tape, the Turing Machine goes
into state qj and otherwise into state qk ), and even combine them into
complicated subprograms that can be invoked repeatedly whenever needed.
(End of Example)
The Entscheidungsproblem
Theorem: H is recognizable
Proof (by Turing): The following TM U , also called Universal Turing Ma-
chine because it is capable of simulating any other Turing Machine, recog-
nizes H. We informally define U , because a detailed definition of the septuple
such a TM consists of (see the definition of a TM) is a lot of work.
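The idea behind such a machine can be sketched in Python, assuming that H is the set of pairs (M, w) such that the machine M halts on the input w. This is not the definition of U itself: it is a hedged analogy in which the simulated machine is passed in as data (a transition table) and the simulator answers ‘yes’ as soon as that machine halts. The names recognizes_halting, skip_ones and spin, the ‘_’ blank and the optional fuel parameter are assumptions introduced for this example only.

```python
def recognizes_halting(delta, start, word, blank="_", fuel=None):
    """Simulate the machine described by `delta` on `word`.

    Returns True as soon as the simulated machine halts (no rule applies).
    With fuel=None the loop runs forever on a machine that never halts,
    which is the behaviour of a recognizer. A finite `fuel` makes the
    function give up and return None ("unknown") instead.
    """
    tape = dict(enumerate(word))  # sparse tape
    state, head, steps = start, 0, 0
    while fuel is None or steps < fuel:
        key = (state, tape.get(head, blank))
        if key not in delta:
            return True  # the simulated machine has halted
        state, write, move = delta[key]
        tape[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    return None


# Two hypothetical toy machines, given purely as data.
skip_ones = {("s", "1"): ("s", "1", "R")}  # halts at the first non-'1'
spin = {
    ("s", "1"): ("s", "1", "R"),
    ("s", "_"): ("s", "_", "R"),  # keeps moving right forever
}

print(recognizes_halting(skip_ones, "s", "111"))
print(recognizes_halting(spin, "s", "111", fuel=1000))
```

Called without fuel on spin, the function would never return. That is the difference between recognizing H (answering ‘yes’ whenever the answer is yes) and deciding it (always answering).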
- Alan Turing
In this section I have made extensive use of [38] [92] for information on
the life and work of Turing and [89] [82] [19] for the theory of TM’s and the
Halting problem. Another valuable source of information on Turing’s life and
work is the website http://www.turing.org.uk/
9.2. CHURCH AND THE LAMBDA CALCULUS 153
1
Note that in some examples we have simplified the notation for the clarity of the
example, since in pure lambda calculus we do not have arithmetic symbols, like + and ×,
but we can encode these operations in the pure lambda calculus, as we will later see.
These two notions become very powerful once we introduce the rule of beta
reduction, which allows us to apply an abstraction to an argument and, for
example, rewrite (λx . x+1) 4 to 4+1. Similarly (λn . n×n) 7 can be reduced
to 7×7. Abstractions may also be nested arbitrarily: ((λn . λx . (x+1)×n) 7) 4
can be reduced to (λx . (x + 1) × 7) 4 and then to (4 + 1) × 7.
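The three reductions above can be mimicked in Python, whose lambda expressions serve here as a loose stand-in for λ-abstraction. This is only an analogy under stated assumptions: Python evaluates an application eagerly to a value instead of rewriting the term step by step, and it has built-in arithmetic, which the pure lambda calculus lacks. The names succ, square and nested are chosen for this example.

```python
# Python's lambda as a stand-in for λ-abstraction.
succ = lambda x: x + 1                      # λx . x+1
square = lambda n: n * n                    # λn . n×n
nested = lambda n: lambda x: (x + 1) * n    # λn . λx . (x+1)×n

print(succ(4))        # (λx . x+1) 4, i.e. 4+1
print(square(7))      # (λn . n×n) 7, i.e. 7×7
print(nested(7)(4))   # ((λn . λx . (x+1)×n) 7) 4, i.e. (4+1)×7

# Applying only the outer abstraction corresponds to the intermediate
# term (λx . (x+1)×7): it is again a function, waiting for x.
partial = nested(7)
print(partial(4))
```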
Similar to ordinary mathematics, the names of the variables are irrele-
vant to the rules that can be applied, which allows a transformation of the
names (also known as dummy transformation). This rule in lambda calculus
is called alpha conversion. For example, alpha conversion allows us to rewrite
λn . nn to λx . xx, since they are essentially the same function.
The above explanation and examples give an idea of what lambda calcu-
lus is. We will now work towards a more formal definition of lambda calculus.
The system of lambda calculus is based on the structure of Abstract Reduc-
tion Systems (ARS). The terms of the ARS then coincide with the inductively
defined lambda terms and the reduction relation will be β−reduction. So be-
fore we formally define the lambda calculus, we introduce the most relevant
theory of abstract reduction systems.
Lemma An ARS ⟨A, →⟩ with the unique normal form property is not
always weakly normalizing.
Proof: For instance, the abstract reduction system with only element a ∈ A
and rewrite rule a → a has no normal forms, so it trivially has the unique
normal form property and is not weakly normalizing.
Lemma If a reduction relation has the unique normal form property and is
weakly normalizing then it is confluent.
Proof: Suppose we have a ↠ b and a ↠ c. Since → is weakly normalizing,
there are normal forms b′ and c′ such that b ↠ b′ and c ↠ c′. By transitivity
we also have a ↠ b′ and a ↠ c′, and thus by the unique normal form property
b′ ≡ c′. Hence b ↠ b′ and c ↠ b′.
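For finite systems both lemmas can be checked by brute force. The Python sketch below is a hedged illustration, not part of the proofs: it represents an ARS as a dictionary from each element to the set of its one-step reducts, and it reads the unique normal form property as ‘every element reduces to at most one normal form’, which is how the property is used in the proof above. All function names and the example systems loop, fork and split are assumptions made for this example.

```python
def reachable(step, a):
    """All b that a reduces to in zero or more steps."""
    seen, todo = {a}, [a]
    while todo:
        x = todo.pop()
        for y in step.get(x, ()):
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


def normal_forms(step, a):
    """Normal forms reachable from a (elements without any reduct)."""
    return {b for b in reachable(step, a) if not step.get(b)}


def weakly_normalizing(step, elems):
    return all(normal_forms(step, a) for a in elems)


def unique_normal_forms(step, elems):
    return all(len(normal_forms(step, a)) <= 1 for a in elems)


def confluent(step, elems):
    # Any two reducts b, c of the same a must have a common reduct.
    return all(
        reachable(step, b) & reachable(step, c)
        for a in elems
        for b in reachable(step, a)
        for c in reachable(step, a)
    )


loop = {"a": {"a"}}                               # the system a -> a
fork = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}}  # both branches meet in d
split = {"a": {"b", "c"}}                         # two distinct normal forms

print(unique_normal_forms(loop, "a"), weakly_normalizing(loop, "a"))
print(confluent(fork, "abcd"), confluent(split, "abc"))
```

The system loop reproduces the counterexample of the first lemma: it has the unique normal form property but is not weakly normalizing. The system fork is weakly normalizing with unique normal forms and is confluent, as the second lemma predicts, while split fails the unique normal form property and is not confluent.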
Syntax
Now that we have seen the basic principle of lambda calculus, we will give a
more formal definition. We formally define the syntax of the lambda calculus
by giving its grammar.
• FV(C) = ∅,
• FV(v) = {v},
α-conversion
From now on, two λ-terms are considered (syntactically) equal if they are
α-convertible to each other.
Substitution
• C[E/v] ≡ C
• x[E/v] ≡ E if x ≡ v, and x[E/v] ≡ x if x ≢ v
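The clauses for free variables and substitution can be turned into a small Python sketch. Only the clauses for constants and variables are taken from the list as it appears here; the clauses for application and abstraction are the usual textbook ones and are an assumption of this example, as are the tuple representation of terms and the names free_vars, fresh and subst. In the abstraction case the bound variable is renamed (an α-conversion) when it would otherwise capture a free variable of E.

```python
# Terms as tuples (assumed representation):
#   ("const", c)   ("var", v)   ("app", f, a)   ("lam", v, body)

def free_vars(t):
    kind = t[0]
    if kind == "const":
        return set()                        # FV(C) = ∅
    if kind == "var":
        return {t[1]}                       # FV(v) = {v}
    if kind == "app":
        return free_vars(t[1]) | free_vars(t[2])
    if kind == "lam":
        return free_vars(t[2]) - {t[1]}     # the bound variable is not free
    raise ValueError(f"unknown term: {t!r}")


def fresh(avoid):
    """Return a variable name z0, z1, ... that is not in `avoid`."""
    i = 0
    while f"z{i}" in avoid:
        i += 1
    return f"z{i}"


def subst(t, e, v):
    """Compute t[e/v]: replace the free occurrences of v in t by e."""
    kind = t[0]
    if kind == "const":
        return t                            # C[E/v] ≡ C
    if kind == "var":
        return e if t[1] == v else t        # x[E/v] ≡ E if x ≡ v, else x
    if kind == "app":
        return ("app", subst(t[1], e, v), subst(t[2], e, v))
    if kind != "lam":
        raise ValueError(f"unknown term: {t!r}")
    y, body = t[1], t[2]
    if y == v:
        return t                            # v is bound here, nothing to replace
    if y in free_vars(e):                   # rename y to avoid capture
        z = fresh(free_vars(e) | free_vars(body) | {v})
        body = subst(body, ("var", z), y)
        y = z
    return ("lam", y, subst(body, e, v))


# (λy . x)[y/x] must not become λy . y, so the bound y is renamed first.
print(subst(("lam", "y", ("var", "x")), ("var", "y"), "x"))
```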
(λy . yy)TWICE
→β TWICE TWICE
→β λx . TWICE (TWICE x)
→β ...
Example:
Although we already saw that λ-calculus is neither weakly nor strongly nor-
malizing, it does have the important confluence property. First we introduce
the following definition of the diamond property that we use to prove that
→β is confluent. To prevent confusion in the notation we will from now on
also use the implication symbol ⇒.
We now need to apply a technique called induction loading (see for more
information the links on http://zax.mine.nu/stage/) to prove that K and L
have a common reduct N. To be precise, we show that l(m, n) holds for all
m, n ∈ N, with
(a) N (i, 0) ≡ Mi if 0 ≤ i ≤ n
(b) N (0, j) ≡ Kj if 0 ≤ j ≤ m
Clearly, when l(m, n) is true for all m, n ∈ N, we know that K and L have
a common reduct. So the only remaining proof obligation is to show that
l(m, n) holds for all m, n ∈ N. We prove this by induction on n.
Base case (n): n=0
Note that this is valid in combination with the definition under (a)
since N (0, 0) ≡ M0 ≡ M ≡ K0 .
2
The outline of the proof is due to W. Tait and P. Martin-Löf (see [6, section 3.2]), but
as far as I know this is the first proof that formalizes the above lemma to a reasonable
extent.
Example: All constants are normal forms, as well as x, λx.x, λx.xx, yy, . . ..
Church’s thesis (1936) The set of effectively computable functions, i.e. functions
that intuitively (effectively) can be computed, is the same as the set of
functions that can be defined in λ-calculus.
Alan Turing proved in 1937 that the class of Turing computable functions is
the same as the class of functions definable in λ-calculus.
Church also presented in [14] a formally exact way to express this no-
tion of intuitively computable. Turing’s method was however more obvious
and more general than Church’s, since the latter only considered functions
of positive integers. In order to calculate the values of the function Church
introduced his lambda calculus and specified the notion of a recursive func-
tion (see section 9.2).
Although Turing and Church chose different ways to formalize the
intuitive notion of effective computability (Turing by identifying it with
computability by a Turing Machine, Church with definability in the lambda
calculus), the two formalizations are equivalent. After this proof of equivalence, Kleene
introduced the term ‘Church-Turing thesis’ to refer to either of the two
equivalent theses ([48], page 232).
Conclusion
- A. Whitehead, in [99]
170 CHAPTER 10. CONCLUSION
present the theory properly. Hopefully that makes it clearer and more enjoyable.
Some of the good literature used, such as the books just mentioned,
can be found in the references at the end of this report.
At the same time, I have tried to briefly introduce the reader to the historical
context of the most important developments. Most undergraduate
courses I have taken gave little or no information about the history that
lies behind the theory; the emphasis was on the accumulation of mathematical
knowledge. I believe that including the history of mathematics in education
can not only make the study of mathematics more interesting, but also help
in the growth of mathematical understanding and appreciation of the current
form of the theory.
I want to conclude this report with a summary of the theory and my own
view on the project, and with some ideas for future work.
The project
In the beginning of the 20th century Hilbert said we should formalize all
of mathematics and mathematical reasoning. This ‘project’ (from now on I will
refer to it as the project) has been the central theme of this report. When
reading about the work and biographies of all those brilliant men who
devoted themselves to this problem, you can (at least that’s what happened to
me) get caught up in this fascinating philosophical question.
To most people however, this all seems very impractical. We all know
you can make a popular operating system or start your own business on the
web and in one year make a million dollars if you’re lucky. And when it
comes to verifying mathematical proofs and making reliable software, a formal
basis is rarely used: the human mind is still the most important tool, and
other techniques, such as model checking, are preferred. It might be worth
writing another article on how and why, in that respect, the more practical
working mathematicians and the more theoretical logicians (or formalists, if you
prefer) grew apart. But let’s first go back to the project.
At first this was a shock, but then mathematicians were saying (and
again it would be nice to write an article about the different responses of
mathematicians and logicians): so what - we should do mathematics exactly
the same way as we’ve always done it, this does not apply to the problems
I care about. Indeed mathematicians continued with their work, and the
theorems of Gödel and Turing had little or no impact in practice on how
we (should) do mathematics. The only effect the project might have had on
working mathematicians is that they have become a bit more precise in their
use of language and in writing their proofs. Some of course were inspired
by problems like Hilbert’s 23. But there has been another consequence
of all this theoretical work, that I was made aware of through a videotaped
lecture of G.J. Chaitin on the internet. I quote him about Hilbert’s attempt
to formalize all mathematics after the publications of the theorems of Gödel
and Turing: “It failed in that precise technical sense. But in fact it succeeded
magnificently, not formalization of reasoning, but formalization of algorithms
has been the great technological success of our time - computer programming
languages! So if you look at the history of the beginning of this century you’ll
see papers by logicians studying the foundations of mathematics in which
they had predicate calculi. Now you look back and you say this is clearly
a programming language! [...] If you look at Turing’s paper of course there
is a machine language [...]. Or, as von Neumann said: the universal Turing
Machine is really the notion of a general purpose programmable computer -
and that’s the idea of software. [...] If you look at papers by Alonzo Church
you see the lambda calculus, which is a functional programming language.
If you look at Gödel’s original paper you see what looks like LISP, it’s very
close to LISP”. As he showed there are numerous examples of unexpected
offspring of theoretical research, and all of the foundational work is not so
impractical after all! As G.J. Chaitin concluded in his speech, this is the
way “we’re all benefiting from the glorious failure of this project!”. Now
this is not entirely true, but it is true that theoretical studies, as he says
“don’t have spin-off in dollars right away, but sometimes they have vastly
unexpected consequences”. Formal methods/studies have not always done a
good job promoting themselves - maybe we can emphasize this aspect and
show that technology often advances through fascinating impractical ideas.
3
As interesting statements, we consider all statements in the (everyday) work of prac-
ticing mathematicians. These ‘practical’ statements do not include the specific purely
theoretical statements that Gödel invented for his incompleteness theorem.
statements that the system does contain, and which we claim to be decida-
ble by providing a concrete and completely formalized (dis)proof of it within
that system, we still have a way to decide mechanically whether or not the
proof is correct for the given statement. The question then is if the set of
statements for which we can do this, still forms a part of mathematics that
is interesting enough. This has to be a part of our investigation: to find out
how many of the practical mathematical proofs contain ‘meta-arguments’, in
other words which classes will fall outside our system. Although we want to
change as little as possible to the (side of) mathematics itself, this also might
be a necessary option4 . As P. Andrews calls his book [4], we get: ‘to truth
through proof’. This should be the first goal for the near future:
I would also like to remark that proof checking for programs can only give
us a way to verify the correctness of programs. At least as important (to ob-
tain correct programs) is the correct construction of programs. This is the
focus of the work in the area of programming methodology. At the Eindhoven
University of Technology for example, the techniques of E.W. Dijkstra are
used to derive correct programs from their specification. Unfortunately both
areas (proof checking/verification vs. construction/derivation) are merely ad-
vocates of their own approach, while a combination of both could give the
best results. Although there has been some minor work on formalizing these
proof techniques and combining formal methods and program derivations
(see for example [26]), cooperation is still minimal. If we go one step further
back in the process of creating correct software, the success of any piece of
software depends on the correctness of its specification. These first phases of
software engineering (establishing user requirements and specifications) can also be
adapted to comply with the methods of program derivation and formal proof
checkers (note that we use the term ‘proof checker’ not only for mathematics,
i.e. to check mathematical statements, but also for the software variant: for
checking derivations of algorithms and programs). And since we can never obtain
a 100% guarantee of correctness of software (it depends for example on the
correctness of the specifications and the proof checker itself), model checking
techniques can also be used as a verification method to improve reliability
even further. Therefore I argue for an integrated approach: only the combination
of all of the mentioned methods can give us the highest
reliability (i.e. highest chance of correctness of software). Such an integrated
approach requires research and cooperation between the various branches
Weak Type Theory. This is a start of a more rigorous approach to the trans-
lation of mathematical texts (statements and proofs).
We see the extension of proof assistants with more intelligent and sophisticated
automated proving methods as the last phase (4) of future
work. The branch of automated proving includes classical theorem proving
methods (such as automated induction). Newer methods come
from areas such as neural networks, fuzzy logic and genetic and DNA computing,
and in the future possibly even quantum computing.
I want to end these ideas by summarizing the steps that lie ahead
of us, in a new project.
One of the most important questions, part of step (1), has so far in this
conclusion been avoided: What to take for the basis of mathematics? This is
one of the most difficult questions and, as we have seen, many great scientists
have thought about it. There is currently no consensus on what the best
approach is, and I am not in a position to give a well-argued opinion. A
thorough study of the alternatives will have to yield the best approach and
show which choice of foundational system is most usable in practice.
The only thing I can say is that recently most people seem to
favor type theory over category theory, relational calculi and also over set
theory. P.J. Scott for example favors type theory over category theory in
the introduction of [55]. H. Barendregt gives arguments for the use of type
theory over set theory in [7], and we quote from [4, the second page of the
preface]: “[People prefer the approach they are most familiar with.] However,
those familiar with both type theory and axiomatic set theory recognize that
in some ways the former provides a more natural vehicle than the latter for
formalizing what mathematicians actually do”. In contrast, on
http://www.rbjones.com/rbjpuc/logic/jrh0111.htm we find a detailed assessment
of the choice of a foundational system, with advantages of set theory over
type theory. Also, several new types of logic have been proposed, such as IF
logic (see [37]) and several types of so-called ‘fuzzy logics’, but so far
they seem to lack the precision, formalization and proofs needed to support claims that
they can be used successfully as a foundation for mathematics.
A final remark on the debate between type theory and axiomatic set theory
as a foundational basis: if there is a mapping from the axioms of
(some form of) set theory into (some form of) type theory and vice versa, type
theoretic expressions have their counterparts in set theory. It is interesting to
investigate whether among such mappings there is indeed a bijection. That would
show the equivalence of both theories in expressive power, so that the debate
can turn to the question of which theory is more intuitive and useful.
Some do not really believe in a successful formalization of mathematics but
rather see the indeterminacies in mathematical representations and the
undecidabilities in any formal system as the source of problem solving and
creative power (see [87, page 174]). This standpoint was already expressed
in 1807 by the German philosopher Hegel (1770-1831) in [35]: “Dagegen
muß behauptet werden, daß die Wahrheit nicht eine ausgeprägte Münze ist,
die fertig gegeben und so eingestrichen werden kann” (“Against this it must
be maintained that truth is not a minted coin that can be given ready-made
and simply pocketed”).
I am aware of the limitations of this report. Many chapters are still informal,
such as the treatment of the work of Frege in chapter 4. The theory of types in chapter
7 and Gödel’s incompleteness theorems in chapter 8 are not completely
covered, and certain subjects closer to logic (such as intuitionism) are treated
very minimally. The only excuse I have is that it is simply not possible to
study all the original works in such a short period of time and include all
theory in this report. I hope to complete this work at a later stage. It might
also be worth extending (on both sides) the period covered
in this report. Recently we have seen interesting new theories on
category and type theory and even on the foundations of mathematics, if
we look at Chaitin’s results on randomness; it seems that he continued
where Gödel and Turing left off. Finally I would like to remark that the ‘new
project’, consisting of the four steps mentioned in this conclusion, is just my
own view of the work that lies ahead of us. To end with a concluding remark
by Alan Turing, from his paper on the Turing test: “We can only see a short
distance ahead, but we can see plenty there that needs to be done”.
5
p.s. To those who wonder what the turtle and the elephant are doing on the cover of
this report, I refer to the website http://zax.mine.nu/stage/.
Appendix A
182 APPENDIX A. TIMELINE AND IMAGES
[6] H. Barendregt. The Lambda Calculus - Its Syntax and Semantics, vol-
ume 103. Elsevier Science Publishing Company, Inc., 1984.
188 BIBLIOGRAPHY
[15] P.J. Cohen. Set Theory and the Continuum Hypothesis. Benjamin,
1966.
[17] D. van Dalen, H.C. Doets and H. de Swart. Sets: Naive, Axiomatic and
Applied. Pergamon Press, 1978.
[18] J.W. Dauben. Georg Cantor, His Mathematics and Philosophy of the
Infinite. Harvard University Press, 1979.
[21] A. Einstein. Relativity: the special and general theory. Methuen Press,
London, 1970.
[34] P.R. Halmos. Naive Set Theory. Van Nostrand Press, London, 1990.
[55] J. Lambek and P.J. Scott. Introduction to higher order logic. Cambridge
University Press, 2001.
[59] Mosché Machover. Set theory, logic and their limitations. Cambridge
University Press, 1996.
[63] G.H. Moore. Zermelo’s axiom of choice: its origins, development and
influence. Springer-Verlag, 1982.
[64] E. Nagel and J. R. Newman. Gödel’s proof. New York University Press,
1986. First published in 1958.
[72] W. Van Orman Quine. Set Theory and its Logic. Harvard University
Press, 1963. Cambridge, Massachusetts.
[77] H. Rubin and J.E. Rubin. Equivalents of the axiom of choice. North-
Holland Press, Amsterdam, 1963.
[94] W. van Orman Quine. New foundations for Mathematical Logic. The
American Mathematical Monthly, 44(2), pages 70-80, February 1937.
[96] J. von Neumann. Zur Einführung der transfiniten Zahlen. Acta Szeged.
1:199-208 [I, 3], 1923.