
Algorithms

Matei Popovici
October 24, 2014

1 Introduction

1.1 The Apollo example

1.1.1 The story

On the 11th of April 1970, the Apollo 13 mission takes off, with the
intention of landing on the Moon. Two days into the mission, an explosion
in the Service Module forces the crew to abort the mission and attempt a return
to Earth. Facing extreme hardship, especially due to the loss of power, the crew
finally succeeds in returning to Earth. Both the crew and the support team
from the Space Center were forced to find fast and simple solutions to apparently
insurmountable problems.¹
Drawing inspiration from this scenario, we consider the following illustrative example.
The space mission S takes off, having the Moon as its destination. Some time
after take-off, S notices a malfunction in the main engine, which fails to
respond to command inputs. While trying to hack-control the engine, the
following circuit φ is identified:
φ = (A ∨ B ∨ D) ∧ (B ∨ C ∨ D) ∧ (A ∨ C ∨ D) ∧ (A ∨ ¬B ∨ ¬D) ∧
    (B ∨ C ∨ D) ∧ (A ∨ ¬C ∨ ¬D) ∧ (A ∨ B ∨ C) ∧ (A ∨ B ∨ C)
If φ can be made to output 1, then the engine could be manually started. S
however notices that there is no apparent input for A, B, C and D for which the
circuit yields 1. S requests advice from Control as to how to proceed in this
situation.
¹ One example is the famous improvisation which made the Command Module's square
CO₂ filters operable in the Lunar Module, which required filters of round shape.

The solution which Control must provide depends on some key
background information:
(i) The position of S allows only a 5-minute window for a decision to be
made. After 5 minutes, a safe mission abort is no longer possible.²
(ii) Trying to find a solution by hand is infeasible and likely to produce errors.
(iii) The actual problem to be solved consists in finding a certain input
I, which assigns to each of A . . . D a value from {0, 1}, such that the
circuit yields 1 (which we denote as φ(I) = 1). We call this problem
S(φ).
(iv) To solve S(φ), one needs to find an algorithm for computing φ(I), given
an input I.
² This is highly similar to the real situation of Apollo 13: given their position and
malfunctions, a direct abort and return to Earth was impossible. Instead, their return
trajectory involved entering Moon orbit.
Computing φ(I). It is helpful to examine the structure of φ first, which
shows a certain regularity. φ is of the form:

(L^1_1 ∨ . . . ∨ L^1_{n_1}) ∧ . . . ∧ (L^k_1 ∨ . . . ∨ L^k_{n_k})

where each (L^i_1 ∨ . . . ∨ L^i_{n_i}) is a clause consisting of n_i literals, and each literal
L^i_j is of the form X or ¬X, where X is a variable (input).
Next, we identify a suitable representation of φ from which our algorithm can
benefit. Assume I is represented as a vector which, for convenience, is
indexed by variable names instead of 0 . . . n−1 (n is the number of variables).
Assume φ is represented as a matrix, where each column φ[i] represents a
vector of literals. The value φ[i, X] = 1 indicates that X appears as itself
in clause i, φ[i, X] = 0 indicates that X appears as the literal ¬X, while
φ[i, X] = ∗ indicates that X is not present in clause i.
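For instance, a clause (A ∨ ¬B ∨ ¬D) would be represented by a column with
φ[i, A] = 1, φ[i, B] = 0, φ[i, C] = ∗ and φ[i, D] = 0.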
The algorithm to solve our problem, entitled CHECK(φ, I), proceeds as
follows:
a) set v = 0;
b) for i = 1, k:
     re-set v = 0;
     for each variable X from I:
       set v = v + φ[i, X] ⊗ I[X];
     if v = 0, stop and return 0; otherwise, continue;
c) return 1.
The operation ⊗ is a customized XNOR, which behaves as follows: a ⊗ b
is always 0 if either of a or b is ∗. Hence, if a variable X does not appear
in a clause i, then φ[i, X] = ∗ will not influence the value of v. Otherwise, a ⊗ b is 1 iff
both operands are the same. Hence φ[i, X] ⊗ I[X] = 1 if X occurs as itself
in clause i and is given value 1 in I, or if X occurs as ¬X in clause i and is
given value 0 in I.
At the end of the inner loop, CHECK will have computed the value of
some clause (L^i_1 ∨ . . . ∨ L^i_{n_i}), which is 1 iff at least one literal is 1.
If some clause has the computed value 0, then φ is 0 for the input I.
Finally, note that CHECK performs at most n · k computations, where
n is the number of variables, and k is the number of clauses in φ.
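To make the procedure concrete, here is a minimal sketch of CHECK in Python
(an illustration of ours, not pseudocode from the course): each clause column is
a dict mapping variable names to 1, 0 or None, with None playing the role of ∗.

    def xnor(a, b):
        # the customized XNOR: 0 if either operand is *, else 1 iff operands match
        if a is None or b is None:
            return 0
        return 1 if a == b else 0

    def check(phi, i):
        # phi: list of clause columns; i: dict variable -> 0/1
        for clause in phi:
            v = 0
            for x in i:                      # every variable of the input
                v = v + xnor(clause.get(x), i[x])
            if v == 0:                       # no literal of this clause is 1
                return 0
        return 1

For example, check([{'A': 1, 'B': 0}], {'A': 0, 'B': 0}) yields 1, since the
clause (A ∨ ¬B) is satisfied by B = 0.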
Computing S(φ). To solve S(φ), it is sufficient to take all possible inputs I
over the variables which occur in φ, and for each one, perform CHECK(φ, I).
This can be achieved by viewing the vector I as a binary counter. Hence,
the operation I++ is implemented as follows, where the variables X are iterated
in some arbitrary order, which is fixed with respect to the algorithm:
a) for each variable X in I:
     if I[X] = 1, make I[X] = 0 and continue;
     otherwise, make I[X] = 1 and stop;
b) overflow.
The operation I++ is said to overflow if instruction b) is reached, namely
if all variables are set to 0 upon a traversal of all variables.
Now, we implement FIND(φ):
a) set I to be the input where all variables are set to 0;
b) while overflow was not signalled:
     if CHECK(φ, I) = 1 then return 1;
     otherwise I++;
c) return 0.
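Continuing the Python sketch from above (again ours, for illustration), the
counter and the search read:

    def increment(i, variables):
        # the I++ operation; returns True when the counter overflows
        for x in variables:
            if i[x] == 1:
                i[x] = 0                     # carry, continue with next variable
            else:
                i[x] = 1
                return False
        return True                          # all variables wrapped around to 0

    def find(phi, variables):
        # try all 2^n assignments, using the counter above
        i = {x: 0 for x in variables}
        while True:
            if check(phi, i) == 1:
                return 1
            if increment(i, variables):      # overflow: every assignment was tried
                return 0

A call find(phi, ['A', 'B', 'C', 'D']) runs CHECK on at most 2⁴ = 16 assignments.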
We now have a procedure for solving S(φ), one which Control may use in
order to assist the Mission. However, several issues arise:
(V) is FIND(φ) correct, that is, bug-free and returning an appropriate result?
(M) how long does FIND(φ) take to run, with respect to the size of the
input (i.e. the size of the matrix φ)?
(C) is FIND(φ) an optimal algorithm? Are there algorithms which perform better?
Next, we discuss each issue in more detail. (V) denotes the Verification
problem. Verification of computer programs (that is, of algorithm implementations) is essential nowadays, as we rely more and more on software to
perform critical tasks: assisting us in driving cars and flying planes, guiding
missiles and performing surgeries. Verification is equally important for the
mission.
(M) identifies a measurement problem: namely, in what way critical
resources are consumed by an algorithm. Traditionally, these resources
are time, i.e. the number of CPU operations, and space, i.e. computer memory.
We note that, for our given formula, CHECK(φ, I) will run at most 8 · 4 = 32
CPU operations, where 8 is the number of clauses, and 4 is the number of
variables of any input. FIND(φ) will build at most 2ⁿ assignments I,
where n is the number of variables. In our example, n = 4, thus we have 16
different ways of assigning A, B, C and D values from {0, 1}. For each, we run
CHECK(φ, I), yielding a total of 16 · 32 = 512 CPU operations.
For the sake of our example, let us assume the Mission's computers run
one CPU instruction per second, which is not unlikely, given the hardware
available in the '70s. Thus, in 5 minutes, we can perform 5 · 60 = 300 CPU
operations! Note that FIND(φ) does not finish in sufficient time: if Control
were to use this algorithm, it would waste the Mission's time.
Finally, (C) designates a complexity problem, one specific to Complexity Theory: do efficient algorithms exist for a given problem?
There are also sub-questions which spawn from the former:
- is there a better encoding of the input and of the variables from I, which
makes the algorithm run faster?
- is there a certain machine which allows performing computations in
a more efficient (faster) way?
Unfortunately, the problem which we denoted by S(φ) is traditionally called
SAT, i.e. the Satisfiability problem for boolean formulae. It has been
known that efficient algorithms to solve SAT are unlikely to exist, no matter
the machine of choice. Translated into simple terms, in the general case, we
can't get substantially better than an exponential number of steps, with
respect to the size of the input formula. If Control had this knowledge, it
would take no trouble in looking for specific algorithms. It would recognize the
problem as impossible to solve within the given time constraints, and would
recommend the Mission to return home.
To recap, (V), (M) and (C) play an important role in decision making,
both at the software level and beyond. Naturally hard problems (re)occur
in every field of science. Having at least a minimal knowledge of their nature
is a critical step in attempting to solve them.
1.1.2 Disclaimer

SAT is probably one of the most studied problems in Computer Science.
SAT solvers are key components of many algorithms which solve even harder
problems: program verification, code optimisation, cryptography, to name
only a few. While reading this, it is highly probable that the reader employs
some technology which relies on SAT solvers, directly or indirectly.
Are these solvers efficient, in the sense that they run in less than exponential time? For the general case, the answer is no. However, these solvers
will run fast enough for most formulae. This doesn't really help Control,
unless they are lucky enough to stumble upon a nice formula.
The intuition behind SAT solvers is that they rely on an efficient graph-like structure to represent a formula. This structure is called an OBDD
(Ordered Binary Decision Diagram). The efficiency of an OBDD is unfortunately dependent on finding a suitable ordering for the variables; the
latter is, in itself, a hard problem. However, there are good heuristics which
come close. The overall result is that SAT solvers perform well in many
cases, and the "many" is measurable with controlled precision.

1.2 Which is harder?

Consider the following problems:


A telecommunications company T owns radios across the country, and
needs to monitor all links, to record performance. Monitoring each radio is
expensive, thus T would like a minimal number k of radios to be monitored.
Is this possible?
An organism consists of a number of cells, and connections between cells.
Some cells are good, while some are bad. The bad cells can be detected by
examining their links: they form a complete mesh, i.e. each bad cell is connected to
all the others. Are there k bad cells in the organism?
It is easy to notice that both problems can be cast as graph problems.
Let G = (V, E) be an undirected graph. If we interpret each node as a radio,
and each edge as a radio link, then solving the former problem boils down
to finding a subset S ⊆ V such that, for all (a, b) ∈ E, at least one of the
following holds: a ∈ S or b ∈ S. Hence, S covers all edges from E.
If we interpret nodes as cells, and edges as connections between cells,
then the latter problem consists in finding a subset S ⊆ V (of size |S| = k)
such that (a, b) ∈ E for each a, b ∈ S with a ≠ b. Hence S is a clique (of size k).
One interesting question which may be raised is whether one problem is
harder than the other. We note that a graph capturing the first problem
can be transformed into one capturing the latter, and vice-versa.
If we start from the telecommunications graph, we can build a cell
graph by: (i) creating one cell per radio; (ii) if two radios do not share a
link, then the corresponding cells will share a connection. Thus, if some
subset S of radios covers all links, then all the cells corresponding to radios
outside S must share connections, hence they are bad.
We note that the transformation can be easily adjusted to work in the
other direction. Thus, if one has an algorithm to solve the telecommunications problem, then, via a transformation, we can solve the cell problem,
and vice-versa, as sketched below.
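The transformation is, in fact, graph complementation; a minimal Python
sketch (our own naming, for illustration):

    from itertools import combinations

    def complement(vertices, edges):
        # build the cell graph from the radio graph: connect exactly those
        # pairs of vertices which do NOT share an edge in the original graph
        edges = {frozenset(e) for e in edges}
        return {frozenset((a, b)) for a, b in combinations(vertices, 2)
                if frozenset((a, b)) not in edges}

If S covers all edges of (V, E), then V \ S is a clique in the complement
graph, and vice-versa.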
This observation highlights several issues:
- If such a transformation exists between two problems, what does it
say about the complexity of solving the problems?
- Is it always the case that transformations are bijective?
- What are the properties of the transformation, such that it is effective
(can be used for problem solving)? For instance, if some problem A
can be transformed into B in exponential time³, and B can be solved
in polynomial time, then solving A via B yields an algorithm running in exponential time.
- Can (appropriate) transformations be used to characterize problems
which are equally hard?
³ with respect to the size of the input

In the previous section, we illustrated a problem which can be solved in
exponential time. We claimed, without arguing, that this problem cannot be
solved in polynomial time. Given some problem P, if we can appropriately
transform the former problem into P, we can also argue that P cannot be
solved in polynomial time: if this were possible, an algorithm for P
could also solve our former problem.
Developing a formal methodology from the above intuition is a critical
tool for assessing problem hardness.

2 Computability Theory

2.1 Problems and problem instances

In the previous section, we have illustrated the problem SAT, as well as
pseudocode which describes a solution running in exponential time. We have seen
that such a solution is infeasible in practice, and also that no (predictable)
technological advance can help. The main question we asked (and also
answered) is whether there exists a faster procedure to solve SAT. (We
have conjectured that the answer is no or, more precisely, not likely.) To
generalize a little, we are interested in the following question:
Can a given problem Q be solved in efficient time?
For now, we sidestep the currently absent definition of "efficient", and
note that such a question spawns another (which is actually more straightforward to ask):
Can a given problem Q be solved at all?
In this chapter, we shall focus on answering the latter question first, and to do
so, we need to settle the following issues:
- What exactly is a problem?
- What does it mean to solve a problem?
Definition 2.1.1 (Abstract problem, problem instance) A problem instance is a mathematical object of which we ask a question and expect an
answer.
An (abstract) problem is a mapping P : I → O, where I is a set of
problem instances of which we ask the same question, and O is a set of
answers. P assigns to each problem instance i ∈ I the answer P(i) ∈ O.

It is often the case that the answers we seek are also mathematical objects. For instance, the vector sorting problem must be answered by a sorted
vector. However, many other problems prompt yes/no answers. Whenever
O = {0, 1}, we say that P is a decision problem. Many other problems can
be cast as decision problems. The vector sorting problem may be seen
as a decision problem if we simply ask whether the problem instance (i.e.
the vector) is sorted. The original sorting problem and its decision counterpart may not seem equivalent in terms of hardness. For instance, sorting
is solved in n log n steps (using standard algorithms), while
deciding whether a vector is sorted can be done in linear time. We shall
see that, from the point of view of Complexity Theory, both problems are
equally hard.
Definition 2.1.1 may seem abstract and unusable, for the following reason:
the set I is hard to characterize. One solution may be to assign types to
problem instances. For example, graph may be a problem instance type.
However, such a choice forces us to reason about problems separately, based
on the type of their problem instances. Also, types themselves form an infinite
set, which is also difficult to characterize.
Another approach is to level out problem instances, starting from the
following key observations: (i) each i ∈ I must be, in some sense, finite.
For instance, vectors have a finite length, hence a finite number of elements.
Graphs (of which we ask our questions) also have a finite set of nodes, hence
a finite set of edges, etc. (ii) I must be countable (but not necessarily finite).
For instance, the problem P : R × R → {0, 1}, where P(x, y) returns 1 iff x and
y are equal, makes no sense from the point of view of computability theory. Assume
we would like to answer P(π, √2). Simply storing π and √2, which takes
infinite space, is impossible on machines, and also takes us back to point (i).
These observations suggest that problem instances can be represented via
a finite encoding, which may be assumed to be uniform over all possible
mathematical objects we consider.
Definition 2.1.2 (Encoding problem instances) Let Σ be a finite set
which we call an alphabet. A one-letter word is a member of Σ. A two-letter
word is any member of Σ × Σ = Σ². For instance, if Σ = {a, b, . . .}, then
(a, a) ∈ Σ² is a two-letter word. An i-letter word is a member of Σⁱ. We
denote by:
Σ* = {ε} ∪ Σ ∪ Σ² ∪ . . . ∪ Σⁱ ∪ . . .
the set of finite words which can be formed over Σ. ε is a special word which
we call the empty word. Instead of writing, e.g., (a, b, b, a, a) for a 5-letter word,
we simply write abbaa. Concatenation of two words is defined as usual.

Remark 2.1.1 We shall henceforth consider that a problem P : I → O
has the following property: if I is infinite, then I ≃ Σ* (I is isomorphic
to Σ*). Thus, each problem instance i can be represented as a finite word
enc(i) ∈ Σ*, for some Σ.
We shall postpone, for now, the question of choosing the appropriate Σ
for our problem (the above remark simply states that such a Σ must exist).
Definition 2.1.2 and Remark 2.1.1 can be easily recognized in practice. A programmer always employs the same predefined mechanisms of his
programming language (the available datatypes) to represent his program
inputs. Moreover, these objects ultimately become streams of bits when
they are actually processed by the machine.
Making one step further, we can observe the following property of alphabets, which conforms with (ii):
Proposition 2.1.1 For any finite Σ, Σ* is countably infinite.
Proof: We show Σ* ≃ N. We build a bijective function h which assigns to
each word a unique natural. We assign 0 to ε. Assume |Σ| = n. We assign
to each one-letter word one of the numbers from 1 to n. Next, we assign to each
word w = w′x of k ≥ 2 letters the number n · h(w′) + h(x). If n = 2, we easily
recognise that each binary word is assigned to its natural counterpart.
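The bijection h can be sketched in Python (our own rendering; for Σ = {a, b}
it enumerates ε, a, b, aa, ab, . . ., i.e. bijective base-n numeration):

    def h(word, sigma):
        # map a word over the alphabet sigma (a list of symbols) to a natural:
        # h('') = 0, one-letter words get 1..n, h(wx) = n * h(w) + h(x)
        n = len(sigma)
        value = 0
        for symbol in word:
            value = n * value + sigma.index(symbol) + 1
        return value

For instance, h('abbaa', ['a', 'b']) returns 43, and distinct words always
receive distinct naturals.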

Hence, we have the following diagram:
i ∈ I ↦ enc(i) ∈ Σ* ↦ h(enc(i)) ∈ N
Hence, we can view a problem instance as a natural number, without
losing the ability to uniquely identify the instance at hand. Thus:
Definition 2.1.3 (Problem) A problem is a function f : N → N. If some
n ∈ N encodes a problem input, then f(n) encodes its answer. A decision problem is a function f : N → {0, 1}.
To conclude: when trying to solve concrete problems, the encoding issue
is fundamental, and it depends on the type of problem instances we
tackle. From the perspective of Computability Theory, which deals with
problems in general, the encoding is unessential, and can be abstracted,
without loss of information, by a natural number.

2.2 Algorithms as Turing Machines

Algorithms are usually described as pseudo-code, and intended as abstractions over concrete programming language operations. The level of abstraction is usually not specified rigorously, and is decided in an ad-hoc manner by
the writer. In the author's experience, pseudo-code is often dependent
on (some future) implementation, and only abstracts from language syntax, possibly including data initialization and subsequent handling. Thus,
some pseudo-code can be easily implemented in different languages, to the
extent to which the languages are the same, or at least follow the same
programming principles.
The above observation is not intended as a criticism of pseudo-code and pseudo-code writing. It is indeed difficult, for instance, to write
pseudo-code which does not seem vague, and which can be naturally implemented in an imperative language (using assignments and iterations) as
well as in a purely functional language (where iterations are possible only
through recursion).
As before, we require a means of leveling out different programming
styles and programming languages, in order to come up with a uniform,
straightforward and simple definition of an algorithm.
The key observation here is that programming languages, especially the
newest and most popular ones, are quite restrictive w.r.t. what the programmer can do. This may seem counter-intuitive at first. Consider typed languages, for instance. Enforcing each variable to have a type is obviously
a restriction, and it has a definite purpose: it helps the programmer write
cleaner code, which is less likely to crash at runtime. However, this
issue is irrelevant from the point of view of Computability Theory. If we try
to search for less restrictive languages, we find the assembly languages, where
the restrictions (as well as the programming structure) are minimal.
The formal definition of an algorithm which we propose can be seen
as an abstract assembly language, where all technical aspects are put aside.
We call such a language the Turing Machine.
Definition 2.2.1 (Deterministic Turing Machine) A Deterministic Turing Machine (abbreviated DTM) is a tuple M = (K, F, Σ, δ, s₀) where:
- Σ = {a, b, c, . . .} is a finite set of symbols which we call the alphabet;
- K is a set of states, and F ⊆ K is a set of accepting/final states;
- δ : K × Σ → K × Σ × {L, H, R} is a transition function which assigns
to each state s ∈ K and symbol c ∈ Σ the triple δ(s, c) = (s′, c′, pos);
- s₀ ∈ K is the initial state.
The Turing Machine has a tape which contains infinitely many cells in both directions; each tape cell holds a symbol from Σ. The Turing Machine
has a tape head, which is able to read the symbol from the current cell. Also,
the Turing Machine is always in a given state. Initially (before the machine
has started), the state is s₀. From a given state s, the Turing Machine reads
the symbol c from the current cell, and performs a transition. The transition
is given by δ(s, c) = (s′, c′, pos). Performing the transition means that the
TM moves from state s to s′, overwrites the symbol c with c′ on the tape cell,
and: (i) if pos = L, moves the tape head to the next cell to the left; (ii) if
pos = R, moves the tape head to the next cell to the right; (iii) if pos = H,
leaves the tape head on the current cell.
The Turing Machine will perform transitions according to δ.
Whenever the TM reaches an accepting/final state, we say it halts. If
the TM reaches a non-accepting state from which no transition is possible,
we say it clings/hangs.
- The input of a Turing Machine is a finite word which is contained on
its otherwise empty tape.
- The output of a TM is the contents of the tape (not including empty
cells) after the Machine has halted. We also write M(w) to refer to
the output of M, given input w.
Example 2.2.1 (Turing Machine) Consider the alphabet Σ = {#, >, 0, 1},
the set of states K = {s₀, s₁, s₂}, the set of final states F = {s₂} and the
transition function:
δ(s₀, 0) = (s₀, 0, R)    δ(s₀, 1) = (s₀, 1, R)
δ(s₀, #) = (s₁, #, L)    δ(s₁, 1) = (s₁, 0, L)
δ(s₁, 0) = (s₂, 1, H)    δ(s₁, >) = (s₂, 1, H)
The Turing Machine M = (K, F, Σ, δ, s₀) reads a number encoded in binary
on the tape, and increments it by 1. The symbol # encodes the empty tape
cell.⁴ Initially, the tape head is positioned at the most significant bit of the
number. The Machine first goes over all bits, from left to right. When the
first empty cell is detected, the machine goes into state s₁, and starts flipping
1s to 0s, until the first 0 (or the initial position, marked by >) is detected.
Finally, the machine places 1 on the current cell, and enters its final state.
⁴ We shall use # to refer to the empty cell throughout the text.

[Figure 2.2.1: The binary increment Turing Machine. A state diagram with
start state s₀, intermediate state s₁ and final state s₂; s₀ loops on c/c, R
for c ∈ {0, 1} and moves to s₁ on #/#, L; s₁ loops on 1/0, L and moves to
s₂ on 0/1, H or >/1, H.]


The behaviour of the transition function can be more intuitively represented as in Figure 2.2.1. Each node represents a state, and each edge, a
transition. The label on each edge is of the form c/c0 , pos where c is the
symbol read from the current tape cell, c0 is the symbol written on the current tape cell and pos is a tape head position. The label should be read as:
the machine replaces c with c0 on the current cell tape and moves in the
direction indicated by pos.
Let us consider that, initially, on the tape we have >0111 the representation of the number 7. The evolution of the tape is shown below. Each
line shows the TM configuration at step i, that is, the tape and current state,
after transition i. For convenience, we have chosen to show two empty cells
in each direction, only. Also, the underline indicates the position of the tape
head.
Transition no    Tape           Current state
0                ##>[0]111##    s₀
1                ##>0[1]11##    s₀
2                ##>01[1]1##    s₀
3                ##>011[1]##    s₀
4                ##>0111[#]#    s₀
5                ##>011[1]##    s₁
6                ##>01[1]0##    s₁
7                ##>0[1]00##    s₁
8                ##>[0]000##    s₁
9                ##>[1]000##    s₂
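The increment machine can also be simulated directly. Below is a small
Python sketch of a DTM interpreter (the dict encoding of δ is our own, for
illustration):

    def run_tm(delta, finals, state, tape, head):
        # tape: dict cell-index -> symbol; unset cells hold the empty symbol '#'
        while state not in finals:
            c = tape.get(head, '#')
            if (state, c) not in delta:
                return None                      # the machine clings/hangs
            state, c2, pos = delta[(state, c)]
            tape[head] = c2                      # overwrite the current cell
            head += {'L': -1, 'R': 1, 'H': 0}[pos]
        return ''.join(tape[i] for i in sorted(tape) if tape[i] != '#')

    # the binary increment machine of Example 2.2.1:
    delta = {('s0', '0'): ('s0', '0', 'R'), ('s0', '1'): ('s0', '1', 'R'),
             ('s0', '#'): ('s1', '#', 'L'), ('s1', '1'): ('s1', '0', 'L'),
             ('s1', '0'): ('s2', '1', 'H'), ('s1', '>'): ('s2', '1', 'H')}
    tape = dict(enumerate('>0111'))              # the number 7
    print(run_tm(delta, {'s2'}, 's0', tape, 1))  # prints >1000, i.e. 8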
In order to better understand the Turing Machine, it is useful to establish some similarities with, e.g., assembly languages. As specified in Definition 2.2.1, a Turing Machine M specifies a clearly defined behaviour, which
is actually captured by δ. Thus, M is quite similar to a specific program,
performing a definite task. If programs (algorithms) are abstracted by Turing Machines, then what is the abstraction for the programming language?
The answer is, again, the Turing Machine. This implies that a Turing Machine, acting as a programming language, can be fed another Turing Machine,
acting as a program, and execute it.
In the following Proposition, we show how Turing Machines can be encoded as words:
Proposition 2.2.1 (TMs as words) Any Turing Machine M = (K, F, Σ,
δ, s₀) can be encoded as a word over Σ. We write enc(M) to refer to this
word.
Proof (sketch): Intuitively, we encode states and positions as integers n ∈ N,
transitions as tuples of integers, etc., and subsequently convert each integer
to its word counterpart in Σ*, cf. Proposition 2.1.1.
Let NonFin = K \ (F ∪ {s₀}) be the set of non-final states, excluding the initial one. We encode each state in NonFin as an integer in
{1, 2, . . . , |NonFin|} and each final state as an integer in
{|NonFin|+1, . . . , |NonFin|+|F|}. We encode the initial state s₀ as |NonFin|+
|F|+1, and L, H, R as |NonFin|+|F|+i with i ∈ {2, 3, 4}. Each integer
from the above is represented as a word using ⌈log_{|Σ|}(|NonFin|+|F|+4)⌉
symbols.
Each transition δ(s, c) = (s′, c′, pos) is encoded as:
enc(s)#c#enc(s′)#c′#enc(pos)
where enc(·) is the encoding described above. The entire δ is encoded as a
sequence of encoded transitions, separated by #. The encoding of M is:
enc(M) = enc(|NonFin|)#enc(|F|)#enc(δ)

Thus, enc(M) is a word, which can be fed to another Turing Machine.
The latter should have the ability to execute (or to simulate) M. This is
indeed possible:
Proposition 2.2.2 (The Universal Turing Machine) There exists a TM
U which, for any TM M and every word w ∈ Σ*, takes enc(M) and w as
input, and outputs 1 whenever M(w) = 1 and 0 whenever M(w) = 0. We
call U the Universal Turing Machine, and say that U simulates M.
Proof: Let M be a TM and w = c₁c₂ . . . cₙ be a word built from
the alphabet of M. We build the Universal Turing Machine U as follows:
- The input of U is enc(M)#enc(s₀)#c₁#c₂ . . . cₙ. Note that enc(s₀)
encodes the initial state of M, while c₁ is the first symbol of w.
- The portion of the tape enc(s₀)#c₁#c₂ . . . cₙ will be used to mark the
current configuration of M, namely the current state of M (initially
s₀), the contents of M's tape, and M's current head position. More
generally, this portion of the tape is of the form enc(sᵢ)#u#v, with
u, v ∈ Σ* and sᵢ being the current state of M. The last symbol of u
marks the current symbol, while v is the word which is to the right of
the head. Initially, the current symbol is the first one, namely c₁.
- U will scan the initial state of M, then it will move to the initial
symbol of w, and finally it will move to the portion of enc(M) where
the transitions are encoded. Once a valid transition is found, U will execute
it:
1. U will change the recorded current state, according to the transition;
2. U will change the current symbol, according to the transition;
3. U will move the current-symbol marker, according to pos from the
transition.
- U will repeat this process until an accepting state of M is detected,
or until no transition can be performed.

Propositions 2.2.2 and 2.2.1 show that TMs have the capability to characterize both algorithms, as well as the computational framework to execute
them. One question remains: what can TMs actually compute? Can they
be used to sort vectors, solve SAT, etc.? The answer, which is positive is
given by the following hypothesis:
Conjecture 2.2.1 (Church-Turing) Any problem which can be solved with
the Turing Machine is universally solvable.
The term universally solvable cannot be given a precise mathematical definition. We only know solvability w.r.t. concrete means, e.g. computers and
programming languages, etc. It can be (an has been) shown that the Turing Machine can solve any problem which known programming languages
can solve.5 The Turing Machine, in itself, describes a model of computation
5

To be fair to the TM, one would formulate this statement as: all programming
languages are Turing-complete, i.e. they can solve everything the TM can solve

14

based on side-effects: each transition may modify the tape in some way.
Computation can be described differently, for instance: as function application, or as term rewriting. However, all other known computational models
are equivalent to the Turing Machine, in the sense that they solve precisely
the same problems.
This observation prompted the aforementioned conjecture. It is strongly
believed to hold (as evidence suggests), but it cannot be formally proved.

2.3 Decidability

The existence of the Universal Turing Machine U inevitably leads to interesting questions. Assume M is a Turing Machine and w is a word. We use
the following convention: we write enc(M) ∗ w to represent the input of
U. Thus, U expects the encoding of a TM, followed by the special symbol ∗,
and then M's input w.
(?) Does U halt for all inputs?
If the answer were positive, then U could be used to tell whether any machine halts, for a given input.
We already have some reasons to believe we cannot answer (?) positively,
if we examine the proof of Proposition 2.2.2. Actually, (?) is a decision
problem, one that is quite interesting and useful.
As before, we try to lift our setting to a more general one: can any
problem be solved by some Turing Machine? The following propositions
indicate that this is not likely the case:
Proposition 2.3.1 The set TM of Turing Machines is countably infinite.
Proof: The proof follows immediately from Proposition 2.2.1. Any Turing
Machine can be uniquely encoded by a word, hence the set of Turing Machines is isomorphic to a subset of Σ*, which in turn is countably infinite,
since Σ* is countably infinite for any Σ.

Proposition 2.3.2 The set Hom(N, N) of functions f : N → N is uncountably infinite.
Proof: It is sufficient to show that Hom(N, {0, 1}) is uncountably infinite.
We build a proof by contradiction. We assume Hom(N, {0, 1}) is countably
infinite. Hence, each natural number n ∈ N corresponds to a function fₙ ∈
Hom(N, {0, 1}). We build a matrix as follows: columns describe the functions
fₙ, n ∈ N; rows describe the inputs k ∈ N. Each matrix element m_{i,j} is the
value of f_j(i) (hence, the expected output for input i, from function f_j).

        f₀    f₁    f₂    . . .  fₙ    . . .
0       1     1     0     . . .  0     . . .
1       0     1     1     . . .  0     . . .
2       1     0     1     . . .  1     . . .
. . .   . . . . . . . . . . . .  . . . . . .
n       1     1     0     . . .  1     . . .
. . .   . . . . . . . . . . . .  . . . . . .

Figure 2.3.2: An example of the matrix from the proof of Proposition 2.3.2.
The values m_{i,j} have been filled out purely for illustration.
In Figure 2.3.2, we have illustrated our matrix. We now devise a
problem f as follows:
f(x) = 1 iff f_x(x) = 0, and f(x) = 0 iff f_x(x) = 1
Since f ∈ Hom(N, {0, 1}), it must also have a number assigned to it: f =
f_a for some a ∈ N. Then f(a) = 1 iff f_a(a) = 0. But f(a) = f_a(a).
Contradiction. On the other hand, f(a) = 0 iff f_a(a) = 1. As before, we
obtain a contradiction.

Propositions 2.3.1 and 2.3.2 tell us that there are infinitely more functions (decision problems) than means of computing them (Turing Machines).
Our next step is to look at solvable and unsolvable problems, and devise
a method for separating the former from the latter. In other words, we are
looking for a tool which allows us to identify those problems which are
solvable, and those which are not.
We start by observing that Turing Machines may never halt. We write
M(w) = ⊥ to designate that M loops infinitely for input w. Also, we
write n_w ∈ N to refer to the number which corresponds to w, according to
Proposition 2.1.1. Next, we refine the notion of problem solving:
Definition 2.3.1 (Decision, acceptance) Let M be a Turing Machine
and f ∈ Hom(N, {0, 1}). We say that:
- M decides f iff, for all w ∈ Σ*: M(w) = 1 whenever f(n_w) = 1, and
M(w) = 0 whenever f(n_w) = 0;
- M accepts f iff, for all w ∈ Σ*: M(w) = 1 iff f(n_w) = 1, and M(w) = ⊥
iff f(n_w) = 0.

Note that, in contrast with acceptance, decision is, intuitively, a stronger
means of computing a function (i.e. solving a problem). Under decision,
the TM at hand can provide both a yes and a no answer to any problem
instance, while under acceptance, the TM can only provide an answer of yes:
if the answer to the problem instance at hand is no, the TM will not halt.
Based on the two types of problem solving, we can classify problems
(functions) as follows:
Definition 2.3.2 Let f ∈ Hom(N, {0, 1}) be a decision problem.
- f is recursive (decidable) iff there exists a TM M which decides f.
The set of recursive functions is
R = {f ∈ Hom(N, {0, 1}) | f is recursive}
- f is recursively enumerable (semi-decidable) iff there exists a TM
M which accepts f. The set of recursively enumerable functions is
RE = {f ∈ Hom(N, {0, 1}) | f is recursively enumerable}
Now, let us turn our attention to question (?), which we shall formulate
as a problem:
f_h(n_{enc(M) ∗ w}) = 1 iff M(w) halts, and f_h(n_{enc(M) ∗ w}) = 0 iff M(w) = ⊥
Hence, the input of f_h is a natural number which encodes a Turing Machine M and an input word w. The first question we ask is whether f_h ∈ R.
Proposition 2.3.3 f_h ∉ R.
Proof: Assume f_h ∈ R and denote by M_h the Turing Machine which decides
f_h. We build the Turing Machine D as follows:
D(enc(M)) = ⊥ iff M_h(enc(M) ∗ enc(M)) = 1, and D(enc(M)) = 1 iff M_h(enc(M) ∗ enc(M)) = 0
The existence of the Universal Turing Machine guarantees that D can
indeed be built, since D simulates M_h. We note that M_h(enc(M) ∗ enc(M))
decides whether the TM M halts with itself as input (namely enc(M)).
Assume D(enc(D)) = 1. Hence M_h(enc(D) ∗ enc(D)) = 0, that is, machine D does not halt for input enc(D). Hence D(enc(D)) = ⊥. Contradiction.
Assume D(enc(D)) = ⊥. Hence M_h(enc(D) ∗ enc(D)) = 1, and thus
D(enc(D)) halts. Contradiction.

We note that the construction of D mimics the technique which we
applied in the proof of Proposition 2.3.2, which is called diagonalization.
Exercise 2.3.1 Apply the diagonalization technique from the proof of Proposition 2.3.2 in order to prove Proposition 2.3.3.
Proposition 2.3.4 f_h ∈ RE.
Proof: We build a Turing Machine M_h which accepts f_h. Essentially, M_h is
the Universal Turing Machine: M_h(enc(M) ∗ w) simulates M(w) and, if M(w)
halts, outputs 1. If M(w) does not halt, M_h(enc(M) ∗ w) = ⊥.
Propositions 2.3.3 and 2.3.4 produce a classification for f_h. The question
which we shall answer next is how to classify any problem f, by establishing
membership in R and RE, respectively. We start with a simple proposition:
Proposition 2.3.5 R ⊊ RE.
Proof: R ⊆ RE is straightforward from Definition 2.3.2. Let f ∈ R, and M_f
be the TM which decides f. We build the TM M′ such that M′(w) = 1
iff M_f(w) = 1, and M′(w) = ⊥ iff M_f(w) = 0. M′ simulates M_f, but enters
an infinite loop whenever M_f(w) = 0. M′ accepts f, hence f ∈ RE.
R ≠ RE has already been shown by Propositions 2.3.3 and 2.3.4: f_h ∈
RE but f_h ∉ R.

Thus, R and RE should be interpreted as a scale for solvability: membership in R is complete solvability, membership in RE is partial solvability,
while non-membership in RE is complete unsolvability.
Remark 2.3.1 We note that R and RE are not the only sets of functions
which are used in Computability Theory. It has been shown that there are
degrees of unsolvability, of higher level than R and RE. These degrees
are intuitively obtained as follows. We assume we live in a world where f_h
is decidable (recursive). Now, as before, we ask which problems are recursive
and which are recursively enumerable. It turns out that, even in this ideal
case, there still exist recursive and recursively-enumerable problems, as well
as some which are neither. This could be imagined as undecidability level
1. Now, we take some problem which is in RE at level 1, and repeat the
same assumption: that it is decidable. Again, under this assumption, we find
problems in R, in RE, and outside the two, which make up undecidability
level 2. This process can be repeated ad infinitum.

Returning to our simpler classification, we must observe an interesting
feature of recursively-enumerable functions, which is also the reason they
are called this way.
Proposition 2.3.6 A function f ∈ Hom(N, {0, 1}) is recursively enumerable iff there exists a Turing Machine which can enumerate/generate all
elements of A_f = {w ∈ Σ* | f(n_w) = 1}. Intuitively, A_f is the set of inputs
of f for which the answer at hand is yes.
Proof: (⇒) Suppose f is recursively enumerable and M accepts f. We
write wᵢ to refer to the i-th word of Σ*. We specify the TM generating
A_f by the following pseudocode:
Algorithm 1: GEN()
static A_f = ∅;
k = 0;
while True do
    for 0 ≤ i ≤ k do
        run M(wᵢ) for at most k steps;
        if M(wᵢ) halts within k steps and wᵢ ∉ A_f then
            A_f = A_f ∪ {wᵢ};
            return wᵢ
        end
    end
    k = k + 1;
end
The value of k from the for loop has a two-fold usage. First, it is used to
explore all inputs wᵢ, 0 ≤ i ≤ k. Second, it is used as a time limit for M:
for each wᵢ, we run M(wᵢ) for precisely k steps. If M(wᵢ) = 1 in at most
k steps, then wᵢ is added to A_f, and then returned (written on the tape).
Also, wᵢ is stored for future executions of GEN. If M(wᵢ) = 1 for some wᵢ,
then there must exist a k ≥ i such that M(wᵢ) halts within k steps. Thus,
such a k will eventually be reached.
(⇐) Assume we have the Turing Machine GEN which generates A_f.
We construct a Turing Machine M which accepts f. M works as follows:
Algorithm 2: M(w)
A_f = ∅;
while w ∉ A_f do
    w′ = GEN();
    A_f = A_f ∪ {w′}
end
return 1
M simply uses GEN to generate elements of A_f. If w ∈ A_f, it will
eventually be generated, and M will output 1. Otherwise, M will loop.
Thus M accepts f.

Proposition 2.3.6 is useful since, in many cases, it is easier to find a
generator for f than a Turing Machine which accepts f.
Finally, in what follows, we shall take a few decision problems and apply
a reduction technique, in order to prove that they are not decidable.
Halting on all inputs. Let:
f_all(n_{enc(M)}) = 1 iff M halts for all inputs, and 0 otherwise

The technique we use to show f_all ∉ R is called a reduction (from f_h). It
proceeds as follows. We assume f_all ∈ R. Starting from the TM M_all which
decides f_all, we build a TM which decides f_h. Thus, if f_all is decidable, then
f_h is decidable, which leads to a contradiction.
First, for each fixed TM M and fixed input w, we build the TM:
Π_{M,w}(ω) = replace ω by w, and then simulate M(w)
It is easy to see that (∀ω : Π_{M,w}(ω) halts) iff M(w) halts. Now, we build the TM M_h
which decides f_h. The input of M_h is enc(M) ∗ w. We construct Π_{M,w} and
run M_all(enc(Π_{M,w})). By assumption, M_all must always halt. If the output
is 1, then Π_{M,w}(ω) halts for all inputs, hence M(w) halts; we output 1. If
the output is 0, then Π_{M,w}(ω) does not halt for all inputs, hence M(w) does
not halt; we output 0.
We have built a reduction from f_h to f_all: using the TM which decides f_all, we have constructed a machine which decides f_h. Since f_h is not
recursive, we obtain a contradiction.
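Representing machines as Python functions purely for illustration, the
transformation at the heart of this reduction is itself an ordinary,
always-terminating program:

    def transform(M, w):
        # build Pi_{M,w}: a machine which ignores its own input and runs M on
        # the fixed word w; note that building Pi_{M,w} runs nothing -- only
        # executing the returned machine may loop
        def pi(_input):
            return M(w)
        return pi

This is what makes the reduction legitimate: the transformation from
enc(M) ∗ w to enc(Π_{M,w}) is decidable, even though the behaviour being
described is not.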
Definition 2.3.3 (Turing-reducibility) Let f_A, f_B ∈ Hom(N, {0, 1}). We
say f_A is Turing-reducible to f_B, and write f_A ≤_T f_B, iff there exists a decidable transformation T ∈ Hom(N, N) such that f_A(n) = 1 iff f_B(T(n)) = 1.

Remark 2.3.2 (Reducibility) We note that the transformation T must
be decidable. When proving f_h ≤_T f_all, we have taken n_{enc(M) ∗ w}, an
instance of f_h, and shown that it can be transformed into n_{enc(Π_{M,w})},
an instance of f_all, such that f_h(n_{enc(M) ∗ w}) = 1 iff f_all(n_{enc(Π_{M,w})}) = 1.
A Turing Machine can easily perform the transformation of enc(M) ∗ w into
enc(Π_{M,w}), since it involves adding some states and transitions which precede
the start state of M; hence T is decidable.
Halting on 111. Let:
f_111(n_{enc(M)}) = 1 iff M(111) halts, and 0 otherwise
We reduce f_h to f_111. Assume M_111 decides f_111. Given a Turing Machine M and a word w, we construct the machine:
Π_{M,w}(ω) = if ω = 111 then simulate M(w), else loop
We observe that: (i) the transformation from enc(M) ∗ w to enc(Π_{M,w}) is
decidable, since it involves adding precisely three states to M: these states
check the input ω and, if it is 111, replace it with w and run M; (ii)
M_111(enc(Π_{M,w})) = 1 iff M(w) halts. The reduction is complete: f_111 ∉ R.
Halting on some input. We define:
f_any(n_{enc(M)}) = 1 iff M(w) halts for some w, and 0 otherwise
We reduce f_111 to f_any. We assume f_any is decided by M_any. We construct:
Π_M(ω) = replace ω by 111, then simulate M(111)
Now, M_any(enc(Π_M)) = 1 iff M(111) halts, hence we can use M_any to build
a machine which decides f_111. Contradiction. f_any ∉ R.
Machine halt equivalence. We define:
f_eq(n_{enc(M₁) ∗ enc(M₂)}) = 1 iff, for all w, M₁(w) halts iff M₂(w) halts, and 0 otherwise
We reduce f_all to f_eq. Let M_triv be a one-state Turing Machine which
halts on every input, and M_eq be the Turing Machine which decides f_eq.
Then M_eq(enc(M) ∗ enc(M_triv)) = 1 iff M halts on all inputs. We have shown
that we can use M_eq in order to build a machine which decides f_all. Contradiction.
f_eq ∉ R.
So far, we have used reductions in order to establish problem non-membership in R. There are other properties of R and RE which can be of
use for this task. First, we define:
Definition 2.3.4 (Complement of a problem) Let f ∈ Hom(N, {0, 1}).
We denote by f̄ the problem:
f̄(n) = 1 iff f(n) = 0, and f̄(n) = 0 iff f(n) = 1
We call f̄ the complement of f.
For instance, the complement of f_h is the problem which asks whether a Turing
Machine M does not halt for input w. We also note that the complement of f̄ is f itself.
Next, we define the class:
coRE = {f ∈ Hom(N, {0, 1}) | f̄ ∈ RE}
coRE contains the set of all problems whose complement is in RE.
We establish that:
Proposition 2.3.7 RE ∩ coRE = R.
Proof: Assume f ∈ RE ∩ coRE. Hence, there exists a Turing Machine M
which accepts f, and a Turing Machine M̄ which accepts f̄. We build the
Turing Machine:
M*(w) = for i ∈ N:
    run M(w) for i steps; if M(w) = 1 within i steps, return 1; otherwise:
    run M̄(w) for i steps; if M̄(w) = 1 within i steps, return 0.
Since M and M̄ will always halt when the expected result is 1, they can be
used together to decide f. Hence f ∈ R. The converse inclusion, R ⊆ RE ∩ coRE,
follows from Proposition 2.3.5 together with Proposition 2.3.8 below.
Proposition 2.3.8 f ∈ R iff f̄ ∈ R.
Proof: The proposition follows immediately, since the Turing Machine which
decides f can be used to decide f̄, by simply switching its output from 0 to
1 and from 1 to 0. The same holds for the other direction.

We conclude this chapter with a very powerful result, which states that
an entire category/type of problems does not belong to R.

Theorem 2.3.1 (Rice) Let C ⊆ RE be non-empty, such that the trivial
problem f₀(n) = 0 is not in C. Given a Turing Machine M, we ask: is the
problem accepted by M in C? Answering this question is not in R.
Proof: We rely on the facts that the trivial problem f₀(n) = 0 is not in C and that C is
non-empty: suppose f ∈ C, and, since f is recursively enumerable, let M_f
be the Turing Machine which accepts f.
We apply a reduction from a variant of f_111, namely f_x: f_x asks whether a
Turing Machine halts for input x. We assume that membership in C (of the
problem accepted by a given machine) can be decided by some Turing
Machine. Based on the latter, we construct a Turing
Machine which decides f_x (i.e. solves the halting problem for a particular
input). Let M_x be the Turing Machine which accepts f_x.
Let:
Π_w(ω) = if M_x(w) halts then M_f(ω), else loop
If f_w is the problem accepted by Π_w, we show that:
f_w ∈ C iff M_x(w) halts
(⇒) Suppose f_w ∈ C. Then Π_w(ω) cannot loop for every input ω. If
it were so, then f_w would be the trivial function, always returning 0 for
any input, which we have assumed is not in C. Thus, M_x(w) halts.
(⇐) Suppose M_x(w) halts. Then the behaviour of Π_w(ω) is precisely that of
M_f(ω): Π_w(ω) will return 1 whenever M_f(ω) returns 1, and Π_w(ω) = ⊥
whenever M_f(ω) = ⊥. Since f ∈ C, then also f_w ∈ C.

In Theorem 2.3.1, the set C should be interpreted as a property of
problems, and subsequently of the Turing Machines which accept them.
Checking whether some Turing Machine M satisfies the given property is undecidable. Consider the property informally described as: the set of Turing
Machines (computer programs) that behave as viruses. Deciding whether a Turing Machine behaves as a virus (i.e. belongs to the former
set) is impossible, via Rice's Theorem.

3 Complexity Theory

3.1 Measuring time and space

In Computability Theory, we have classified problems (e.g. in the classes R and
RE) based on Turing Machines' ability to decide/accept them.
In order to classify problems based on hardness, we need to account for
the number of steps (time) and tape cells (space) employed by a Turing
Machine.

The amount of resources (time/space) spent by a Turing Machine M
may be expressed as functions:
T_M, S_M : Σ* → N
where T_M(w) (resp. S_M(w)) is the number of steps performed (resp. tape
cells used) by M when running on input w.
This definition suffers from unnecessary overhead, which makes time
and space analysis difficult. We formulate some examples to illustrate why
this is the case:
Alg(n)
  while n < 100
    n = n + 1
  return 1
We note that Alg runs 100 steps for n = 0, but only one step for
n ≥ 100. However, in practice, it is often considered that each input is as
likely to occur as any other.⁶ For this reason, we shall adopt the following
convention: (⋆) we always consider the most expensive/unfavourable case,
given inputs of a certain type. In our previous example, we consider the
running time of Alg as being 100, since this is the most expensive case.
⁶ This is often not the case. There are numerous algorithms which rely on some probability that the input is of some particular type. For instance, efficient SAT solvers rely
on a particular ordering of variables, when interpretations are generated and verified. On
certain orderings and certain formulae, the algorithm runs in polynomial time. The key
to the efficiency of SAT solvers is that programmers estimate an efficient ordering, based
on some expectancy regarding the input formula. The algorithm may be exponentially costly
for some formulae, but runs in close-to-polynomial time for most inputs.
Consider the following example:
Sum(v, n)
  s = 0, i = 0
  while i < n
    s = s + v[i]
    i = i + 1
  return s
Unlike Alg, Sum does not have a universal upper limit on its running
time. The number of steps Sum executes depends on the number of elements of v, namely n, and is equal to 2n + 3, if we consider each variable
initialisation and the return statement as computing steps. Thus, we observe that: (⋆⋆) the running time (resp. consumed space) of an algorithm
will grow as the size of the input grows.
We can now merge (⋆) and (⋆⋆) into a definition:
Definition 3.1.1 (Running time of a TM) The running time of a Turing Machine M is given by T_M : N → N iff, for all w ∈ Σ*, the number of
transitions performed by M on input w is at most T_M(|w|).
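As a quick cross-check of the 2n + 3 count above, here is an instrumented
Python version of Sum (our own instrumentation, for illustration):

    def sum_steps(v):
        # count 'computing steps' as in the text: the two initialisations,
        # the two assignments per iteration, and the return statement
        steps = 2                      # s = 0, i = 0
        s, i = 0, 0
        while i < len(v):
            s = s + v[i]
            i = i + 1
            steps = steps + 2
        return s, steps + 1            # +1 for the return

    print(sum_steps([3, 1, 2]))        # prints (6, 9), and 2n + 3 = 9 for n = 3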
Remark 3.1.1 (Consumed space for a TM) A naive definition for the consumed space of a Turing Machine would state that S_M(|w|) is the number
of tape cells which M employs. This definition is imprecise. Consider a
Turing Machine which receives a binary word as input and computes whether
the word is a power of 2. Aside from reading its input, the machine consumes no space. Thus, we might refine our definition: S_M(|w|) is the
number of tape writes which M performs. This definition is also imprecise.
Consider the binary counter Turing Machine from the previous chapter. It
performs a number of writes proportional to the number of consecutive 1s
found at the end of the string. However, the counter does not use additional
space: it only processes the input in place.
Thus, the consumed space S_M(|w|) is the number of written cells, excluding
those of the input. Consider a Turing Machine which receives n numbers
encoded as binary words, each having at most 4 bits, and which computes
the sum of the numbers, modulo 2⁴. Apart from reading the 4n bits and the n−1
word separators, the machine employs another 4 cells to hold a temporary
sum. Thus, the consumed space for this machine is 4.⁷
A formal definition of the consumed space of a TM is outside the scope of
this course, since it involves multi-tape Turing Machines. The basic idea is
to separate the input from the rest of the space used for computation.
Thus, when assessing the consumed space of an algorithm, we shall never
account for the space consumed by the input.
⁷ We can also build another machine which simply uses the first number to hold the
temporary sum, and thus uses no additional space.
Recall that, as in Computability Theory, our primary agenda is to produce a classification of problems. To this end, it makes sense to first introduce a classification of Turing Machine running times.

3.1.1 Asymptotic notations

Remark 3.1.2 (Running times vs. arbitrary functions) In the previous section, we have defined the running times of a Turing Machine as functions
T : N → N, and we have seen that they are often monotonically increasing
(n ≤ m ⟹ T(n) ≤ T(m)). While monotonicity is common among the
running times of conventional algorithms, it is not hard to find examples
(more or less realistic) where it does not hold. For instance, an algorithm
may simply return, if its input exceeds a given size. Thus, we shall not, in
general, assume that running times are monotonic.
Furthermore, we shall extend our classification to arbitrary functions
f : R → R, since there is no technical reason to consider only functions over
naturals. In support of this, we also add that asymptotic notations are
useful in other fields outside complexity theory, where the assumption that
functions are defined over natural numbers only is not justified.
Definition 3.1.2 (Θ (theta) notation) Let g : R → R. Then Θ(g(n)) is
the class of functions:
Θ(g(n)) = {f : R → R | ∃ c₁, c₂ ∈ R⁺, ∃ n₀ ∈ N such that ∀ n ≥ n₀: c₁ · g(n) ≤ f(n) ≤ c₂ · g(n)}
Thus, Θ(f(n)) is the class of all functions with the same asymptotic
growth as f(n). We can easily observe that, for all continuous f, g ∈
Hom(R, R) such that g ∈ Θ(f(n)), we have lim_{n→∞} f(n)/g(n) = c, where c ≠ 0
(provided the limit exists).
There is an infinite number of classes Θ(f(n)), one for each function f.
However, if g(n) ∈ Θ(f(n)), then Θ(g(n)) = Θ(f(n)).
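For a concrete instance, take f(n) = 3n² + 10n. Choosing c₁ = 3, c₂ = 4 and
n₀ = 10, we have 3n² ≤ 3n² + 10n ≤ 4n² for all n ≥ n₀ (since 10n ≤ n² once
n ≥ 10), hence 3n² + 10n ∈ Θ(n²).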
It makes sense to consider classes which describe functions with inferior/superior asymptotic growth:
Definition 3.1.3 (O, Ω notations) Let g : R → R. Then:
O(g(n)) = {f : R → R | ∃ c ∈ R⁺, ∃ n₀ ∈ N such that ∀ n ≥ n₀: 0 ≤ f(n) ≤ c · g(n)}
Ω(g(n)) = {f : R → R | ∃ c ∈ R⁺, ∃ n₀ ∈ N such that ∀ n ≥ n₀: 0 ≤ c · g(n) ≤ f(n)}

Note that g ∈ O(f(n)) ⟹ O(g(n)) ⊆ O(f(n)), while g ∈ Ω(f(n)) ⟹
Ω(g(n)) ⊆ Ω(f(n)). Finally, Ω(f(n)) ∩ O(f(n)) = Θ(f(n)). Each of the
above propositions can be easily proved using the respective definitions of
the notations.

O and Ω offer relaxed bounds for asymptotic function growth. Thus,
g ∈ O(f(n)) should be read as: the function g grows asymptotically at most
as much as f. It makes sense to also consider strict bounds:
Definition 3.1.4 (o, ω notations)
o(g(n)) = {f : R → R | ∀ c ∈ R⁺, ∃ n₀ ∈ N such that ∀ n ≥ n₀: 0 ≤ f(n) < c · g(n)}
ω(g(n)) = {f : R → R | ∀ c ∈ R⁺, ∃ n₀ ∈ N such that ∀ n ≥ n₀: 0 ≤ c · g(n) < f(n)}

Thus, g ∈ o(f(n)) should be read: g grows asymptotically strictly slower
than f. We have o(f(n)) ⊆ O(f(n)) \ Θ(f(n)), and
ω(f(n)) ⊆ Ω(f(n)) \ Θ(f(n)).
Exercise 3.1.1
- If f(n) ∈ Θ(n²) and g(n) ∈ O(n³), then f(n)/g(n) ∈ . . .
- If f(n) ∈ o(n²) and g(n) ∈ Θ(n³), then f(n) · g(n) ∈ . . .
- If f(n) ∈ Θ(n³) and g(n) ∈ o(n²), then f(n)/g(n) ∈ . . .
Exercise 3.1.2 Prove or disprove the following implications:
- f(n) = O(log n) ⟹ 2^{f(n)} = O(n)
- f(n) = O(n²) and g(n) = O(n) ⟹ f(g(n)) = O(n³)
- f(n) = O(n) and g(n) = 1 + √(f(n)) ⟹ g(n) = O(log n)

Syntactic sugars
This section follows closely Lecture 2 from [1]. Quite often, asymptotic
notations are used to refer to arbitrary functions having certain properties
related to their order of growth. For instance, in:
⌈f(x)⌉ = f(x) + O(1)
applying rounding to f(x) may be expressed as the original f(x) to which
we add a function bounded by a constant. Similarly:
1/(1 − x) = 1 + x + x² + x³ + O(x⁴), for −1 < x < 1

The above notation allows us to formally disregard terms of the
expansion, by replacing them with an asymptotic notation which characterises their order of growth. One should make a distinction between the
usage of asymptotic notations in arithmetic expressions, such as the ones
previously illustrated, and in equations. Consider the following example:
f(x) = O(1/x)
which should be read: there exists a function h ∈ O(1/x) such that f(x) =
h(x). Similarly:
f(x) = O(log x) + O(1/x)
should be read: there exist functions h ∈ O(1/x) and w ∈ O(log x) such that
f(x) = w(x) + h(x). In equations such as:
O(x) = O(log x) + O(1/x)
the equality is not symmetric, and should be read from left to right: for
any function f ∈ O(x), there exist functions h ∈ O(1/x) and w ∈ O(log x)
such that f(x) = w(x) + h(x). In order to avoid mistakes, the following
algorithmic rule should be applied when reading an equation of the form
left = right:
- each occurrence of an asymptotic notation in left should be replaced by
a universally quantified function belonging to the corresponding class;
- each occurrence of an asymptotic notation in right should be replaced
by an existentially quantified function from the corresponding class.
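As a worked instance of this rule: n = O(n²) is true, since there exists
h ∈ O(n²) (namely h(n) = n) with n = h(n); read in the opposite direction,
O(n²) = n is false, since it would require every f ∈ O(n²), e.g. f(n) = n²,
to equal n.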
Consumed space in the TM vs. consumed space of algorithms. Computability: what; Complexity: how much.

4.1 Measuring time and space
Course: asymptotic notations; basic assumptions from computability
theory (constants don't matter, degrees of polynomials don't matter).
Lab: recurrences (shifted to labs).

4.2 Properties of Turing Machines
The universal TM, and the irrelevance of the alphabet choice.

4.3 Complexity classes

4.4 SAT. The first NP-complete problem

4.5 Hard and complete problems. Reductions

4.6 Many examples of the former kind

5 Correctness

5.1 ADTs and structural induction

5.2 Loop invariants

5.3 Model checking

References
[1] A.J. Hildebrand. Asymptotic methods in analysis. Math 595AMA, 2009.
http://www.math.uiuc.edu/hildebr/595ama/
