Matei Popovici
October 24, 2014
1
Introduction
1.1
1.1.1
On the 11th of April 1970, the Apollo 13 mission took off, with the intention of landing on the Moon. Two days into the mission, an explosion in the Service Module forced the crew to abort the mission and attempt a return to Earth. Facing extreme hardship, especially due to loss of power, the crew finally succeeded in returning to Earth. Both the crew and the support team at the Space Center were forced to find fast and simple solutions to apparently insurmountable problems.1
Drawing inspiration from this scenario, we consider the following example.
The space mission S takes off with the Moon as destination. Some time after take-off, S notices a malfunction in the main engine, which fails to respond to command inputs. While trying to hack-control the engine, the following circuit φ is identified:
(A ∨ B ∨ D) ∧ (B ∨ C ∨ D) ∧ (A ∨ C ∨ D) ∧ (A ∨ B ∨ D) ∧
(B ∨ C ∨ D) ∧ (A ∨ C ∨ D) ∧ (A ∨ B ∨ C) ∧ (A ∨ B ∨ C)
If φ can be made to output 1, then the engine could be manually started. S however notices that there is no apparent input for A, B, C and D for which the circuit yields 1. S requires advice from Control as to how to proceed in this situation.
1 One example is the famous improvisation which made the Command Module's square CO2 filters operable in the Lunar Module, which required such filters of round shape.
This is highly similar to the real situation of Apollo 13: their position and malfunctions made a direct abort and return to Earth impossible. Instead, their return trajectory involved entering Moon orbit.
FIND(φ):
a) for each assignment I of the variables of φ:
b)   if CHECK(φ, I) = 1 then return 1
c) return 0
We now have a procedure for solving S's problem, one which Control may use in order to assist the Mission. However, several issues arise:

(V) is FIND(φ) correct, that is, bug-free and returning an appropriate result?

(M) how long does FIND(φ) take to run, with respect to the size of the input (i.e. the size of the formula φ)?

(C) is FIND(φ) an optimal algorithm? Are there ones which perform better?
Next, we discuss each issue in more detail. (V) denotes the Verification problem. Verification of computer programs (that is, of algorithm implementations) is essential nowadays, as we rely more and more on software to perform critical tasks: assisting us in driving cars and flying planes, guiding missiles and performing surgeries. Verification is equally important for the mission.
(M) identifies a measurement problem: namely, in what way critical resources are consumed by an algorithm. Traditionally, these resources are time, i.e. the number of CPU operations, and space, i.e. computer memory. We note that, for our given formula, CHECK(φ, I) will run at most 8 · 4 = 32 CPU operations, where 8 is the number of clauses, and 4 is the number of variables. FIND(φ) will build at most 2^n assignments I, where n is the number of variables. In our example, n = 4, thus we have 16 different ways of assigning A, B, C and D to values 0, 1. For each, we run CHECK(φ, I), yielding a total of 16 · 32 = 512 CPU operations.
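The two procedures can be written out concretely. Below is a minimal Python sketch of CHECK and FIND; note that the exact literals of the mission's formula (in particular, which ones are negated) are not legible in the source, so the formula here is a stand-in with the same shape: 8 clauses over the variables A, B, C, D.

```python
from itertools import product

# A clause is a list of (variable, polarity) pairs; polarity False means negated.
# FORMULA is an illustrative stand-in for the mission's circuit phi.
FORMULA = [
    [("A", True), ("B", True), ("D", True)],
    [("B", True), ("C", True), ("D", False)],
    [("A", False), ("C", True), ("D", True)],
    [("A", True), ("B", False), ("D", False)],
    [("B", False), ("C", True), ("D", True)],
    [("A", True), ("C", False), ("D", False)],
    [("A", False), ("B", True), ("C", True)],
    [("A", False), ("B", False), ("C", False)],
]

def check(formula, assignment):
    """CHECK(phi, I): one pass over every literal of every clause."""
    return all(
        any(assignment[var] == polarity for var, polarity in clause)
        for clause in formula
    )

def find(formula, variables):
    """FIND(phi): try all 2^n assignments; return 1 if one satisfies phi."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if check(formula, assignment):
            return 1
    return 0
```

With n = 4 variables, `find` performs at most 2^4 = 16 calls to `check`, matching the 16 · 32 = 512 operation count above.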
For the sake of our example, let us assume the Mission's computers run one CPU instruction per second, which is not unlikely, given the hardware available in the '70s. Thus, in 5 minutes, we can perform 5 · 60 = 300 CPU operations! Note that FIND(φ) does not finish in sufficient time: if Control used this algorithm, it would waste the Mission's time.
Finally, (C) designates a complexity problem, one specific to Complexity Theory: do efficient algorithms exist for a given problem? There are also sub-questions which spawn from the former:

- is there a better encoding of the input and of the variables from I, which makes the algorithm run faster?
- is there a certain machine which allows performing computations in a more efficient (faster) way?
Disclaimer
1.2
Which is harder?
2
Computability Theory
2.1
It is often the case that the answers we seek are also mathematical objects. For instance, the vector sorting problem must be answered by a sorted vector. However, many other problems prompt yes/no answers. Whenever O = {0, 1}, we say that P is a decision problem. Many other problems can be cast into decision problems. The vector sorting problem may be seen as a decision problem, if we simply ask whether the problem instance (i.e. the vector) is sorted. The original sorting problem and its decision counterpart may not seem equivalent in terms of hardness. For instance, sorting is solved in polynomial time O(n log n) (using standard algorithms), while deciding whether a vector is sorted can be done in linear time. We shall see that, from the point of view of Complexity Theory, both problems are equally hard.
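The contrast can be made concrete with a short sketch (in Python, with names of our own choosing): producing the sorted vector versus merely deciding sortedness.

```python
def is_sorted(v):
    """Decision version: a single linear pass, O(n) comparisons."""
    return all(v[i] <= v[i + 1] for i in range(len(v) - 1))

def sort_vector(v):
    """Search version: produce the sorted vector, O(n log n) with sorted()."""
    return sorted(v)
```

Both functions answer questions about the same instances; only the form of the answer (a vector vs. a bit) differs.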
Definition 2.1.1 may seem abstract and unusable for the following reason: the set I is hard to characterize. One solution may be to assign types to problem instances. For example, graph may be a problem-instance type. However, such a choice forces us to reason about problems separately, based on the type of their problem instances. Also, the types themselves form an infinite set, which is also difficult to characterize.
Another approach is to level out problem instances, starting from the following key observations: (i) each i ∈ I must be, in some sense, finite. For instance, vectors have a finite length, hence a finite number of elements. Graphs (of which we ask our questions) also have a finite set of nodes, hence a finite set of edges, etc. (ii) I must be countable (but not necessarily finite). For instance, the problem P : R × R → {0, 1}, where P(x, y) returns 1 if x and y are equal, makes no sense from the point of view of computability theory. Assume we would like to answer P(π, √2). Simply storing π and √2, which takes infinite space, is impossible on machines, and also takes us back to point (i).

The observations suggest that problem instances can be represented via a finite encoding, which may be assumed to be uniform over all possible mathematical objects we consider.
Definition 2.1.2 (Encoding problem instances) Let Σ be a finite set which we call an alphabet. A one-letter word is a member of Σ. A two-letter word is any member of Σ^2 = Σ × Σ. For instance, if Σ = {a, b, . . .}, then (a, a) ∈ Σ^2 is a two-letter word. An i-letter word is a member of Σ^i. We denote by:

Σ* = {ε} ∪ Σ ∪ Σ^2 ∪ . . . ∪ Σ^i ∪ . . .

the set of finite words which can be formed over Σ. ε is a special word which we call the empty word. Instead of writing, e.g. (a, b, b, a, a) for a 5-letter word, we simply write abbaa. Concatenation of two words is defined as usual.
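The correspondence between words and natural numbers (Proposition 2.1.1) can be illustrated concretely. The following Python sketch enumerates Σ* for a two-letter alphabet in length-then-lexicographic order; the function names and this particular ordering are our own assumptions, not part of the course material.

```python
SIGMA = ["a", "b"]  # a two-letter alphabet; any finite alphabet works

def word(n):
    """Map n in N to the n-th word of Sigma* (n = 0 is the empty word)."""
    k = len(SIGMA)
    letters = []
    while n > 0:
        n -= 1                      # shift so each length block lines up
        letters.append(SIGMA[n % k])
        n //= k
    return "".join(reversed(letters))

def index(w):
    """Inverse mapping: a word back to its index in the enumeration."""
    k = len(SIGMA)
    n = 0
    for ch in w:
        n = n * k + SIGMA.index(ch) + 1
    return n
```

The enumeration starts ε, a, b, aa, ab, ba, bb, aaa, . . ., which shows Σ* is countable: every word receives exactly one index.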
2.2
Algorithms are usually described as pseudo-code, intended as an abstraction over concrete programming-language operations. The level of abstraction is usually not specified rigorously, and is decided in an ad-hoc manner by the writer. From the author's experience, pseudo-code is often dependent on (some future) implementation, and only abstracts from language syntax, possibly including data initialization and subsequent handling. Thus, some pseudo-code can be easily implemented in different languages, to the extent to which the languages are the same, or at least follow the same programming principles.
The above observation is not intended as a criticism towards pseudo-code and pseudo-code writing. It is indeed difficult, for instance, to write pseudo-code which does not seem vague, and which can be naturally implemented in an imperative language (using assignments and iterations) as well as in a purely functional language (where iterations are possible only through recursion).
As before, we require a means for leveling out different programming
styles and programming languages, in order to come up with a uniform,
straightforward and simple definition for an algorithm.
The key observation here is that programming languages, especially the newest and most popular ones, are quite restrictive w.r.t. what the programmer can do. This may seem counter-intuitive at first. Consider typed languages, for instance. Enforcing each variable to have a type is obviously a restriction, and it has a definite purpose: it helps the programmer write cleaner code, which is less likely to crash at runtime. However, this issue is irrelevant from the point of view of Computability Theory. If we search for less restrictive languages, we find the assembly languages. Here, the restrictions (as well as the programming structure) are minimal.
The formal definition for an algorithm which we propose can be seen as an abstract assembly language, where all technical aspects are put aside. We call such a language the Turing Machine.
Definition 2.2.1 (Deterministic Turing Machine) A Deterministic Turing Machine (abbreviated DTM) is a tuple M = (K, F, Σ, δ, s0) where:

- Σ = {a, b, c, . . .} is a finite set of symbols which we call the alphabet;
- K is a set of states, and F ⊆ K is a set of accepting/final states;
- δ : K × Σ → K × Σ × {L, H, R} is a transition function which assigns to each state s ∈ K and c ∈ Σ a triple δ(s, c) = (s′, c′, pos);
- s0 ∈ K is the initial state.
The Turing Machine has a tape which contains infinitely many cells in both directions; each tape cell holds a symbol from Σ. The Turing Machine has a tape head, which is able to read the symbol from the current cell. Also, the Turing Machine is always in a given state. Initially (before the machine has started), the state is s0. From a given state s, the Turing Machine reads the symbol c from the current cell and performs a transition. The transition is given by δ(s, c) = (s′, c′, pos). Performing the transition means that the TM moves from state s to s′, overwrites the symbol c with c′ on the tape cell, and: (i) if pos = L, moves the tape head to the next cell to the left; (ii) if pos = R, moves the tape head to the next cell to the right; (iii) if pos = H, leaves the tape head on the current cell.

The Turing Machine performs transitions according to δ. Whenever the TM reaches an accepting/final state, we say it halts. If the TM reaches a non-accepting state where no further transition is possible, we say it clings/hangs.
- the input of a Turing Machine is a finite word which is contained on its otherwise-empty tape;
- the output of a TM is the contents of the tape (not including empty cells) after the Machine has halted. We also write M(w) to refer to the output of M, given input w.
Example 2.2.1 (Turing Machine) Consider the alphabet Σ = {#, >, 0, 1}, the set of states K = {s0, s1, s2}, the set of final states F = {s2} and the transition function δ:

δ(s0, 0) = (s0, 0, R)   δ(s0, 1) = (s0, 1, R)
δ(s0, #) = (s1, #, L)   δ(s1, 1) = (s1, 0, L)
δ(s1, 0) = (s2, 1, H)   δ(s1, >) = (s2, 1, H)

The Turing Machine M = (K, F, Σ, δ, s0) reads a number encoded in binary on the tape, and increments it by 1. The symbol # encodes an empty tape cell. Initially, the tape head is positioned at the most significant bit of the number. The Machine first goes over all bits, from left to right. When the first empty cell is detected, the machine goes into state s1, and starts flipping 1s to 0s, until the first 0 (or the initial position, marked by >) is detected. Finally, the machine places 1 on the current cell, and enters its final state.
[State diagram of M: the start arrow enters s0, which loops on 0/0, R and 1/1, R; the transition #/#, L leads to s1, which loops on 1/0, L; the transitions 0/1, H and >/1, H lead to the final state s2.]
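The machine of Example 2.2.1 can be simulated directly. The following Python sketch (function names and the tape representation are our own choices) stores δ as a dictionary and runs the increment machine step by step.

```python
# Transition function of the increment machine from Example 2.2.1.
# '#' is the blank symbol; the tape is a dict so it extends on demand.
DELTA = {
    ("s0", "0"): ("s0", "0", "R"), ("s0", "1"): ("s0", "1", "R"),
    ("s0", "#"): ("s1", "#", "L"), ("s1", "1"): ("s1", "0", "L"),
    ("s1", "0"): ("s2", "1", "H"), ("s1", ">"): ("s2", "1", "H"),
}
FINAL = {"s2"}

def run(delta, final, word, state="s0", head=0):
    """Simulate a DTM on the given input word; head starts at cell `head`."""
    tape = dict(enumerate(word))
    while state not in final:
        symbol = tape.get(head, "#")
        if (state, symbol) not in delta:   # no transition: the machine hangs
            return None
        state, written, move = delta[(state, symbol)]
        tape[head] = written
        head += {"L": -1, "R": 1, "H": 0}[move]
    cells = [tape[i] for i in sorted(tape)]
    return "".join(cells).strip("#")       # output: tape contents minus blanks
```

For example, on input 1011 (decimal 11) the machine halts with 1100 (decimal 12) on its tape; with the marker >, the head starts on the most significant bit, one cell to the right of >.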
The answer is, again, the Turing Machine. This implies that a Turing Machine, acting as a programming language, can be fed another Turing Machine, acting as a program, and execute it.

In the following Proposition, we show how Turing Machines can be encoded as words:

Proposition 2.2.1 (TMs as words) Any Turing Machine M = (K, F, Σ, δ, s0) can be encoded as a word. We write enc(M) to refer to this word.
Proof: (sketch) Intuitively, we encode states and positions as integers n ∈ N, transitions as tuples of integers, etc., and subsequently convert each integer to its word counterpart, cf. Proposition 2.1.1.

Let NonFin = K \ (F ∪ {s0}) be the set of non-final states, excluding the initial one. We encode each state in NonFin as an integer in {1, 2, . . . , |NonFin|} and each final state as an integer in {|NonFin| + 1, . . . , |NonFin| + |F|}. We encode the initial state s0 as |NonFin| + |F| + 1, and L, H, R as |NonFin| + |F| + i with i ∈ {2, 3, 4}. Each integer from the above is represented as a word using ⌈log_|Σ|(|NonFin| + |F| + 4)⌉ symbols.

Each transition δ(s, c) = (s′, c′, pos) is encoded as:

enc(s)#c#enc(s′)#c′#enc(pos)

where enc(·) is the encoding described above. The entire δ is encoded as the sequence of encoded transitions, separated by #. The encoding of M is:

enc(M) = enc(|NonFin|)#enc(|F|)#enc(δ)
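The numbering scheme can be prototyped in Python. The sketch below encodes only the δ part of enc(M); the concrete layout choices (fixed digit width, sorting transitions, reusing # both as a tape symbol and as a separator, just as in the example alphabet above) are assumptions of this sketch, not a definitive implementation.

```python
def enc_delta(K, F, sigma, delta, s0):
    """Encode a TM's transition function as a word, following the numbering
    of Proposition 2.2.1: non-final states, then final states, then s0,
    then L, H, R."""
    base = len(sigma)
    nonfin = sorted(s for s in K if s not in F and s != s0)
    number = {s: i + 1 for i, s in enumerate(nonfin)}
    number.update({s: len(nonfin) + 1 + i for i, s in enumerate(sorted(F))})
    number[s0] = len(nonfin) + len(F) + 1
    number.update({"L": number[s0] + 1, "H": number[s0] + 2, "R": number[s0] + 3})

    # fixed width: enough base-|sigma| digits for the largest assigned number
    largest = len(nonfin) + len(F) + 4
    width, capacity = 1, base
    while capacity <= largest:
        width, capacity = width + 1, capacity * base

    def enc(x):
        n, digits = number[x], []
        for _ in range(width):
            digits.append(sigma[n % base])
            n //= base
        return "".join(reversed(digits))

    parts = ["#".join([enc(s), c, enc(s2), c2, enc(pos)])
             for (s, c), (s2, c2, pos) in sorted(delta.items())]
    return "#".join(parts)
```

Running this on the increment machine of Example 2.2.1 produces a single word over the machine's own alphabet, ready to serve as input for another machine.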
Thus, enc(M) is a word, which can be fed to another Turing Machine. The latter should have the ability to execute (or to simulate) M. This is indeed possible:

Proposition 2.2.2 (The Universal Turing Machine) There exists a TM U which, for any TM M and every word w ∈ Σ*, takes enc(M) and w as input, and outputs 1 whenever M(w) = 1 and 0 whenever M(w) = 0. We call U the Universal Turing Machine, and say that U simulates M.

Proof: Let M be a TM and w = c1c2 . . . cn be a word built from the alphabet of M. We build the Universal Turing Machine U as follows:
To be fair to the TM, one would formulate this statement as: all programming languages are Turing-complete, i.e. they can solve everything the TM can solve.
based on side-effects: each transition may modify the tape in some way. Computation can be described differently, for instance as function application, or as term rewriting. However, all other known computational models are equivalent to the Turing Machine, in the sense that they solve precisely the same problems.

This observation prompted the aforementioned conjecture. It is strongly believed to hold (as the evidence suggests), but it cannot be formally proved.
2.3
Decidability
The existence of the Universal Turing Machine U inevitably leads to interesting questions. Assume M is a Turing Machine and w is a word. We use the convention enc(M) ∗ w to represent the input of U: thus, U expects the encoding of a TM, followed by the special symbol ∗, and then M's input w.

(?) Does U halt for all inputs?

If the answer is positive, then U can be used to tell whether any machine halts, for a given input. We already have some reasons to believe we cannot answer (?) positively, if we examine the proof of Proposition 2.2.2. Actually, (?) is a decision problem, one that is quite interesting and useful.
As before, we try to lift our setting to a more general one: can any problem be solved by some Turing Machine? The following propositions indicate that this is not likely the case:
Proposition 2.3.1 The set TM of Turing Machines is countably infinite.

Proof: The proof follows immediately from Proposition 2.2.1. Any Turing Machine can be uniquely encoded as a string, hence the set of Turing Machines is isomorphic to a subset of Σ*, which in turn is countably infinite, since Σ* is countably infinite for any finite Σ.
Proposition 2.3.2 The set Hom(N, N) of functions f : N → N is uncountably infinite.

Proof: It is sufficient to show that Hom(N, {0, 1}) is uncountably infinite. We build a proof by contradiction. We assume Hom(N, {0, 1}) is countably infinite. Hence, each natural number n ∈ N corresponds to a function fn ∈ Hom(N, {0, 1}). We build a matrix as follows: columns describe the functions fn, n ∈ N; rows describe the inputs k ∈ N. Each matrix element mi,j is the value of fj(i) (hence, the output of function fj for input i).
        f0   f1   f2   . . .   fn   . . .
0        1    1    0   . . .    0   . . .
1        0    1    1   . . .    0   . . .
2        1    0    1   . . .    1   . . .
. . .
n        1    1    0   . . .    1   . . .
. . .

We now define the function f by flipping the diagonal of this matrix:

f(x) = 1 iff fx(x) = 0
       0 iff fx(x) = 1
Since f ∈ Hom(N, {0, 1}), it must also have a number assigned to it: f = fk for some k ∈ N. Then f(k) = 1 if fk(k) = 0. But f(k) = fk(k). Contradiction. On the other hand, f(k) = 0 if fk(k) = 1. As before, we obtain a contradiction.
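The diagonal argument can be illustrated on a finite table (an illustrative toy, since the real matrix is infinite): flip the diagonal, and the resulting row differs from every listed row.

```python
# Rows are functions f0..f3 tabulated on inputs 0..3 (illustrative values).
table = [
    [1, 1, 0, 0],   # f0
    [0, 1, 1, 0],   # f1
    [1, 0, 1, 1],   # f2
    [1, 1, 0, 1],   # f3
]

# The diagonal-flip function: disagree with f_i at input i.
diagonal_flip = [1 - table[i][i] for i in range(len(table))]

# diagonal_flip differs from each f_i at position i, so it is not in the table.
for i, row in enumerate(table):
    assert diagonal_flip[i] != row[i]
```

No matter how the table is filled in, the flipped diagonal escapes every row, which is exactly the contradiction in the proof above.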
Propositions 2.3.1 and 2.3.2 tell us that there are infinitely more functions (decision problems) than means of computing them (Turing Machines). Our next step is to look at solvable and unsolvable problems, and devise a method for separating the former from the latter. In other words, we are looking for a tool which allows us to identify those problems which are solvable, and those which are not.
We start by observing that Turing Machines may never halt. We write M(w) = ⊥ to designate that M loops infinitely for input w. Also, we write nw ∈ N to refer to the number which corresponds to w, according to Proposition 2.1.1. Next, we refine the notion of problem solving:

Definition 2.3.1 (Decision, acceptance) Let M be a Turing Machine and f ∈ Hom(N, {0, 1}). We say that:

- M decides f iff, for all w ∈ Σ*: M(w) = 1 whenever f(nw) = 1, and M(w) = 0 whenever f(nw) = 0;
- M accepts f iff, for all w ∈ Σ*: M(w) = 1 iff f(nw) = 1, and M(w) = ⊥ iff f(nw) = 0.
Let:

fall(nenc(M)) = 1 iff M halts for all inputs
                0 otherwise
We reduce fh to f111. Assume M111 decides f111. Given a Turing Machine M and a word w, we construct the machine:

MM,w(α) = if α = 111 then M(w) else loop

We observe that (i) the transformation from enc(M) ∗ w to enc(MM,w) is decidable, since it involves adding precisely three states to M: these states check the input α, and if it is 111, replace it with w and run M; (ii) M111(enc(MM,w)) = 1 iff M(w) halts. The reduction is complete. f111 ∉ R.
Halting on some input. We define:

fany(nenc(M)) = 1 iff M(w) halts for some w
                0 otherwise
We reduce f111 to fany. We assume fany is decided by Many. We construct:

M′(α) = replace α by 111 and run M(111)

Now, Many(enc(M′)) = 1 iff M(111) halts, hence we can use Many to build a machine which decides f111. Contradiction. fany ∉ R.
Machine halt equivalence. We define:

feq(nenc(M1) ∗ enc(M2)) = 1 iff for all w: M1(w) halts iff M2(w) halts
                          0 otherwise

We reduce fall to feq. Let Mtriv be a one-state Turing Machine which halts on every input, and Meq be the Turing Machine which decides feq.
Then Meq(enc(M) ∗ enc(Mtriv)) = 1 iff M halts on all inputs. We have shown that we can use Meq in order to build a machine which decides fall. Contradiction. feq ∉ R.
So far, we have used reductions in order to establish problem non-membership in R. There are other properties of R and RE which can be of use for this task. First, we define:

Definition 2.3.4 (Complement of a problem) Let f ∈ Hom(N, {0, 1}). We denote by f̄ the problem:

f̄(n) = 1 iff f(n) = 0
       0 iff f(n) = 1

We call f̄ the complement of f.
For instance, the complement of fh is the problem which asks whether a Turing Machine M does not halt for input w. We also note that the complement of f̄ is f itself.

Next, we define the class:

coRE = {f ∈ Hom(N, {0, 1}) | f̄ ∈ RE}

coRE contains precisely those problems whose complement is in RE.
We establish that:

Proposition 2.3.7 RE ∩ coRE = R.

Proof: Assume f ∈ RE ∩ coRE. Hence, there exists a Turing Machine M which accepts f, and a Turing Machine M̄ which accepts f̄. We build the Turing Machine:

M*(w) = for i ∈ N: run M on w for i steps; if it halted with output 1, return 1; run M̄ on w for i steps; if it halted with output 1, return 0

Since M and M̄ will always halt when the expected result is 1, they can be used together to decide f. Hence f ∈ R. Conversely, a machine which decides f also accepts f and, by switching its outputs, yields a machine which accepts f̄; hence R ⊆ RE ∩ coRE.
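The dovetailing construction in this proof can be sketched in Python, with generators standing in for step-by-step simulation of the two accepting machines. The even/odd problem and all names below are illustrative assumptions, not part of the course material.

```python
def decide(accept_f, accept_not_f, n):
    """Run the machine accepting f and the machine accepting its complement
    in lockstep; exactly one of the two is guaranteed to halt on n.
    Each 'machine' is a generator that yields while running and returns
    (raising StopIteration) when it halts with answer 1."""
    runs = [(accept_f(n), 1), (accept_not_f(n), 0)]
    while True:
        for gen, verdict in runs:
            try:
                next(gen)           # advance this machine by one step
            except StopIteration:
                return verdict      # it halted, so f(n) = verdict

# Hypothetical example: f(n) = 1 iff n is even. Each semi-decider halts
# only when its answer is "yes", and loops forever otherwise.
def accepts_even(n):
    while n % 2 != 0:
        yield                       # loops forever on odd input

def accepts_odd(n):
    while n % 2 == 0:
        yield                       # loops forever on even input
```

Neither generator alone decides anything (each loops on "no" instances), but interleaving them yields a total decision procedure, exactly as in the proposition.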
Proposition 2.3.8 f ∈ R iff f̄ ∈ R.

Proof: The proposition follows immediately, since the Turing Machine which decides f can be used to decide f̄, by simply switching its output from 0 to 1 and from 1 to 0. The same holds for the other direction.
We conclude this chapter with a very powerful result, which states that an entire category of problems does not belong to R.
3
Complexity Theory
3.1
Measuring time and space
initialisation and the return statements, as computing steps. Thus, we observe that (??) the running time (resp. consumed space) of an algorithm will grow as the size of the input grows.

We can now merge (?) and (??) into a definition:

Definition 3.1.1 (Running time of a TM) The running time of a Turing Machine M is given by TM : N → N iff:

for all w ∈ Σ*: the number of transitions performed by M on input w is at most TM(|w|)
Remark 3.1.1 (Consumed space for a TM) A naive definition for the consumed space of a Turing Machine would state that SM(|w|) is the number of tape cells which M employs. This definition is imprecise. Consider the Turing Machine which receives a binary word as input and computes whether the word is a power of 2 (i.e. of the form 2^n). Aside from reading its input, the machine consumes no space. Thus, we might refine our definition to: SM(|w|) is the number of tape writes which M performs. This definition is also imprecise. Consider the binary counter Turing Machine from the former chapter. It performs a number of writes proportional to the number of consecutive 1s found at the end of the string. However, the counter does not use additional space; it only processes the input.
Thus, the consumed space SM(|w|) is the number of written cells, excluding those which hold the input. Consider a Turing Machine which receives n numbers encoded as binary words, each having at most 4 bits, and which computes the sum of the numbers, modulo 2^4. Apart from reading the 4n bits and the n − 1 word separators, the machine employs another 4 cells to hold a temporary sum. Thus, the consumed space for this machine is 4.7
A formal definition for the consumed space of a TM is outside the scope of this course, since it involves multi-tape Turing Machines. The basic idea is to separate the input from the rest of the space used for computation. Thus, when assessing the consumed space of an algorithm, we shall never account for the space consumed by the input.

Recall that, as in Computability Theory, our primary agenda is to produce a classification of problems. To this end, it makes sense to first introduce a classification of Turing Machine running times.
7 We can also build another machine which simply uses the first number to hold the temporary sum, and thus uses no additional space.
3.1.1
Asymptotic notations
Remark 3.1.2 (Running times vs. arbitrary functions) In the previous section, we have defined running times of a Turing Machine as functions T : N → N, and we have seen that they are often monotonically increasing (n ≤ m implies T(n) ≤ T(m)). While monotonicity is common among the running times of conventional algorithms, it is not hard to find examples (more-or-less realistic) where it does not hold. For instance, an algorithm may simply return if its input exceeds a given size. Thus, we shall not, in general, assume that running times are monotonic.

Furthermore, we shall extend our classification to arbitrary functions f : R → R, since there is no technical reason to consider only functions over naturals. In support of this, we add that asymptotic notations are useful in other fields outside complexity theory, where the assumption that functions are defined over natural numbers only is not justified.
Definition 3.1.2 (Θ (theta) notation) Let g : R → R. Then Θ(g(n)) is the class of functions:

Θ(g(n)) = {f : R → R | ∃c1, c2 ∈ R+, ∃n0 ∈ N, ∀n ≥ n0 : c1 g(n) ≤ f(n) ≤ c2 g(n)}

Thus, Θ(f(n)) is the class of all functions with the same asymptotic growth as f(n). We can easily observe that, for all continuous f, g ∈ Hom(R, R) such that g ∈ Θ(f(n)), we have limn→∞ f(n)/g(n) = c, where c ≠ 0.

There is an infinite number of classes Θ(f(n)), one for each function f. However, if g(n) ∈ Θ(f(n)), then Θ(g(n)) = Θ(f(n)).
It makes sense to consider classes which describe functions with inferior/superior asymptotic growth:

Definition 3.1.3 (O, Ω notations) Let g : R → R. Then:

O(g(n)) = {f : R → R | ∃c ∈ R+, ∃n0 ∈ N, ∀n ≥ n0 : 0 ≤ f(n) ≤ c g(n)}

Ω(g(n)) = {f : R → R | ∃c ∈ R+, ∃n0 ∈ N, ∀n ≥ n0 : 0 ≤ c g(n) ≤ f(n)}
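Definition 3.1.3 can be exercised numerically: to place a concrete function in O(n^2), it suffices to exhibit witness values for c and n0 and verify the inequality. The function and witnesses below are our own illustrative example.

```python
# Claim: 3n^2 + 2n + 5 is in O(n^2), witnessed by c = 4 and n0 = 5.
def f(n):
    return 3 * n ** 2 + 2 * n + 5

c, n0 = 4, 5

# Check the defining inequality 0 <= f(n) <= c * n^2 for all n in [n0, 10000).
assert all(0 <= f(n) <= c * n ** 2 for n in range(n0, 10_000))
```

A finite check cannot prove the asymptotic claim, of course, but it is a useful sanity test for a proposed pair (c, n0); here the inequality 3n^2 + 2n + 5 ≤ 4n^2 holds for every n ≥ 4, hence in particular from n0 = 5 on.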
Syntactic sugars

This section follows closely Lecture 2 from [1]. Quite often, asymptotic notations are used to refer to arbitrary functions having certain properties related to their order of growth. For instance, in:

⌈f(x)⌉ = f(x) + O(1)

applying rounding to f(x) may be expressed as the original f(x) to which we add a function bounded by a constant. Similarly:
1/(1 − x) = 1 + x + x^2 + x^3 + O(x^4), for −1 < x < 1
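The O(x^4) term in this expansion can be checked numerically. In the sketch below (our own illustration), the remainder after the first four terms equals x^4/(1 − x), which is bounded by 2|x|^4 on [−1/2, 1/2].

```python
# Remainder of the geometric series 1/(1-x) after the terms up to x^3.
def remainder(x):
    return 1 / (1 - x) - (1 + x + x ** 2 + x ** 3)

# Analytically, remainder(x) = x^4 / (1 - x), so for |x| <= 1/2 it is
# bounded by 2 * |x|^4: a concrete witness that the remainder is O(x^4).
for x in [i / 100 for i in range(-50, 51) if i != 0]:
    assert abs(remainder(x)) <= 2 * abs(x) ** 4
```

The constant 2 plays the role of c in Definition 3.1.3; near 0 the remainder shrinks like the fourth power of x, which is exactly what the O(x^4) term asserts.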
The above notation allows us to formally disregard the terms from the expansion, by replacing them with an asymptotic notation which characterises their order of growth. One should distinguish between the usage of asymptotic notations in arithmetic expressions, such as the ones previously illustrated, and in equations. Consider the following example:
f(x) = O(1/x)

which should be read: there exists a function h ∈ O(1/x) such that f(x) = h(x). Similarly:

f(x) = O(log x) + O(1/x)

should be read: there exist functions h ∈ O(1/x) and w ∈ O(log x) such that f(x) = w(x) + h(x). In equations such as:
O(x) = O(log x) + O(1/x)
the equality is not symmetric, and should be read from left to right: for any function f ∈ O(x), there exist functions h ∈ O(1/x) and w ∈ O(log x) such that f(x) = w(x) + h(x). In order to avoid mistakes, the following algorithmic rule should be applied. When reading an equation of the form:

left = right

- each occurrence of an asymptotic notation in left should be replaced by a universally quantified function belonging to the corresponding class;
- each occurrence of an asymptotic notation in right should be replaced by an existentially quantified function from the corresponding class.
Consumed space in the TM, vs consumed space on algorithms....
Computability: what Complexity: how much
4.1
4.2
4.3
Complexity classes
4.4
4.5
4.6
Correctness
5.1
5.2
Loop invariants
5.3
Model checking
References

[1] A.J. Hildebrand. Asymptotic Methods in Analysis, Math 595 AMA, 2009. http://www.math.uiuc.edu/hildebr/595ama/