January 7, 2014
1.1 Notes
The understanding of any new concept requires, to some extent, a motivating object of study to which it can be applied. Anatomy, for instance, requires the presence of the human body. The object of study may, however, itself be a concept, or conceptual in nature.
This section is dedicated to measuring space and time in Computer Science; as object of study, we have chosen Propositional Logic.
Propositional Logic
Propositional Logic (PL) is firmly rooted in Computer Science. It is the most basic tool for describing and reasoning about knowledge, and the starting point for more complex frameworks used in Artificial Intelligence. A formula of PL is quite simple in structure, yet it is highly flexible: it can encode complex questions about numbers, sets, graphs, and even algorithms. As it turns out, PL is also the form in which these questions can be answered efficiently, as we will see further on.
Propositional Logic can be traced back to Plato, who first argued that a statement can only be true or false. In what follows, we will use ⊤ to refer to true and ⊥ to refer to false. These two can be seen as the values of an algebra that is commonly known as Boolean Algebra.
According to this order, x ∨ y ∨ z should be interpreted as ((x ∨ y) ∨ z). In what follows, we shall slightly deviate from our syntax definition and omit parentheses whenever we find this convenient.
An interpretation is a formal means for assigning values to variables.
Definition 2.0.2 (Interpretation) An interpretation is a (total) function I : Vars → {⊤, ⊥}. Given an interpretation I, variable
This name originates from the term model, which is used to formally designate an instance of the Universe. In the case of PL, such an instance is the interpretation, where each variable must be either ⊤ or ⊥. Other logics make use of more complicated models, such as graph-like structures, which may be used to describe the evolution of the Universe, etc. The model checking problem is formulated in the same way, irrespective of how the Universe is described.
3.1 Asymptotic notations
At this point, we have two algorithms for two different problems, and we are interested in finding suitable means for (i) evaluating their performance, and (ii) comparing them with other algorithms that solve the same problem.
We start by noticing that:
1. On a conventional computation machine, the resources which are spent when running an algorithm are: (i) the consumed memory, i.e. space, and (ii) the number of instructions, i.e. the time it takes for the algorithm to run.
2. As a formula gets larger, more space (and time) is consumed.
3. To properly evaluate an algorithm, the consumption of space and time
should always consider the worst-case scenario.
Point 1. should require no additional clarification. Let us focus on point 2.
First, we introduce the notion of PL formula size.
Definition 3.1.1 (PL formula size) The size of a PL formula φ is recursively defined as:
|p| = 1, for p ∈ Vars(φ)
|¬φ| = |φ| + 1
|φ1 ∨ φ2| = |φ1| + |φ2|
Exercise 3.1.1 (PL formula size) Compute the sizes of the PL formulae
from Example 2.0.1.
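Definition 3.1.1 translates directly into code. The sketch below is my own, reusing a tuple encoding of formulas — ("var", p), ("not", f), ("or", f1, f2) — which is not part of the notes.

```python
def size(formula):
    """Formula size, following Definition 3.1.1:
    |p| = 1,  |not f| = |f| + 1,  |f1 or f2| = |f1| + |f2|."""
    tag = formula[0]
    if tag == "var":
        return 1
    if tag == "not":
        return size(formula[1]) + 1
    if tag == "or":
        return size(formula[1]) + size(formula[2])
    raise ValueError(f"unknown connective: {tag}")

phi = ("not", ("or", ("var", "x"), ("var", "y")))  # the formula ¬(x ∨ y)
print(size(phi))  # 3
```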
It is true that, as the size of a formula gets bigger, the number of steps (time) and the amount of memory (space) consumed by Algorithm 2 will grow. However, as computers get more powerful, we also get more resources to throw at solving problems. The question we would like to settle is whether time and space are relevant as possible measures of an algorithm's performance. Our answer is yes, and to argue for it, first consider Moore's Law:
Remark 3.1.1 (Moore's law [2]) The processing speed and memory of computers double every two years.
Next, let us look at the size of the set S from Algorithm 2, which contains all possible interpretations of Vars(φ). Initially, we have only one interpretation I. After each iteration of the inner for loop, the size of S doubles: for each existing interpretation I, we add I1 and I2, which interpret the new variable v. Starting from |S| = 1, the size of S doubles |Vars(φ)| times. Hence, at the end of the outermost for loop, we have |S| = 2^{|Vars(φ)|}. Assume |Vars(φ)| = 100, which is quite reasonable if we think about complex electronic circuits such as a motherboard. Then:
2^{100} = 1,267,650,600,228,229,401,496,703,205,376.
To better grasp the size of the number, if one could fold a sheet of paper
100 times, the thickness of the stack would be equivalent to the distance in
light-years to the farthest observable galaxy in the known Universe.
Finally, assume we have a machine which can execute the second outermost for loop from Algorithm 2, which takes 2^{100} iterations, in a reasonable time. Moore's law tells us that if we want to add another variable to our formula φ, then we have to wait two years to have a machine able to execute 2^{101} = 2 · 2^{100} iterations in a reasonable time.
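The arithmetic above is easy to check directly; a quick sketch:

```python
# Checking the numbers in this section: the number of interpretations of
# 100 variables, and the effect of adding one more variable.
n_interps = 2 ** 100
print(n_interps)  # 1267650600228229401496703205376

# One extra variable doubles the work: a machine that enumerates 2**100
# interpretations in time T needs time 2*T for 2**101 of them.
assert 2 ** 101 == 2 * 2 ** 100
```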
To conclude, time and space are limited resources which we must take
into account, and the performance of any algorithm should be measured
with respect to these two metrics.
3.2 The Θ-notation
Having settled the metrics we use, let us focus on the methodology of measurement. For now, we consider time only. The same discussion can be
naturally extended to space.
The number of steps executed by an algorithm is usually expressed as T(n), where T : N → N is a monotonically increasing function and n measures the size of the input. Let us compute the number of steps Tfor(n) of the first outer for loop from Algorithm 2. Here, n will stand for the number of variables of φ: |Vars(φ)|.
We make the following assumptions:
1. each step consists of one unit of time. Hence, we will interchangeably use the terms number of steps and (execution) time.
2. for each interpretation I : Vars → {⊤, ⊥}, variable v ∈ Vars and value b ∈ {⊤, ⊥}, building I[v ← b] takes |Vars| steps.
3. the set difference S \ S′ and the set union S ∪ S′ take |S| · |S′| steps.
4. each assignment S = S′ takes one step.
Example 3.2.1 (Execution time Tfor(n)) Given the above assumptions, we obtain:
\[
\begin{aligned}
T_{for}(n) &= \sum_{0 \le k \le n-1} \Big( \sum_{1 \le l \le 2^k} \big( n + n + (l-1) \cdot 2 \big) + 1 + 1 \Big) \\
&= \sum_{0 \le k \le n-1} \Big( 2n \cdot 2^k + 2 \cdot \frac{2^k (2^k - 1)}{2} + 2 \Big) \\
&= \sum_{0 \le k \le n-1} \big( (2n-1) \cdot 2^k + 4^k + 2 \big) \\
&= (2n-1) \cdot \frac{2^n - 1}{2 - 1} + \frac{4^n - 1}{4 - 1} + 2n
\end{aligned}
\]
Definition 3.2.1 (Θ-notation) Given g : R → R, we define:
\[
\Theta(g(n)) = \{ f : \mathbb{R} \to \mathbb{R} \mid \exists c_1, c_2 \in \mathbb{R}^+,\ \exists n_0 \in \mathbb{N} \text{ such that } \forall n \ge n_0,\ c_1 \cdot g(n) \le f(n) \le c_2 \cdot g(n) \}
\]
Remark 3.2.1 (Arbitrary functions vs. execution times) In Definition 3.2.1, the function g is defined over the reals, and there are no additional assumptions, such as monotonicity, which are characteristic of functions that express execution times.
The set Θ(g(n)) collects all functions f which have the same asymptotic growth as g. This is expressed by the existence of some real constants c1 and c2 such that the inequality c1 · g(n) ≤ f(n) ≤ c2 · g(n) holds. Moreover, the inequality must hold for all n larger than some fixed n0. The behaviour of f(n) (with respect to g(n)) is unimportant for those n less than n0. This condition stresses the importance of the behaviour at the limit of f(n), that is, for n → ∞.
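Candidate witnesses c1, c2, n0 can be checked numerically. The sketch below is mine: a finite check illustrates, but does not prove, membership in Θ(n²); the constants c1 = 2, c2 = 3, n0 = 6 are my own choices.

```python
# Sketch: numerically checking a Theta witness on a finite range.
def is_theta_witness(f, g, c1, c2, n0, upto=10_000):
    """Check c1*g(n) <= f(n) <= c2*g(n) for all n0 <= n <= upto."""
    return all(c1 * g(n) <= f(n) <= c2 * g(n) for n in range(n0, upto + 1))

f = lambda n: 2 * n * n + 5 * n + 6
g = lambda n: n * n

# c1=2, c2=3, n0=6 witness f(n) in Theta(n^2) on the tested range:
# 2n^2 <= 2n^2+5n+6 always, and 2n^2+5n+6 <= 3n^2 as soon as n >= 6.
print(is_theta_witness(f, g, 2, 3, 6))  # True
```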
3.3 Syntactic sugars
Using these conventions, the computation of Tfor(n) becomes:
\[
\begin{aligned}
T_{for}(n) &= \sum_{0 \le k \le n-1} \Big( 2^k \cdot \big( \Theta(n) + \Theta(n) + \Theta(n) \big) + \Theta(1) + \Theta(1) \Big) \\
&= \Theta(n) \cdot \frac{2^n - 1}{2 - 1} + \Theta(1) \\
&= \Theta(n) \cdot \Theta(2^n) + \Theta(1) = \Theta(n \cdot 2^n)
\end{aligned}
\]
3.4 Recurrences
\[
T_{MCnaive}(m) = \begin{cases}
T_{switch} + \Theta(1) & \text{if } \varphi = x \\
T_{switch} + T_{MCnaive}(m_1) + T_{MCnaive}(m_2) + \Theta(1) & \text{if } \varphi = \varphi_1 \vee \varphi_2 \\
T_{switch} + T_{MCnaive}(m-1) + \Theta(1) & \text{if } \varphi = \neg \varphi_1
\end{cases}
\]
Tswitch is the time required to inspect the formula structure and identify the proper case. If we assume a formula is represented as a tree, then this time is Θ(1), since it amounts to looking at the children of the formula at hand. T× denotes the time for multiplication, and Tmod,div that of division and of applying the modulo on a value. Both are Θ(1). Next, we can replace the execution time of each instruction with asymptotic notations:
\[
T_{MCnaive}(m) = \begin{cases}
\Theta(1) + \Theta(1) & \text{if } \varphi = x \\
\Theta(1) + T_{MCnaive}(m_1) + T_{MCnaive}(m_2) + \Theta(1) & \text{if } \varphi = \varphi_1 \vee \varphi_2 \\
\Theta(1) + T_{MCnaive}(m-1) + \Theta(1) & \text{if } \varphi = \neg \varphi_1
\end{cases}
\]
Finally, we note that the third case is a particular case of the second (in which m1 = m − 1 and TMCnaive(m2) = Θ(1)). By replacing the formula-structure cases with the formula size, we get:
\[
T_{MCnaive}(m) = \begin{cases}
\Theta(1) & \text{if } m = 1 \\
T_{MCnaive}(m_1) + T_{MCnaive}(m_2) + \Theta(1) & \text{if } m_1 + m_2 = m
\end{cases}
\]
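The size-based recurrence can also be explored numerically. In the sketch below (unit costs and the split strategies are my choices, not the notes'), T(m) comes out as 2m − 1 regardless of how m is split, which anticipates the Θ(m) bound proved by the substitution method.

```python
def T(m, split=lambda m: m // 2):
    """T(1) = 1;  T(m) = T(m1) + T(m2) + 1, with m1 + m2 = m."""
    if m == 1:
        return 1
    m1 = split(m)
    return T(m1, split) + T(m - m1, split) + 1

# Whatever the split, T(m) = 2*m - 1, i.e. linear in m.
print([T(m) for m in range(1, 6)])  # [1, 3, 5, 7, 9]
print(T(100, split=lambda m: 1))    # 199
```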
Algorithms with recursive calls often fit the Divide et Impera (DI) pattern, i.e. they solve a problem by dividing it into smaller problems. This is also the case with Algorithm 1. In general, the execution time of a DI algorithm may be described as:
\[
T(m) = \begin{cases}
\Theta(1) & \text{if } m \text{ is sufficiently small} \\
D(m) + T(m_1) + T(m_2) + C(m) & \text{otherwise}
\end{cases}
\]
where:
- the first condition corresponds to the case when the answer to the problem of size m can be given directly, without a division step; the problem is atomic in size, and cannot be further split;
- T(m) is the execution time of solving a problem of size m;
- D(m) is the time required to split the problem into subproblems of sizes m1 and m2, respectively;
- T(m1) and T(m2) are the times for solving the smaller problems;
- C(m) is the time required to combine the results of the subproblems.
In the case of Algorithm 1, m1 = |φ1| and m2 = |φ2|.
The substitution method relies on: (i) guessing the growth (or the exact expression) of a recurrence, and (ii) proving the guess correct via mathematical induction. The substitution method is more rigorous than building recurrence trees, since the latter are prone to errors both in the construction of the recurrence tree and in the accounting of all components of the execution time. It is often the case that recurrence trees are used as a guessing method, while the substitution method is used to confirm that the guess is correct.
Let us apply the substitution method to verify that:
Proposition 3.4.1 TMCnaive(m) = Θ(m).
Proof: The basis case is TMCnaive(1) = Θ(1), which is directly confirmed by the first branch of the definition of TMCnaive. The induction hypotheses are TMCnaive(m1) = Θ(m1) and TMCnaive(m2) = Θ(m2), with m1 + m2 = m. Our aim is to prove TMCnaive(m) = Θ(m). According to the second branch of the definition of TMCnaive, we have TMCnaive(m) = TMCnaive(m1) + TMCnaive(m2) + Θ(1). Next, we use the induction hypotheses and obtain TMCnaive(m) = Θ(m1 + m2) + Θ(1) = Θ(m).
Remark 3.4.1 (Basis case) In applying mathematical induction within the substitution method, the basis case m = 1 is less important. It may be the case that our claim does not hold for m = 1. If this is so, one can attempt to check the claim for m = 2, 3, . . . until the claim is found to hold.
3.5 The notations O, Ω, o, ω
Sometimes we want to state that an execution time grows (i) at most as much as / strictly less than, or (ii) at least as much as / strictly more than some given function. We introduce notations for each of these four possibilities:
Definition 3.5.1 (O, Ω, o, ω notations) Let g : R → R. Then:
\[
\begin{aligned}
O(g(n)) &= \{ f : \mathbb{R} \to \mathbb{R} \mid \exists c \in \mathbb{R}^+,\ \exists n_0 \in \mathbb{N} \text{ such that } \forall n \ge n_0,\ 0 \le f(n) \le c \cdot g(n) \} \\
\Omega(g(n)) &= \{ f : \mathbb{R} \to \mathbb{R} \mid \exists c \in \mathbb{R}^+,\ \exists n_0 \in \mathbb{N} \text{ such that } \forall n \ge n_0,\ 0 \le c \cdot g(n) \le f(n) \} \\
o(g(n)) &= \{ f : \mathbb{R} \to \mathbb{R} \mid \forall c \in \mathbb{R}^+,\ \exists n_0 \in \mathbb{N} \text{ such that } \forall n \ge n_0,\ 0 \le f(n) < c \cdot g(n) \} \\
\omega(g(n)) &= \{ f : \mathbb{R} \to \mathbb{R} \mid \forall c \in \mathbb{R}^+,\ \exists n_0 \in \mathbb{N} \text{ such that } \forall n \ge n_0,\ 0 \le c \cdot g(n) < f(n) \}
\end{aligned}
\]
Remark 3.5.1 (Syntactic sugars for all notations) The syntactic sugars previously defined for the Θ notation naturally extend to all the other notations. However, one should observe that the resulting equality is not a symmetric relation. For instance, n^{O(1)} = O(e^n), however O(e^n) ≠ n^{O(1)}. Assume O(e^n) = n^{O(1)}. By the definition of the syntactic sugars, we have that for all f(n) ∈ O(e^n) there exists g(n) ∈ O(1) such that f(n) = n^{g(n)}. Let f(n) = e^n. Then, from e^n = n^{g(n)}, it follows that g(n) = n / ln n, but also g(n) ∈ O(1). Contradiction, since n / ln n is unbounded.
3.6 Conclusion
We have established time (and space) as suitable metrics for evaluating and comparing algorithms, and we have introduced asymptotic notations for expressing their growth instead of their actual expressions, which are tedious to derive and not entirely useful. We have established syntactic sugars which offer a convenient way to handle notations. Also, we have described three ways of determining the growth of execution times defined as recurrences. However, we have yet to compute the growth of the execution time for Algorithm 2, that is, SATnaive. TSATnaive can be naturally determined from Tfor(n), the execution time of the first part of the algorithm, and TMCnaive(m), since MCnaive is called in the second part of the algorithm. First note that the variable n refers to the number of variables of the formula φ, while m refers to the formula size |φ|. Therefore, we define TSATnaive as TSATnaive : N × N → N:
\[
T_{SATnaive}(n, m) = T_{for}(n) + |S| \cdot T_{MCnaive}(m) = \Theta(n \cdot 2^n) + 2^n \cdot \Theta(m) = \Theta(2^n \cdot m)
\]
where the last equality uses the fact that n ≤ m (see below).
Thus TSATnaive is exponential in the number of variables of φ and linear in the size of φ. There is also a link between the number of variables n and the formula size m: n is at most equal to m (there cannot be more variables than the size of the formula), hence n = O(m). With this observation, we can bound TSATnaive in terms of m only: TSATnaive(m) = O(2^m) · Θ(m) = O(m · 2^m). However, we note that our previous formulation is more precise.
Let us talk about the space consumed by each of our algorithms. In our account, we will consider only the variables and data structures used by the algorithm, and ignore the space occupied by the input data. We assume the space consumed to store φ and I is |φ| and |Vars(φ)|, respectively.
4.1 Foreword
4.2
LB
Sorts: BoolList, Bool
Constant symbols: VoidB : BoolList, True : Bool, False : Bool
Operation symbols: insert : Bool × BoolList → BoolList
                   remove : BoolList → BoolList
Algebra: Apair
Carrier sets: ABoolList, B
Constants: []B, ⊤, ⊥
Operations: ins : B × ABoolList → ABoolList
            rm : ABoolList → ABoolList
where the standard set B is the carrier of Bool, ⊤, ⊥ ∈ B interpret the symbols True and False, respectively, while []B interprets the empty-list constant symbol VoidB. The functions ins(b, l) = (b, l) and
\[
rm(l) = \begin{cases} l' & \text{if } l = (b, l') \\ []_B & \text{otherwise} \end{cases}
\]
interpret the operation symbols insert and remove. Finally, ABoolList, which is the set containing []B as well as all pairs whose first element is in B and whose second element is in ABoolList, is the carrier set of BoolList.
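The algebra Apair can be sketched directly in Python (an illustration of mine, not part of the notes): Python's True/False stand in for the carrier B, and the empty tuple () stands in for the empty-list constant []B.

```python
# Sketch of the algebra Apair: lists as nested pairs (b, l),
# with ins and rm exactly as defined above.
VOID = ()  # stands in for the empty-list constant []B

def ins(b, l):
    return (b, l)

def rm(l):
    # rm returns l' when l = (b, l'), and the empty list otherwise
    return l[1] if l != VOID else VOID

lst = ins(True, ins(False, VOID))   # the list [True, False]
print(rm(lst) == ins(False, VOID))  # True: rm(ins(b, l)) == l
print(rm(VOID) == VOID)             # True
```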
Example 4.2.3 (Algebra 2) The algebras Adec and Aset for LB are defined as follows:
Algebra: Adec
Carrier sets: N, B
Constants: 1, ⊤, ⊥
Operations: f : B × N → N
            g : N → N
Algebra: Aset
Carrier sets: 2^B, B
Constants: ∅, ⊤, ⊥
Operations: i : B × 2^B → 2^B
            r : 2^B → 2^B
4.3
Specification: SBoolList
Sorts: BoolList, Bool
Constant symbols: VoidB : BoolList, True : Bool, False : Bool
Operation symbols: insert : Bool × BoolList → BoolList
                   remove : BoolList → BoolList
Axioms: remove(insert(b, l)) = l
Remark 4.3.2 (Axioms) We have omitted the formal definition of axioms, as well as of what it means for an axiom to be true in an algebra. In Example 4.3.1, the concept of axiom should be straightforward. In the same example, note that we have not mentioned what b and l are. Intuitively, they are universally quantified variables over the sets that interpret Bool and BoolList, respectively. To make the exposition more compact, we have omitted a lengthier discussion regarding variables. The reader should note that our usage of axioms is intuitive, and not entirely rigorous.
The axiom we introduced in Example 4.3.1 clearly rules out Aset from
the ADT. The question is, whether the ADT SBoolList , which contains Apair
and Adec , is indeed monomorphic. The name assigned to the second sort in
SLB , that is Bool, suggests that its carrier set must be the set of boolean
values. However, there is no axiom to enforce this. Consider the following
algebra:
Algebra: A′pair
Carrier sets: ABoolList, N
Constants: []B, 1, 0
Operations: ins : N × ABoolList → ABoolList
            rm : ABoolList → ABoolList
where ins and rm have the same body definitions as in Example 4.2.3, N is the carrier set of Bool, and the natural numbers 1 and 0 interpret the constant symbols True and False. A′pair clearly embodies lists of natural numbers, and is a model of SLB. Also, A′pair and Adec are not isomorphic (since N and B are not isomorphic either). Thus, we require an axiom which specifies that the carrier set of Bool can only contain two elements, namely those that interpret True and False. The axiom is:
∀x : Bool. (x = True ∨ x = False)
The previous axiom does not rule out the possibility that Bool may be interpreted by a one-element set. To fix this, we add:
∃x : Bool. ∃y : Bool. (x ≠ y)
The specification, as previously defined, is the natural way to define ADTs, and thus to rule out those algebras which may be undesirable. Seen as such, an ADT may be compared to OOP abstract classes. For instance, SBoolList may be interpreted as a class where the insert method is not yet defined, while remove is, in terms of the former. This very insight suggests that we can program with ADTs in much the same way we program in OOP (as is the case in Haskell). However, formally, an ADT can be more expressive. Under our definition, it can, for example, specify that f(x) < g(x), where f and g are operation symbols, and < is an already-defined order (2-ary operation symbol) over the appropriate sorts.
Remark 4.3.3 (Constructing specifications) Developing software modules (collections of interfaces, (abstract) classes, etc.) is very similar to developing appropriate ADTs. However, there are subtle but essential differences. If we return to our example regarding boolean lists, and we wish to develop an abstract class for such an object, how do we assess the outcome we produce, i.e. the abstract class? We can neither claim it is good, nor that it is bad.3
Abstract Data Types rid us of such ambiguity. In our example, we know
that SLB must be monomorphic. Such a claim can be verified using mathematical tools.
4.4
Note that not all polymorphic ADTs are bad. Consider the ADT generated by the following specification:
3 Commenting on a research article, Wolfgang Pauli said: "This paper is so bad it is not even wrong." The allusion is to the ambiguity of the claims made in the paper: they are formulated in such a way that they can be neither proved nor disproved.
Specification: SL
Import: Bool, Nat
Sorts: List, Element, Bool, Nat
Constant symbols: Void : List
Operation symbols: insert : Element × List → List
                   remove : List → List
                   fst : List → Element
                   isEmpty : List → Bool
                   size : List → Nat
                   append : List × List → List
Axioms: (R)  remove(insert(e, l)) = l
        (F)  fst(insert(e, l)) = e
        (I1) isEmpty(Void) = True
        (I2) isEmpty(insert(e, l)) = False
        (S1) size(Void) = 0
        (S2) size(insert(e, l)) = 1 + size(l)
        (A1) append(Void, l′) = l′
        (A2) append(insert(e, l), l′) = insert(e, append(l, l′))
4.5
The ADT terminology, as presented so far, introduces sorts, constant symbols and operation symbols, which are used to make abstract references to (carrier) sets, designated elements of such sets, and functions (or operations), respectively. We have introduced variables (of certain sorts) in an ad-hoc way, to pave the way for defining axioms. The latter have also been introduced informally. The reader may easily notice that the formal apparatus for defining (and interpreting) axioms is First-Order Logic (FOL).
In what follows, we are interested in talking about arbitrary abstract elements of the ADT, ones which would be interpreted by the algebra as values (elements of some carrier set), not just those which interpret constants. For example, we use the constant symbol VoidB : BoolList to refer (in an abstract way) to the empty list, that is, the symbol which may be interpreted as 1 ∈ N in the algebra Adec, as already seen. How do we refer, in the same abstract way, to (i) the list which contains True, or (ii) the list which contains False and True, in this particular order, etc.?
Remark 4.5.1 (Exceptions) There are situations when building certain terms (and subsequently applying certain operations in a given algebra) does not make sense. For example, in the algebra Adec, by what number should the term remove(Void) be interpreted? The same question can be reformulated as: what is the list represented by g(1) = ⌊1/2⌋ = 0?
This issue can be settled either by: (i) introducing special exception values, and treating them much in the way they are treated in OOP, or (ii) developing a theory of working with only partially-defined functions (and hence partial algebras). Throughout this chapter, we will opt for the first alternative. Thus, SL becomes:
Specification: SL
Import: Bool, Nat
Sorts: List, Element, Bool, Nat
Constant symbols: Void : List, ErrorL : List, ErrorB : Bool, ErrorE : Element, ErrorN : Nat
Operation symbols: insert : Element × List → List
                   remove : List → List
                   fst : List → Element
                   isEmpty : List → Bool
                   size : List → Nat
                   append : List × List → List
Axioms: (R1) remove(insert(e, l)) = l
        (R2) remove(Void) = ErrorL
        (R3) remove(ErrorL) = ErrorL
        (F1) fst(insert(e, l)) = e
        (F2) fst(Void) = ErrorE
        (F3) fst(ErrorL) = ErrorE
        (I1) isEmpty(Void) = True
        (I2) isEmpty(insert(e, l)) = False
        (I3) isEmpty(ErrorL) = ErrorB
        (S1) size(Void) = 0
        (S2) size(insert(e, l)) = 1 + size(l)
        (S3) size(ErrorL) = ErrorN
        (A1) append(Void, l′) = l′
        (A2) append(insert(e, l), l′) = insert(e, append(l, l′))
        (A3) append(ErrorL, l) = append(l, ErrorL) = ErrorL
The ADT SL generalises SLB in that lists can contain arbitrary elements
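The exception-value approach can be sketched in Python as well (my own simplification: a single sentinel ERROR stands in for ErrorL/ErrorE/ErrorB/ErrorN, whereas the specification uses one error constant per sort).

```python
# Sketch: remove and fst with an explicit exception value, mirroring
# axioms (R1)-(R3) and (F1)-(F3) above.
ERROR = object()  # a single sentinel, standing in for all Error constants
VOID = ()

def remove(l):
    if l is ERROR or l == VOID:   # (R2), (R3)
        return ERROR
    return l[1]                   # (R1)

def fst(l):
    if l is ERROR or l == VOID:   # (F2), (F3)
        return ERROR
    return l[0]                   # (F1)

print(fst(VOID) is ERROR)             # True: no first element of the empty list
print(fst((True, VOID)))              # True: the head of the list [True]
print(remove(ERROR) is ERROR)         # True: errors propagate
```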
4.6
4.7
Structural induction is the fundamental instrument for proving ADT properties. In fact, structural induction is commonly used for proving statements about any abstract mathematical object which is recursively defined (as is the case with ADTs). Each term of an ADT is built starting from another term, using a constructor. For instance, the term insert(True, VoidB), that is, the list containing the element True, is built from the term VoidB, using the constructor insert. Let T be the set of all terms built using the base constructors of an ADT S and P : T → {0, 1} be a property of elements of T. Structural induction is used to prove statements of the form ∀t ∈ T. P(t) = 1. To understand structural induction, it is useful to consider it as a generalization of mathematical induction. The latter is used to prove statements of the form ∀n ∈ N. P(n) = 1, where P is a property of natural numbers. The mathematical induction schema is as follows:
\[
\frac{P(0) = 1 \qquad \dfrac{P(n) = 1}{P(n+1) = 1}\ (n \in \mathbb{N})}{\forall n \in \mathbb{N}.\ P(n) = 1}
\]
The schema relies on the notation convention by which \frac{A}{B} is interpreted as A ⟹ B; thus, if A is true, then B is true. It can be read as: if (*) P is true for the value 0 (P(0) = 1), and (**) assuming P is true for a value n, we can show that P is true for the value n + 1, for all values n ∈ N (\dfrac{P(n) = 1}{P(n+1) = 1}\ (n \in \mathbb{N})), then P is true for all values n ∈ N. (*) is called the basis case. (**) is called the induction step, and the assumption that P is true for the value n, from (**), is called the induction hypothesis.
We can imagine the natural numbers as values associated to terms which are built by the constructors of an ADT SN, given by (the incomplete specification):
Specification: SN
Sorts: Naturals
Constant symbols: Zero : Naturals
Operation symbols: Succ : Naturals → Naturals
Axioms: . . .
Thus, the (unique) basis case of the mathematical induction, which addressed the value n = 0, will generalize in the structural induction to a set of
basis cases, one for each: (i) constant symbol and (ii) external constructor.
Therefore, the basis case will address those terms which are not built using
other terms. The (unique) induction step of the mathematical induction,
which addressed the values n + 1 constructed from n by applying Succ, will
generalize in the structural induction to a set of induction steps, one for
each internal constructor.
Let Δ0, Δe, Δi designate the sets of nullary, external and internal constructors of an ADT. The structural induction schema is given by:
\[
\frac{P(\sigma_0) = 1,\ \forall \sigma_0 \in \Delta_0 \qquad P(\sigma_e(\ldots)) = 1,\ \forall \sigma_e \in \Delta_e \qquad \dfrac{P(t) = 1}{P(\sigma_i(\ldots, t, \ldots)) = 1}\ (t \in T,\ \sigma_i \in \Delta_i)}{\forall t \in T.\ P(t) = 1}
\]
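The schema can be instantiated for the ADT SN (encodings and names below are mine). Terms are built from the nullary constructor Zero via the internal constructor Succ; the finite loop illustrates, but does not replace, the induction argument.

```python
# Sketch: structural induction for SN, with the property
# P(t) = "value(t) >= 0" checked constructor by constructor.
def zero():
    return ("Zero",)

def succ(t):
    return ("Succ", t)

def value(t):
    return 0 if t[0] == "Zero" else 1 + value(t[1])

def P(t):
    return value(t) >= 0

assert P(zero())                      # basis case: the constant symbol Zero
t = zero()
for _ in range(100):                  # induction step, checked on samples:
    assert (not P(t)) or P(succ(t))   # P(t) = 1 implies P(Succ(t)) = 1
    t = succ(t)
print(value(t))  # 100
```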
5 Complexity Theory
5.1 The Turing Machine
The Turing Machine was developed in the 1930s by Alan Turing. Its purpose was to define mechanical computation, in an age when computers and the related technology were still in their infancy. Seen from our time, we can imagine the Turing Machine as a programming language for pen and paper, stripped to its very basics. But how can such a language still be useful after more than 80 years, when a great variety of powerful languages are available? The two key features of the Turing Machine are: (i) simplicity: the Turing Machine has a single type of instruction, one that reads a character in a given state and, depending on the read character, transitions to another state, possibly modifying the current character and moving to the next one. This is somewhat similar to gotos. There are no fors, no explicit ifs, not even functions; we had a similar approach for Propositional Logic, where we defined a syntax and semantics for the operators ¬ and ∨ only, for economy of the definition; all other operators can be defined in terms of the former. (ii) expressiveness: as simple as it is, the Turing Machine can express any form of computation, no matter how complex. Thus, any C/C++, Java, etc. program can be expressed as a Turing Machine. Naturally, the corresponding Turing Machine would be quite complicated and difficult to develop from scratch. However, the purpose of the Turing Machine is not to be used as a programming language. Rather, it can be seen as an abstract specification of an algorithm (an analogy which will be used throughout the course notes), which is further used to examine how many resources are consumed when solving a problem.
In what follows, we switch to a more formal definition of these intuitive
concepts.
Definition 5.1.1 (Problem instance) A problem instance is a mathematical object of which we ask a question and expect an answer.
Example 5.1.1 (Problem instance) Consider the (directed) graph G = (V, E) where V = {a, b, c, d} and E = {(a, b), (b, c), (c, a), (d, a)}. The question we would like to ask is: "Is there a loop in G?". The graph G of which we ask this question is a problem instance.
Definition 5.1.2 ((Decision) problem) A problem is a mapping P : I → O, where I is a set of problem instances of which we ask the same question, and O is a set of answers. P assigns to each problem instance i ∈ I the answer P(i) ∈ O. If O = {0, 1}, we say that P is a decision problem, i.e. one whose answer is yes/no (or 0/1).
In the rest of the chapter, we shall discuss decision problems only.
Remark 5.1.1 (Problem vs. algorithm) An algorithm is a specification
of a finite computation process, which takes an encoding of a problem instance and produces an output. If the problem is a decision problem, then
the output is 0/1.
The above statement is an intuitive description of an algorithm. In what
follows, we give a more formal definition. An algorithm is a Turing Machine.
Definition 5.1.3 ((Deterministic) Turing Machine) A (Deterministic) Turing Machine (abbreviated (D)TM) is a tuple M = (K, F, Σ, δ, s0) where:
Σ = {a, b, c, . . .} is a finite set of symbols which we call the alphabet;
[Figure: a Turing Machine with start state s0 and states s1, s2. A transition label x/y, D means: on reading x, write y and move in direction D. The depicted labels are 1/0, L; #/#, L; 0/1, H; and >/1, H.]
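Such a machine can be run in a few lines of Python. The simulator below is my own sketch, and the transition table is my reading of the figure's labels: a binary-increment machine that scans from the rightmost cell, flipping trailing 1s to 0 (the carry) until it can write a 1 and halt.

```python
# Sketch: a minimal deterministic-TM simulator.
def run_tm(delta, tape, state="s0", pos=None):
    """delta: (state, symbol) -> (new_state, new_symbol, move in {L, R, H})."""
    tape = list(tape)
    pos = len(tape) - 1 if pos is None else pos  # start at the rightmost cell
    while True:
        sym = tape[pos] if 0 <= pos < len(tape) else "#"
        state, new_sym, move = delta[(state, sym)]
        if 0 <= pos < len(tape):
            tape[pos] = new_sym
        else:                       # ran off the left end: grow the tape
            tape.insert(0, new_sym)
            pos = 0
        if move == "H":
            return "".join(tape)
        pos += -1 if move == "L" else 1

# Binary increment: flip trailing 1s to 0 moving left; the first 0 (or a
# blank past the left end) becomes 1, and the machine halts.
delta = {
    ("s0", "1"): ("s0", "0", "L"),
    ("s0", "0"): ("s1", "1", "H"),
    ("s0", "#"): ("s1", "1", "H"),
}
print(run_tm(delta, "0111"))  # 1000
print(run_tm(delta, "111"))   # 1000
```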
different functions which had the same asymptotic growth. Thus, n² and n² + 5n + 6 were essentially seen as indistinguishable, and denoted as Θ(n²).5
In complexity theory, we make a further step and abandon the distinction between any functions which grow as fast as some polynomial. Under this assumption, execution times such as n², n⁵ or n² log₂ n will be seen as indistinguishable. We shall take this intuition to the formal level later on, when we introduce complexity classes.
One may argue that this assumption is overly simplistic. In practice,
for large enough values of n, n2 is an acceptable running time, while n5 is
not. This is indeed true. However, even under this apparently rudimentary
assumption, we can achieve an insightful classification of problems. This
classification deems a considerable number of interesting problems as practically unsolvable due to excessively high running times. We shall defer
further comments for the following sections, and return to the issue of the
encoding. This is addressed by the following proposition:
Proposition 5.1.1 (The encoding does not matter) Every decision problem Q : I → {0, 1} which is solved by a Turing Machine M in time T(n), with alphabet Σ (which is used for encoding problem instances from I), is also solved in time O(log |Σ|) · T(n) using the alphabet {0, 1, #, >}.
Proof: (sketch) We build M′ = (K′, F′, Σ′, δ′, s′0) from M as follows:
Σ′ = {0, 1, #, >}. We encode each symbol different from # (the empty cell) and > (the marker symbol of the beginning of the input) as a word w ∈ Σ′* with |w| = ⌈log |Σ|⌉. We use k to refer to the length |w| of the word w. We write enc′(x), with x ∈ Σ, to refer to the encoding of the symbol x.
For each state s ∈ K, we build 2^{k+1} − 1 states q1, . . . , q_{2^{k+1}−1} ∈ K′, organized as a full binary tree of height k. The purpose of the tree is to recognize the word enc′(x) of length k from the tape. Thus, the unique state at the root of the tree, namely q1, is responsible for recognising the first bit. If it is 0, M′ will transition to q2, and if it is 1, to q3. q2 and q3 must each recognize the second bit. After their transitions, we shall be in one of the states q4 to q7, which give us information about the first two bits of the word. The states from level i recognize the first i bits of the encoded symbol enc′(x). The states from the last level are 2^k in number, and recognize the last bit of the encoded symbol
5 or, equally, Θ(2n²), etc.
5.2 Decidability Theory
This chapter investigates the limits of the Turing Machine. More precisely, we are interested in (i) examining whether there are problems which are not solvable by any Turing Machine, and (ii) whether there exists a computational model, more powerful than the Turing Machine, which is able to solve such problems. As it turns out, the answer to (i) is yes: some problems can only partially be solved by a TM, and some cannot be solved at all. The answer to (ii) is formulated as the Church-Turing conjecture (or thesis), which states that any computational model is as expressive as the Turing Machine. Computability theory is actually rich in computational models. Some of the most widespread are: the lambda calculus, while programs, RAM machines (very similar to assembly programs), and Markov (or normal) algorithms. All of them have been shown to be equivalent to the TM. More precisely, each lambda-expression, while or RAM program, and Markov algorithm is equivalent to a Turing Machine, in the sense that they solve the same problem. The Church-Turing thesis formulates an even stronger claim: any computational model (not just the existing ones) is equivalent to the Turing Machine.
In what follows, we provide rigorous formulations for solvable and partially solvable problems, and construct a taxonomy of problems based on Turing Machine solvability. We provide a technique for establishing TM solvability, and end by formulating the Church-Turing thesis.
To begin with, we refine Definition 5.1.2 of a decision problem. Recall from Remark 5.1.4 that any problem instance can be encoded as a word over Σb*.
Proposition 5.2.1 Σb* and N are isomorphic.
Proof: We prove this result for Σ01 = {0, 1}. Σ01* and N are isomorphic iff there exists a bijective function h : Σ01* → N. Intuitively, h assigns to each word in Σ01* a unique number from N. Note that each word wn ∈ Σ01* is a sequence of bits which can be seen as a binary representation of a natural number n ∈ N. Thus, h is the transformation function from the binary base to the decimal one. It is straightforward that our choice of h is an isomorphism (each binary word corresponds to exactly one natural, and for each such word there exists exactly one corresponding natural). The same result can be extended from Σ01* to Σb*, by assuming we have four symbols instead of two.
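A subtlety worth noting: the plain binary-value map identifies words with leading zeros ("1" and "01" both denote 1). A standard repair, sketched below (my choice, not the notes'), is the "bijective base-2" numbering: read 1·w as a binary numeral and subtract one.

```python
# Sketch: a concrete bijection h between {0,1}* and N.
def h(word):
    """Map a binary word to a natural: value of '1' + word, minus 1."""
    return int("1" + word, 2) - 1

def h_inv(n):
    """Inverse: binary of n + 1, with the leading '1' stripped."""
    return bin(n + 1)[3:]  # bin() yields '0b1...', so drop the first 3 chars

words = ["", "0", "1", "00", "01", "10", "11"]
print([h(w) for w in words])  # [0, 1, 2, 3, 4, 5, 6]
assert all(h_inv(h(w)) == w for w in words)
```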
Proposition 5.2.1 takes us a step forward. We can represent (or interpret) each problem instance, that is, a word from Σb*, as a natural number. With
[Table: the outputs Mi(wk) of each Turing Machine Mi on each word wk; each entry is 0, 1 or ⊥ (does not halt).]
The values 0, 1, ⊥ (does not halt) which populate the table are selected arbitrarily, for description purposes.
The Turing Machine M must be some Mi, that is, the i-th Turing Machine. Let us look at the output of Mi, given input wi (the encoding of i in binary). Mi(wi) = f(i), since Mi decides f. Assume Mi(wi) = 1. Then f(i) = 0, according to the definition of f. Thus Mi(wi) ≠ f(i). Contradiction. Assume Mi(wi) = 0. Then f(i) = 1. Contradiction. Finally, we also obtain a contradiction for Mi(wi) = ⊥ (Mi does not halt for wi). Thus, f ∉ R.
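The diagonal construction above can be illustrated on a finite toy enumeration of 0/1 functions (ordinary Python functions standing in for Turing Machines):

```python
def diagonal(funcs):
    # Build a 0/1 function that differs from the i-th function of the
    # enumeration on input i -- the same trick used to define f above.
    return lambda i: 1 - funcs[i](i)
```

By construction, diagonal(funcs) disagrees with every funcs[i] on input i, so it appears nowhere in the enumeration.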
Definition 5.2.4 (Halting problem) Let:

    fhalt(n) = { 0  if Mn(wn) does not halt
               { 1  if Mn(wn) halts
fhalt is called the halting problem: it takes as input the number n, which encodes both the word wn and the Turing Machine Mn, and establishes whether Mn halts for input wn.
This problem is also related to program correctness. A program is considered
totally correct if it halts for all its inputs, and the answer is the expected
one. If we cannot establish halting, but the result is always the correct one,
we say a program is partially correct.
Proposition 5.2.8 The halting problem is recursively-enumerable (semidecidable).
Proof: The proof requires two steps. The first consists of showing fh ∈ RE, which is done by building a generator G for Afh. G simulates the TM Mn on input enc(n), for a fixed number of k steps. The behaviour is similar to GEN. If Mn halts, then n is added to Afh.
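A deterministic sketch of such a generator is shown below. As a simplifying assumption of this toy model, each machine is represented only by the number of steps it runs before halting (math.inf standing for "does not halt"), instead of being actually simulated:

```python
import math
from itertools import count

def halting_generator(step_counts):
    # step_counts[n] models how many steps M_n runs on input w_n
    # (math.inf means M_n never halts).  At stage k every machine is
    # granted a budget of k steps, so each halting machine is
    # eventually reported, and no non-halting machine blocks us.
    reported = set()
    halting = sum(s < math.inf for s in step_counts)
    for k in count(1):
        for n, steps in enumerate(step_counts):
            if n not in reported and steps <= k:
                reported.add(n)
                yield n
        if len(reported) == halting:
            return  # only for this finite toy; the real G never stops
```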
The second step is showing fh ∉ R. This is achieved by contradiction. We assume fh ∈ R and denote by Mh the TM which decides fh. We now show we can use Mh to build a TM which solves the problem f from the proof of Proposition 5.2.7. The behaviour of the TM is as follows. First, we run Mh(enc(n)). If the output is 0, then Mn(enc(n)) does not halt. Thus, we output 0. If the output of Mh(enc(n)) is 1, then we know Mn(enc(n)) halts. Thus, we can safely simulate Mn(enc(n)), and output 1/0 depending on its output. We have shown that f ∈ R, which contradicts Proposition 5.2.7.
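The second step can be sketched as follows, with hypothetical stand-ins: M_h is assumed to decide halting and simulate(n) to run Mn(enc(n)) and return its output (both names are ours, for illustration):

```python
def make_f_solver(M_h, simulate):
    # If some M_h decided the halting problem, f would become
    # decidable: we only simulate M_n(enc(n)) when M_h guarantees
    # that the simulation terminates.
    def f(n):
        if M_h(n) == 0:       # M_n(enc(n)) does not halt
            return 0
        return simulate(n)    # safe: termination is guaranteed
    return f
```

Since f is undecidable (Proposition 5.2.7), no such M_h can exist.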
Proposition 5.2.9 R 6= RE.
Proof: We have fh ∈ RE but fh ∉ R.
Proposition 5.2.9 gives us a strong result: there are problems which cannot be decided by the Turing Machine. However, are these problems of interest, or do they correspond to bizarre functions such as f from the proof of Proposition 5.2.7?
In what follows, we state one of the strongest results of this section. This is formulated as Rice's theorem, given below:
Theorem 5.2.1 (Rice) Let C ⊂ RE, C ≠ ∅, M be a Turing Machine and f : N → {0, 1} be the decision problem accepted by M. Establishing whether f ∈ C is not in R.
Proof:
This theorem may be hard to grasp at a first reading. We interpret
it as follows: C is a subset of RE, thus a set of recursively-enumerable
45
5.3 Complexity classes
In the previous section, we have classified problems into two classes, namely
R and RE. The first class contains decidable problems, the second, semidecidable problems. This classification relies on solvability. In what follows,
we look at problems from R only and further refine our classification, by
looking at how much time/space is consumed in order to solve such problems.
First, we introduce the non-deterministic Turing machine.
5.3.1 The non-deterministic Turing Machine
The keen reader may notice that the Towers of Hanoi is not a decision problem. Thus, the comparison with SAT and Hamiltonian path may not be, technically, the best one. However, other decision problems which take exponential time and do not require building all candidates in an independent manner (see (ii)) are less intuitive and not simple to present.
Note that this analogy is purely intuitive, and no actual parallelism occurs in the case of the NTM.
Definition 5.3.1 (Non-deterministic TM) A non-deterministic TM is a TM M = (K, F, Σ, δ, s0) over alphabet Σ, where δ is a relation (instead of a function).
Example 5.3.1 (Non-deterministic TM)
Remark 5.3.1 (DTM vs NTM) The single difference between a DTM and a NTM is in the way δ is defined. In the case of the DTM, the machine
performs a unique transition, to another state. In the case of NTM, such
a transition is not necessarily unique. Thus, a NTM can be in a number
of configurations at the same time. Each such configuration, including the
state and the contents of the tape, is independent of all others.
As a direct consequence, the execution of a DTM can be represented as
a linear (not necessarily finite) sequence of configurations, starting from the
initial one. In contrast, the execution of a NTM can be represented as a
tree.
Convention for describing NTM in pseudocode. As previously stated, it is more convenient to describe the behaviour of Deterministic
Turing Machines using pseudocode. We extend this convention for the NTM
as well. For this, we introduce the instruction:
v = choice(A)
where v is a variable and A is a set of values. The instruction has the following behaviour:
• the configuration of the NTM is replicated |A| times. Thus, we end up with |A| configurations, one for each value in A.
• in each configuration replica, the variable v has one distinct value from A.
• each configuration replica will continue to execute independently from all others, in a fashion which mimics parallelism.
• all configurations are independent of each other (in other words, program copies cannot communicate between themselves).
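Assuming each replica makes the same fixed sequence of choice calls, this behaviour can be reproduced deterministically by enumerating all combinations of choices (a brute-force sketch, exponential in the number of calls; the function name is ours):

```python
from itertools import product

def run_nondeterministic(program, choice_sets):
    # 'program' receives one value per choice() call; the NTM accepts
    # iff at least one replica, i.e. one combination of choices, accepts.
    return any(program(choices) for choices in product(*choice_sets))
```

For instance, guessing a truth assignment for a two-variable formula amounts to a choice over {False, True} for each variable.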
Example 5.3.2 (Pseudocode)
5.4
EXPTIME = ∪d∈N DTIME(2^(n^d))
NEXPTIME = ∪d∈N NTIME(2^(n^d))
We present, without proof, the following relationship between these classes7:
P ⊆ NP ⊆ PSPACE = NPSPACE8 ⊆ EXPTIME ⊆ NEXPTIME
We call this relationship the hierarchy of complexity classes. The hierarchy is, at least partially, intuitive since, for instance, any problem which
can be solved in at most polynomial time by a DTM (hence in P), can also
be solved in at most exponential time by a DTM (hence in EXPTIME ).
Similarly, any problem which can be solved in at most polynomial time by
a DTM (hence in P) can also be solved in at most polynomial time by a
NTM (hence in NP ), since the DTM is a particular case of a NTM (functions of the DTM definition are particular cases of relations of the NTM
definition).
The task of the complexity theorist is to study the relationship between complexity classes, and yield insightful results such as PSPACE =
NPSPACE (which we shall not discuss in more detail).
The task of the (applied) computer scientist is to establish to which complexity classes his problems naturally belong and, by this, settle the hardness of the studied problems. This task is actually two-fold:
• first, it involves deciding membership to a class. This is achieved by devising an algorithm (formally a DTM), and examining its running time. For instance, SAT ∈ EXPTIME, since it can be solved in exponential time. However, membership to a class offers only an upper bound on the problem hardness. For instance, deciding whether a vector is sorted can also be performed in exponential time on some DTM; however, there is also a faster way to solve this problem.
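For the sortedness example, the faster way is the obvious linear scan, which shows how loose the exponential upper bound is:

```python
def is_sorted(v):
    # A single left-to-right pass decides sortedness in O(len(v))
    # time, even though the problem is, trivially, also in EXPTIME.
    return all(v[i] <= v[i + 1] for i in range(len(v) - 1))
```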
7 We note that these are only a small subset of the known and studied complexity classes.
8 Due to Savitch's Theorem.
9 As we shall see further on, how to settle such an issue is not yet known.
10 Actually, F can be interpreted as a Turing Machine in itself, which takes encodings of members of I1 and transforms them into encodings of members of I2.
The P ≠ NP problem
The class NP can be intuitively characterized as the set of all problems for which the verification of a solution candidate can be done in polynomial time (the generation of solution candidates is done in polynomial time, using choice). If P = NP, then each problem f whose solution candidates can be checked in polynomial time (f ∈ NP) can also be solved in polynomial time (f ∈ P). Thus, P = NP implies that solving a problem is as easy as checking whether a candidate solution for the problem is the correct one.
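The verification view can be made concrete with a polynomial-time checker for CNF formulas (our own encoding, for illustration: a clause is a list of signed variable indices, a candidate solution a dict from variable index to Boolean):

```python
def verify_cnf(clauses, assignment):
    # A clause is satisfied iff one of its literals is; checking all
    # clauses takes time linear in the size of the formula.  This
    # cheap verification is what places SAT in NP.
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)
```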
We also add the following properties regarding the sets of NP-hard and NP-complete problems:
Proposition 5.4.5 The set of NP-hard problems is closed under ≤p.
Proof: Let Q be an NP-hard problem and Q ≤p Q′. For any f ∈ NP we have f ≤p Q and, since ≤p is transitive, f ≤p Q′. Then Q′ is also NP-hard.
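The proof rests on the transitivity of ≤p: polynomial-time transformations compose, and the composition of two polynomial bounds is again polynomial. As plain functions:

```python
def compose(r1, r2):
    # If r1 transforms instances of A into instances of B, and r2
    # transforms instances of B into instances of C, then r2 after r1
    # transforms A into C; polynomial running times compose to a
    # polynomial, so <=_p is transitive.
    return lambda x: r2(r1(x))
```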
Proposition 5.4.6 The set of NP-complete problems together with ≤p is an equivalence class.
Proof: Let Q, Q′ be two NP-complete problems. Since Q is NP-complete, for all Q′′ ∈ NP, Q′′ ≤p Q. In particular, since Q′ ∈ NP, Q′ ≤p Q. Swapping the roles of Q and Q′ gives Q ≤p Q′.
Proposition 5.4.6 highlights an intrinsic feature of NP-complete problems, namely that they are equivalent up to a polynomial transformation.
With this in mind, we end with the following property:
Proposition 5.4.7 Assume Q is an NP-complete problem. If Q ∈ P, then P = NP.
Proof: Let M be the TM which solves Q in polynomial time. Since Q is NP-complete, Q′ ≤p Q for all Q′ ∈ NP. Hence, we can solve Q′ by: (i)