
Algorithms and Complexity Theory - Problems and Application
MP, CG, MDB, etc...
January 7, 2014

1 Evaluating the performance of algorithms

1.1 Notes

The understanding of any new concept requires, to some extent, a motivational object of study to which it can be applied. Anatomy, for instance, requires the human body. However, the object of study can itself be a concept, or conceptual in nature.
This section is dedicated to measuring space and time in Computer Science; as the object of study, we have chosen Propositional Logic.

2 Propositional Logic

Argument. Propositional Logic (PL) is firmly rooted in Computer Science. It is the most basic tool for describing and reasoning about knowledge, and the starting point for more complex frameworks used in Artificial Intelligence. A formula from PL is quite simple in structure; however, it is highly flexible in that it can encode complex questions about numbers, sets, graphs, and even algorithms. As it turns out, PL is also the form in which these questions can be answered efficiently, as we will further see.
Propositional Logic. Propositional Logic can be traced back to Plato^1, who first argued that a statement can only be true or false. In what follows, we will use ⊤ to refer to true and ⊥ to refer to false. These two can be seen as values of an algebra that is commonly known as Boolean Algebra.
^1 However, decomposing a statement into simpler atomic statements, connected by or and and, would come much later. This is attributed to .

x    ¬x          x    y    x ∧ y    x ∨ y
⊤    ⊥           ⊤    ⊤      ⊤        ⊤
⊥    ⊤           ⊤    ⊥      ⊥        ⊤
                 ⊥    ⊤      ⊥        ⊤
                 ⊥    ⊥      ⊥        ⊥

Figure 2.0.1: Boolean algebra (truth tables for ¬, ∧ and ∨)


The latter also specifies operations such as ¬ (negation), ∧ (and) and ∨ (or), which can be applied to values. The operations are specified in Figure 2.0.1. As it turns out, certain operations can be defined with respect to others, as the de Morgan laws state:

¬(x ∧ y) = (¬x ∨ ¬y)        ¬(x ∨ y) = (¬x ∧ ¬y)

where both x and y can be either ⊤ or ⊥. More complex operations such as → (implication) or ↔ (double implication, also denoted as if and only if (iff)) can be defined with respect to simpler ones:

(x → y) = (¬x ∨ y)        (x ↔ y) = (x → y) ∧ (y → x)
Boolean Algebra and Propositional Logic are different. The former defines values and operations on values. The latter (and, in general, any kind of logic) provides a means for writing statements (i.e. a syntax), and a means for deriving the truth value of a sentence in a given context (i.e. a semantics). The semantics of PL is connected to Boolean Algebra, as we will further see. Consider, for comparison, the statement x + y > 5. If x is taken to be 2 and y is taken to be 4, the statement is true, according to standard algebra: 2 + 4 > 5, i.e. 6 > 5. Such a statement is called satisfiable, since there are values for x and y under which it is true. Not all statements are satisfiable, for instance x + y < 0, where x and y are natural numbers. Also, some statements are universally true (or valid), i.e. they are true under any possible variable assignment. x + y ≥ 0 is valid, if x and y are naturals.
In what follows, we formally introduce the syntax and semantics of PL. At the heart of PL we have variables. A variable is a symbol which can be either ⊤ or ⊥. We denote variables by x, y, z, etc.
Definition 2.0.1 (PL syntax) Let Vars denote a finite set of variables and v ∈ Vars. The syntax of PL is recursively defined as follows:

φ, ψ ::= v | ¬φ | (φ ∧ ψ)

Whenever φ = v, we say φ is an atomic formula. We make a slight abuse of notation and designate by PL the set of formulae which can be built according to the above rule. We also write Vars(φ) to refer to the set of variables used in formula φ.
Additionally, ∨ can be defined based on conjunction and negation: (φ ∨ ψ) ≝ ¬(¬φ ∧ ¬ψ). → and ↔ can be defined similarly. In the definition of a logical language, it is usual that certain operators (in this case ∨, →, ↔) are omitted, if they can be expressed based on existing operators.
Example 2.0.1 (PL formula) Let Vars = {jr, jf, mr, mf}. jr stands for "John is at the restaurant", jf stands for "John is at the fast-food", mr stands for "Mary is at the restaurant" and mf stands for "Mary is at the fast-food".

φ1 = ((jr → mr) ∧ (mr → jr))
φ2 = (((jr ∧ (jr → mr)) ∧ (mr → jf)) ∧ ¬(jr ∧ jf))

Formula φ1 expresses that John is at the restaurant if and only if Mary is at the restaurant. Formula φ2 expresses that (i) John is at the restaurant, (ii) if John is at the restaurant then Mary is at the restaurant, (iii) if Mary is at the restaurant, then John is at the fast-food, but also that (iv) John cannot be at the restaurant and at the fast-food at the same time.
Remark 2.0.1 (Parentheses) As seen in Example 2.0.1, parentheses can be quite tedious and make bigger formulae illegible. Without parentheses, however, formulae such as x ∧ y ∨ z may be ambiguous, since they can be interpreted both as (x ∧ (y ∨ z)) and as ((x ∧ y) ∨ z). To avoid the parenthesis overhead, we impose the following precedence order over the evaluation of boolean operators:

(1) ¬    (2) ∧    (3) ∨    (4) →, ↔

According to this order, x ∧ y ∨ z should be interpreted as ((x ∧ y) ∨ z). In what follows, we shall slightly deviate from our syntax definition and omit parentheses whenever we find this convenient.
An interpretation is a formal means for assigning values to variables.

Definition 2.0.2 (Interpretation) An interpretation is a function I : Vars → {⊤, ⊥}. Given an interpretation I, a variable x ∈ Vars and a value b ∈ {⊤, ⊥}, we write I[x ← b] to refer to the interpretation defined by:

I[x ← b](v) = I(v)   if v ≠ x
I[x ← b](v) = b      if v = x

In effect, I[x ← b] is the interpretation I where the value of x is replaced by b. Given an interpretation I, we write v =I b if I(v) = b. When the interpretation is clear from context, we omit the subscript.
Definition 2.0.3 (PL semantics) The semantics of PL is a relation ⊨ which establishes, for each formula φ in PL built with variables from Vars and each interpretation I : Vars → {⊤, ⊥}, whether the formula is true under the interpretation at hand, as follows:

I ⊨ v        if I(v) = ⊤
I ⊨ ¬φ       if it is not the case that I ⊨ φ
I ⊨ φ ∧ ψ    if I ⊨ φ and I ⊨ ψ

Whenever I ⊨ φ we say that I makes φ true.
Example 2.0.2 (Semantics) We continue Example 2.0.1. Let I be the interpretation such that jr =I ⊥, jf =I ⊤, mr =I ⊥, mf =I ⊤. I corresponds to the situation where John and Mary are both at the fast-food. Then I ⊨ φ1, however I ⊭ φ2.
There are two central problems related to PL, and these problems extend to more complicated logics as well. The first is the verification problem (or the model checking problem^2), which takes a PL formula and an interpretation, and asks if the formula is true under the given interpretation. The second is the satisfiability problem, which takes a PL formula and asks whether there exists an interpretation under which it is true.

Definition 2.0.4 (Verification) The verification or model checking problem MC(φ, I) takes a PL formula φ and an interpretation I and establishes whether I ⊨ φ.
^2 This name originates from the term model, which is used to formally designate an instance of the Universe. In the case of PL, such an instance is the interpretation, where each variable must be either ⊤ or ⊥. Other logics make use of more complicated models, such as graph-like structures, which may be used to describe the evolution of the Universe, etc. The model checking problem is formulated in the same way, irrespective of how the Universe is described.

Definition 2.0.5 (Satisfiability) The satisfiability problem SAT(φ) takes a PL formula φ and establishes whether there exists an interpretation I such that I ⊨ φ. Whenever SAT(φ) = 1 for some formula φ, we say φ is satisfiable. If SAT(φ) = 0 then φ is unsatisfiable.
Example 2.0.3 (Verification and Satisfiability) Given the interpretation I from Example 2.0.2 and formulae φ1 and φ2, we have MC(φ1, I) = 1, SAT(φ1) = 1 and SAT(φ2) = 0. φ1 is indeed satisfiable since I makes it true. φ2 is unsatisfiable since (i), (ii), (iii) (see Example 2.0.1) imply that John is both at the restaurant and at the fast-food, which is not possible according to (iv).
A natural question is whether the verification and satisfiability problems are equally hard. This question will become central to our exposition in the chapters that follow. Intuition suggests that verification is much easier than satisfiability since, in its case, the interpretation to check is given in advance. For now, we will stick to finding algorithms that can solve each problem. Consider the following algorithm for the verification problem:
Algorithm 1: MCnaive(φ, I)
Data: PL formula φ and interpretation I
Result: 1 if I ⊨ φ and 0 otherwise
switch φ do
    case x
        return I(x)
    case φ1 ∧ φ2
        return MCnaive(φ1, I) · MCnaive(φ2, I)
    case ¬φ1
        return |MCnaive(φ1, I) − 1|
endsw
Notice how it closely follows the definition of the PL semantics ⊨.
Remark 2.0.2 (Representation) In what follows, the presented algorithms abstract from how a PL formula or an interpretation is represented in a program implementation. For now, we can consider that variables from Vars are represented as integers from 0 to |Vars| − 1, that I is represented as a (boolean) vector where I[i] = b if variable i has value b, and that PL formulae are represented as binary trees where leaves are integers encoding variables and all other intermediary nodes represent operators such as ∧ and ¬.
Later on, we will encounter a more sophisticated means for encoding
structures such as PL formulae.
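To make the tree representation concrete, here is a minimal Python sketch; the tuple encoding, the helper names and the example formula are our own illustration (not notation used later in the notes), and the function mirrors Algorithm 1 case by case.

    # Formulas as nested tuples: ("var", i), ("not", phi), ("and", phi, psi).
    # An interpretation maps variable numbers to booleans (list or dict).

    def mc_naive(phi, interp):
        """Return 1 if interp satisfies phi, 0 otherwise (Algorithm 1)."""
        tag = phi[0]
        if tag == "var":
            return 1 if interp[phi[1]] else 0
        if tag == "and":
            return mc_naive(phi[1], interp) * mc_naive(phi[2], interp)
        if tag == "not":
            return abs(mc_naive(phi[1], interp) - 1)
        raise ValueError("unknown node: %r" % (tag,))

    if __name__ == "__main__":
        # not(x0 and not x1), under x0 = True, x1 = False
        phi = ("not", ("and", ("var", 0), ("not", ("var", 1))))
        print(mc_naive(phi, [True, False]))   # prints 0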

Exercise 2.0.1 Construct the trees associated to formulae φ1 and φ2 from Example 2.0.1.
Now, consider the following algorithm for the satisfiability problem:
Algorithm 2: SATnaive(φ)
Data: PL formula φ
Result: 1 if φ is satisfiable and 0 otherwise
Let I : Vars(φ) → {⊤, ⊥} be an arbitrary interpretation;
S = {I};
for v ∈ Vars(φ) do
    S' = ∅;
    for I ∈ S do
        I1 = I[v ← ⊤];
        I2 = I[v ← ⊥];
        S' = S' ∪ {I1, I2};
    end
    S = S';
end
for I ∈ S do
    if MCnaive(φ, I) = 1 then
        return 1
end
return 0
One may notice that Algorithm 2 contains a brute-force procedure which builds all possible interpretations of the variables of φ and applies the verification algorithm to each one. The question is whether we can do better. This question, answered thoroughly, is also a central point of the lecture, but for the time being, we shall skip it.
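As a quick illustration, a brute-force satisfiability check in the spirit of Algorithm 2 can be sketched in Python as follows (building on the tuple encoding and mc_naive from the earlier sketch; all helper names are ours):

    from itertools import product

    def variables(phi):
        """Collect the set of variable indices occurring in phi."""
        tag = phi[0]
        if tag == "var":
            return {phi[1]}
        if tag == "not":
            return variables(phi[1])
        return variables(phi[1]) | variables(phi[2])   # "and"

    def sat_naive(phi):
        """Return 1 if some interpretation satisfies phi, 0 otherwise (Algorithm 2)."""
        vs = sorted(variables(phi))
        for values in product([True, False], repeat=len(vs)):
            interp = dict(zip(vs, values))     # one of the 2^n interpretations
            if mc_naive(phi, interp) == 1:
                return 1
        return 0

    # sat_naive(("and", ("var", 0), ("not", ("var", 0)))) == 0  (unsatisfiable)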

3 Asymptotic notations

3.1 What metric(s) do we use?

At this point, we have two algorithms for two different problems, and we are interested in finding suitable means for (i) evaluating their performance, and (ii) comparing them with other algorithms that solve the same problem. We start by noticing that:
1. On a conventional computation machine, the resources which are spent when running an algorithm are: (i) the consumed memory, i.e. space, and (ii) the number of instructions, i.e. the time it takes for the algorithm to run.
2. As a formula gets larger, more space (and time) is consumed.
3. To properly evaluate an algorithm, the consumption of space and time should always consider the worst-case scenario.
Point 1 should require no additional clarification. Let us focus on point 2. First, we introduce the notion of PL formula size.

Definition 3.1.1 (PL formula size) The size |φ| of a PL formula φ is recursively defined as:

|p| = 1 for p ∈ Vars(φ)
|¬φ| = |φ| + 1
|φ1 ∧ φ2| = |φ1| + |φ2|
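Over the tuple encoding sketched earlier, this definition translates directly into a small recursive function (a sketch under our own encoding):

    def size(phi):
        """Size of a PL formula, following Definition 3.1.1."""
        tag = phi[0]
        if tag == "var":
            return 1
        if tag == "not":
            return size(phi[1]) + 1
        return size(phi[1]) + size(phi[2])   # "and": |phi1| + |phi2|

    # size(("not", ("and", ("var", 0), ("var", 1)))) == 3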
Exercise 3.1.1 (PL formula size) Compute the sizes of the PL formulae
from Example 2.0.1.
It is true that, as the size of a formula gets bigger, the number of steps
(time) and the amount of memory (space) consumed by Algorithm 2 will
be bigger. However, as computers get more powerful, we also get more
resources to throw at solving problems. The question we would like to settle is whether time and space are relevant as measures of an algorithm's performance. Our answer is yes, and to argue for it, first consider Moore's Law:

Remark 3.1.1 (Moore's law [2]) The processing speed and memory of computers double every two years.
Next, let us look at the size of the set S from Algorithm 2, which contains all possible interpretations of Vars(φ). Initially, we have only one interpretation I. After each iteration of the nested for loop, the size of S doubles: for each existing interpretation I, we add I1 and I2, which interpret the new variable v. Starting from |S| = 1, the size of S doubles |Vars(φ)| times. Hence, at the end of the outermost for loop, we have |S| = 2^|Vars(φ)|. Assume |Vars(φ)| = 100, which is quite reasonable if we think about complex electronic circuits such as a motherboard. Then,

2^100 = 1,267,650,600,228,229,401,496,703,205,376.

To better grasp the size of the number, if one could fold a sheet of paper
100 times, the thickness of the stack would be equivalent to the distance in
light-years to the farthest observable galaxy in the known Universe.
Finally, assume we have a machine which can execute the second outermost for loop from Algorithm 2, which takes 2^100 iterations, in a reasonable time. Moore's law tells us that, if we want to add another variable to our formula φ, then we have to wait two years for a machine able to execute 2^101 = 2 · 2^100 iterations in a reasonable time.
To conclude, time and space are limited resources which we must take
into account, and the performance of any algorithm should be measured
with respect to these two metrics.

3.2 The Θ-notation

Having settled the metrics we use, let us focus on the methodology of measurement. For now, we consider time only. The same discussion can be
naturally extended to space.
The number of steps executed by an algorithm is usually expressed as T(n), where T : N → N is a monotonically increasing function and n measures the size of the input. Let us compute the number of steps Tfor(n) of the first outer for loop from Algorithm 2. Here, n will stand for the number of variables of φ: |Vars(φ)|.
We make the following assumptions:
1. each step consists of one unit of time. Hence, we will interchangeably
use the terms number of steps and (execution) time.
2. for each interpretation I : Vars → {⊤, ⊥}, variable v ∈ Vars and value b ∈ {⊤, ⊥}, building I[v ← b] takes |Vars| steps.
3. the set difference S \ S' and the set reunion S ∪ S' take |S| · |S'| steps.
4. each assignment S = S' takes one step.
Example 3.2.1 (Execution time Tfor(n)) Given the above assumptions, we obtain:

Tfor(n) = Σ_{0 ≤ k ≤ n−1} [ Σ_{1 ≤ l ≤ |S| at step k} (T_{I1 = I[v←⊤]} + T_{I2 = I[v←⊥]} + T_{S' = S' ∪ {I1,I2}}) + T_{S' = ∅} + T_{S = S'} ]
        = Σ_{0 ≤ k ≤ n−1} [ Σ_{1 ≤ l ≤ 2^k} (n + n + (l − 1) · 2) + 1 + 1 ]
        = Σ_{0 ≤ k ≤ n−1} [ 2n · 2^k + 2 · (2^k (2^k − 1))/2 + 2 ]
        = Σ_{0 ≤ k ≤ n−1} [ (2n − 1) · 2^k + 4^k + 2 ]
        = (2n − 1) · (2^n − 1)/(2 − 1) + (4^n − 1)/(4 − 1) + 2n
Example 3.2.1 shows that computing execution times is a tedious task,


even for a relatively simple procedure. Also, one may note that Tfor(n) contains a little too much detail which is unimportant for our purposes. Instead
of knowing the exact number of steps T (n) for each input size n, we would
rather be interested in knowing how fast the value T (n) grows, as n takes
values from N. Consider the following example:
Example 3.2.2 (Order of growth) Given execution times TA(n) = 20n + 6 and TB(n) = n² + 5n + 4 of algorithms A and B respectively, we are more interested in the fact that A runs in linear time Θ(n) with respect to the input, while B runs in quadratic time Θ(n²). The rest of the details are less important.
To conclude, we are interested in a means for stating the order of growth
or asymptotic growth of a function T , instead of computing the exact expression of T . The advantage of the former is that, beyond capturing the
information we need, it is much easier to operate with.
Definition 3.2.1 (Θ (theta) notation) Let g : R → R. Then Θ(g(n)) is the set of functions:

Θ(g(n)) = { f : R → R | ∃c1, c2 ∈ R+, ∃n0 ∈ N such that ∀n ≥ n0 : c1 · g(n) ≤ f(n) ≤ c2 · g(n) }

Remark 3.2.1 (Arbitrary functions vs execution times) In our previous Definition 3.2.1, the function g is defined over the reals, and there are no additional assumptions, such as monotonicity, which are characteristic of those functions which express execution times.

The set Θ(g(n)) collects all functions f which have the same asymptotic growth as g. This is expressed by the existence of some real constants c1 and c2 such that the inequality c1 · g(n) ≤ f(n) ≤ c2 · g(n) holds. Moreover, the inequality must hold for all n larger than some fixed n0. The behaviour of f(n) (with respect to g(n)) is unimportant for those n less than n0. This condition stresses the importance of the behaviour at the limit of f(n), that is, for n → ∞.
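For instance, a quick sanity check (our own worked example, with one possible choice of constants) that TA(n) = 20n + 6 from Example 3.2.2 indeed belongs to Θ(n): take c1 = 20, c2 = 26 and n0 = 1. Then, for all n ≥ 1,

20n ≤ 20n + 6 ≤ 20n + 6n = 26n,

so both inequalities required by Definition 3.2.1 hold.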

3.3 Syntactic sugars

The set Θ(g(n)) can also be interpreted (intuitively) as an anonymous function which has an order of growth equal to g(n). Thus, instead of giving the actual expression of Tfor(n), we may, informally, say that Tfor(n) = Θ(n · 2^n). This is equivalent to saying Tfor(n) is a function that grows asymptotically in the same way as n · 2^n.

Remark 3.3.1 (Syntactic sugars) The statement Tfor(n) = Θ(n · 2^n) is, formally, incorrect. The left-hand side denotes a function, while the right-hand side denotes a set of functions. A correct statement would be Tfor(n) ∈ Θ(n · 2^n). However, the former statement has a nice flavour, since Tfor(n) is intended to designate a particular function (of unknown exact expression, but with a known asymptotic behaviour). This is especially useful if Tfor(n) appears in an equational expression. This motivates the introduction of the following syntactic sugars.
Definition 3.3.1 (Syntactic sugars) The following are rewrite or syntactic sugar rules:

Expressions of the type T(n) = Θ(f(n)) are syntactic sugars for T(n) ∈ Θ(f(n));

Expressions of the type

E(Θ(f1(n)), . . . , Θ(fk(n))) = E'(Θ(f'1(n)), . . . , Θ(f'l(n)))

where E and E' are algebraic expressions which may contain other functions as well, are syntactic sugars for:

for all T1(n) ∈ Θ(f1(n)), . . . , Tk(n) ∈ Θ(fk(n)),
there exist T'1(n) ∈ Θ(f'1(n)), . . . , T'l(n) ∈ Θ(f'l(n)) such that
E(T1(n), . . . , Tk(n)) = E'(T'1(n), . . . , T'l(n))


Example 3.3.1 (Syntactic sugars) Consider the statement n² + Θ(n) = Θ(n²). This statement is indeed true. It should be read as: for all f(n) ∈ Θ(n) there exists f'(n) ∈ Θ(n²) such that n² + f(n) = f'(n).

Exercise 3.3.1 (Syntactic sugars) Prove that n² + Θ(n) = Θ(n²). Also, establish whether Θ(log(n))² + n · Θ(log(n)) = Θ(n²).
Let us now establish the asymptotic growth of Tfor , instead of its precise
expression.
Example 3.3.2 (Asymptotic growth of Tfor)

Tfor(n) = Σ_{0 ≤ k ≤ n−1} ( |S| at step k · (T_{I1 = I[v←⊤]} + T_{I2 = I[v←⊥]} + T_{S' = S' ∪ {I1,I2}}) + T_{S' = ∅} + T_{S = S'} )
        = Σ_{0 ≤ k ≤ n−1} ( 2^k · (Θ(n) + Θ(n) + Θ(n)) + Θ(1) + Θ(1) )
        = Θ(n) · (2^n − 1)/(2 − 1) + Θ(1)
        = Θ(n) · Θ(2^n) + Θ(1) = Θ(n · 2^n)

Example 3.3.2 illustrates how the application of syntactic sugars allows


one to ignore constants or terms with inferior asymptotic growth, in determining the growth of a function.

3.4 Recurrences

Having determined Tfor , let us return to Algorithm 2, and determine its


asymptotic growth. Since Algorithm 2 employs MCnaive from Algorithm 1,
we turn our attention to the latter. If we follow a procedure similar to that
applied for Tfor , we obtain:

TMCnaive(φ) =
    Tswitch + Treturn I(x)                                       if φ = x
    Tswitch + TMCnaive(φ1, I) + TMCnaive(φ2, I) + T·             if φ = φ1 ∧ φ2
    Tswitch + TMCnaive(φ1, I) + T|·−1|                           if φ = ¬φ1

Tswitch is the time required for looking into the formula structure and identifying the proper case. If we assume a formula is represented as a tree, then this time is Θ(1), since it amounts to looking at the children branches of the formula at hand. T· denotes the time for multiplication, and T|·−1| the time for subtracting 1 and taking the absolute value (the operation |MCnaive(φ1, I) − 1|). Both are Θ(1). Next, we can replace the execution time of each
instruction, with asymptotic notations:

TMCnaive(m) =
    Θ(1) + Θ(1)                                      if φ = x
    Θ(1) + TMCnaive(m1) + TMCnaive(m2) + Θ(1)        if φ = φ1 ∧ φ2, where m1 = |φ1|, m2 = |φ2|
    Θ(1) + TMCnaive(m − 1) + Θ(1)                    if φ = ¬φ1

Finally, we note that the third case is a particular case of the second (in which m1 = m − 1 and TMCnaive(m2) = Θ(1)). By replacing the formula structure cases with the formula size, we get:

TMCnaive(m) =
    Θ(1)                                             if m = 1
    TMCnaive(m1) + TMCnaive(m2) + Θ(1)               where m1 + m2 = m
where m designates the size of the formula at hand: |φ|. In determining TMCnaive(m), an apparent obstacle consists of the recursive manner in which the recurrence is defined. To solve it, several methods are possible:
3.4.1 The recurrence tree method

Algorithms with recursive calls often fit the Divide et Impera (DI) pattern, i.e. they solve a problem by dividing it into smaller problems. This is also the case with Algorithm 1. In general, the execution time of a DI algorithm may be described as:

T(m) = Θ(1)                                  if m is sufficiently small
T(m) = D(m) + T(m1) + T(m2) + C(m)           otherwise

where:
the first condition corresponds to the case when the answer to the problem of size m can be given directly, without a division step. The problem is atomic in size, and cannot be further split.
T(m) is the execution time of solving a problem of size m
D(m) is the time required to split the problem into subproblems of sizes m1 and m2, respectively
T(m1) and T(m2) are the times for solving the smaller problems
C(m) is the cost of combining the solutions of subproblems of sizes


m1 and m2 into a solution for the problem of size m.
In our case, D(m) is Tswitch, while C(m) is either T· or T|·−1|. The recurrence tree method first (i) accounts, in a descriptive way, for the values of D(m) and C(m) at an arbitrary recursive call of depth k, as well as for the number and depth of recursive calls, and then (ii) adds them up.
In our case both D(m) and C(m) are Θ(1) and, since at each step m1 + m2 = m, we will have a total of m recursive calls. More precisely:
TMCnaive(m0) = [T(m01) + T(m02)] |_{m01 + m02 = m0}
            = [(T(m11) + T(m12)) |_{m11 + m12 = m01} + (T(m21) + T(m22)) |_{m21 + m22 = m02}] |_{m01 + m02 = m0}
            = . . .
            = Θ(1) + . . . + Θ(1) = Θ(m0)

Exercise 3.4.1 (Mergesort)


3.4.2 The substitution method

The substitution method relies on: (i) guessing the growth (or exact expression) of a recurrence, and (ii) proving the guess is correct, via mathematical
induction. The substitution method is more rigorous than building recurrence trees, since the latter may be prone to errors in both recurrence tree
construction, as well as in accounting of all components of the execution
time. It is often the case that recurrence trees are used as a guessing method,
while the substitution method is used to confirm the guess is correct.
Let us apply the substitution method to verify that:

Proposition 3.4.1 TMCnaive(m) = Θ(m).

Proof: The basis case is TMCnaive(1) = Θ(1), which is directly confirmed by the first branch of the definition of TMCnaive. The induction hypotheses are TMCnaive(m1) = Θ(m1) and TMCnaive(m2) = Θ(m2), with m1 + m2 = m. Our aim is to prove TMCnaive(m) = Θ(m). According to the second branch of the definition of TMCnaive, we have TMCnaive(m) = TMCnaive(m1) + TMCnaive(m2) + Θ(1). Next, we use the induction hypotheses and obtain TMCnaive(m) = Θ(m1 + m2) + Θ(1) = Θ(m).

Remark 3.4.1 (Basis case) In applying mathematical induction in the
substitution method, the basis case m = 1 is less important. It may be the
case that our claim may not hold for m = 1. If this is so, one can attempt
to check the claim for m = 2, 3, . . . until the claim is found to hold. This approach is correct since we are trying to establish the asymptotic growth


of a recurrence, that is, establish an inequality which must hold for all m
greater than some m0 , which is arbitrary, but fixed.
3.4.3 The Master method

3.5 The notations O, Ω, o, ω

Consider the following improvement to Algorithm 1:


Algorithm 3: MClazy(φ, I)
Data: PL formula φ and interpretation I
Result: 1 if I ⊨ φ and 0 otherwise
switch φ do
    case x
        return I(x)
    case φ1 ∧ φ2
        if MClazy(φ1, I) = 0 then
            return 0
        else
            return MClazy(φ2, I)
        end
    case ¬φ1
        return |MClazy(φ1, I) − 1|
endsw
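In Python, the short-circuit variant can be sketched as follows (again over the illustrative tuple encoding from the earlier sketches; Python's and operator already short-circuits, but we keep the explicit test to mirror Algorithm 3):

    def mc_lazy(phi, interp):
        """Lazy model checking: skip phi2 when phi1 is already false (Algorithm 3)."""
        tag = phi[0]
        if tag == "var":
            return 1 if interp[phi[1]] else 0
        if tag == "and":
            if mc_lazy(phi[1], interp) == 0:
                return 0                      # phi2 is never evaluated
            return mc_lazy(phi[2], interp)
        if tag == "not":
            return abs(mc_lazy(phi[1], interp) - 1)
        raise ValueError("unknown node: %r" % (tag,))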
Let TMClazy be the execution time of MClazy . We raise the following
questions: (i) what is the asymptotic growth of TMClazy , and (ii) how does
it relate to that of TMCnaive ?
Intuitively, MClazy should perform better than MCnaive since, in the case that φ = φ1 ∧ φ2 and I ⊭ φ1, the algorithm no longer tries to establish whether I ⊨ φ2. Let us examine the behaviour of MClazy for φ = p ∧ φ' and an interpretation I such that p =I ⊥. In this case, TMClazy(m) = Θ(1), irrespective of the size of φ'. Now, let φ = p1 ∧ p2 ∧ . . . ∧ pm, with pi =I ⊤ for 1 ≤ i ≤ m. In this case TMClazy(m) is precisely m, thus TMClazy(m) = Θ(m).
Thus, in the general case, we cannot claim that TMClazy(m) = Θ(m), since there are cases, such as the one previously shown, where there is no c1 such that c1 · m ≤ TMClazy(m) for all m greater than some fixed m0. This example shows a distinct feature of the Θ notation, namely that it is tight: whenever f(n) = Θ(g(n)), f and g have the exact same asymptotic growth. In some cases, such as the one illustrated by Algorithm 3, we would like to talk about functions which grow asymptotically (i) at most as much as / strictly less than, or (ii) at least as much as / strictly more than some given function. We introduce notations for each of these four possibilities:
Definition 3.5.1 (O, o, Ω, ω notations) Let g : R → R. Then:

O(g(n)) = { f : R → R | ∃c ∈ R+, ∃n0 ∈ N such that ∀n ≥ n0 : 0 ≤ f(n) ≤ c · g(n) }

Ω(g(n)) = { f : R → R | ∃c ∈ R+, ∃n0 ∈ N such that ∀n ≥ n0 : 0 ≤ c · g(n) ≤ f(n) }

o(g(n)) = { f : R → R | ∀c ∈ R+, ∃n0 ∈ N such that ∀n ≥ n0 : 0 ≤ f(n) < c · g(n) }

ω(g(n)) = { f : R → R | ∀c ∈ R+, ∃n0 ∈ N such that ∀n ≥ n0 : 0 ≤ c · g(n) < f(n) }

The following table illustrates the intuition behind each notation:


f(n) = O(g(n))    f grows asymptotically at most as g
f(n) = Ω(g(n))    f grows asymptotically at least as g
f(n) = Θ(g(n))    f grows asymptotically exactly as g
f(n) = o(g(n))    f grows asymptotically strictly less than g
f(n) = ω(g(n))    f grows asymptotically strictly more than g
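For a quick intuition check (our own examples): with TA(n) = 20n + 6 and TB(n) = n² + 5n + 4 from Example 3.2.2, we have TA(n) = O(n²), TA(n) = o(n²) and TB(n) = ω(n), while TB(n) = Θ(n²) and therefore also TB(n) = O(n²) and TB(n) = Ω(n²).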

Proposition 3.5.1 Let f(n) and g(n) be non-negative functions. Then f(n) = o(g(n)) iff lim_{n→∞} f(n)/g(n) = 0. Similarly, f(n) = ω(g(n)) iff lim_{n→∞} f(n)/g(n) = ∞.

Proposition 3.5.2 O(g(n)) ∩ Ω(g(n)) = Θ(g(n)).

Proposition 3.5.3 o(g(n)) ∩ ω(g(n)) = ∅.
The proofs of the above propositions are left as exercises. Returning
to Algorithm 3, when evaluating TMClazy , one must take into account the
worst-case scenario (as stated by the previously mentioned point 3). Thus,
we have that TMClazy(m) = O(m). The execution time of MClazy is at most linear w.r.t. the formula size. In some situations it can be less than linear (e.g. constant).

Remark 3.5.1 (Syntactic sugars for all notations) The syntactic sugars previously defined for the Θ notation naturally extend to all other notations. However, one should observe that this equality is not a symmetric relation. For instance n^O(1) = O(e^n), however O(e^n) ≠ n^O(1). Assume O(e^n) = n^O(1). By the definition of the syntactic sugars, we have that for all f(n) ∈ O(e^n) there exists g(n) ∈ O(1) such that f(n) = n^g(n). Let f(n) = e^n. Then, from e^n = n^g(n), it follows that g(n) = n / ln n, but also g(n) ∈ O(1). Contradiction.

3.6 Conclusion

We have established time (and space) as suitable metrics for evaluating and comparing algorithms, and we have introduced asymptotic notations for expressing their growth instead of their actual expressions, which are tedious to derive and not completely useful. We have established syntactic sugars which offer a convenient way to handle notations. Also, we have described three ways for determining the growth of execution times defined as recurrences. However, we have yet to compute the growth of the execution time of Algorithm 2, that is, SATnaive. TSATnaive can be naturally determined from Tfor(n), the execution time of the first part of the algorithm, and TMCnaive(m), since MCnaive is called in the second part of the algorithm. First note that the variable n refers to the number of variables from the formula φ, while m refers to the formula size |φ|. Therefore, we define TSATnaive as TSATnaive : N × N → N:

TSATnaive(n, m) = Tfor(n) + |S| · TMCnaive(m)
               = Θ(n · 2^n) + 2^n · Θ(m)
               = Θ(2^n · m)

(the last step uses the fact that n ≤ m, discussed below). Thus TSATnaive is exponential in the number of variables of φ and linear in the size of φ. Indeed, there is a link between the number of variables n and the formula size m: n is at most equal to m (there cannot be more variables than the size of the formula), hence n = O(m). With this observation, we can also bound TSATnaive in terms of m only: TSATnaive(m) = O(m · 2^m). However, we note that our previous formulation is more precise.
Let us now discuss the space consumed by each of our algorithms. In our account, we will consider only the variables and data-structures used by the algorithm, and ignore the space occupied by the input data. We assume the space consumed to store φ and I is |φ| and |Vars(φ)|, respectively. Thus, the space consumed by MCnaive is SMCnaive(n, m) = Θ(1), since no additional data-structures are used. Similarly,

SSATnaive(n, m) = S|S| · SI · SMCnaive(n, m)
               = 2^n · n · Θ(1) = Θ(n · 2^n)

where n = |Vars(φ)|, m = |φ|, S|S| is the number of interpretations in the set S and SI is the space used for storing one interpretation from S.
Thus SATnaive consumes exponential space. The algorithm can easily be improved. We assume the set Vars(φ) is ordered, and that Vars(φ)[i] designates the i-th element from Vars(φ).
Algorithm 4: SATcompact(φ)
Data: PL formula φ
Result: 1 if φ is satisfiable and 0 otherwise
Let I : Vars(φ) → {⊤, ⊥} be an interpretation such that I(v) = ⊥ for all v ∈ Vars(φ);
i = 0;
while i ≠ |Vars(φ)| do
    i = 0;
    while I(Vars(φ)[i]) = ⊤ and i ≠ |Vars(φ)| do
        I(Vars(φ)[i]) = ⊥;
        i = i + 1;
    end
    if i ≠ |Vars(φ)| then
        I(Vars(φ)[i]) = ⊤;
    end
    if MCnaive(φ, I) = 1 then
        return 1
    end
end
return 0
Algorithm 4 builds interpretations incrementally, and subsequently verifies whether they make the formula true. The incremental process of building interpretations treats the interpretation as a binary counter, where each variable designates a bit. The incrementing stops after overflow. The space consumed by SATcompact is Θ(|Vars(φ)|), thus linear with respect to the number of variables in φ.
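A direct Python transliteration of this counter-based enumeration might look as follows (a sketch, reusing the variables and mc_naive helpers from the earlier sketches; variable order is taken as the sorted order of indices):

    def sat_compact(phi):
        """Enumerate interpretations as a binary counter (Algorithm 4)."""
        vs = sorted(variables(phi))          # ordered Vars(phi)
        interp = {v: False for v in vs}      # start from the all-False interpretation
        i = 0
        while i != len(vs):
            i = 0
            # flip the trailing True bits back to False (carry propagation)
            while i != len(vs) and interp[vs[i]]:
                interp[vs[i]] = False
                i += 1
            if i != len(vs):
                interp[vs[i]] = True
            if mc_naive(phi, interp) == 1:
                return 1
        return 0

Only the single dictionary interp (of size |Vars(φ)|) is kept in memory, which is exactly the linear-space behaviour claimed above.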


4 Abstract Data Types

4.1 Foreword

Developing programs that are correct and easy to maintain continues to be a challenging task. According to [1]: "It is estimated that 80% of the total time and money currently invested in software development is spent on finding errors and amending incorrect or poorly designed software. Hence there is an obvious need for better programming methodologies."
Formal methods provide grounds for the development of such methodologies. In short, the main idea is to start from (*) an abstract (mathematical) specification of a problem, and iteratively refine the specification until the actual code is obtained. The advantage of (*) is that it is non-ambiguous, natural for describing the problem at hand, and allows proving the correctness of the implementation derived from it. It is worth noting that there are programming languages (Haskell, for instance) which allow developing code in a manner which is closer to a specification (what the solution should look like) than to an operational description (what the program should do).
In this setting, (i) Abstract Data Types (ADTs) are used to describe data structures and (ii) logic is used for specifying properties of (i), as well as for proving correctness. In what follows, we shall formally introduce Abstract Data Types and try to compare them with similar concepts from Object Oriented Programming (OOP). We shall focus less on the mathematical foundations of (First-Order) logic and use it as is. Finally, we shall discuss structural induction, a means for proving properties of ADTs.

4.2 Signatures and Algebras

This section follows the exposition from [1].


Definition 4.2.1 (Signature) A signature is a pair Σ = (S^Σ, O^Σ), where:

S^Σ is a non-empty set. The elements s ∈ S^Σ are called sorts.
O^Σ is a set whose elements are called operation symbols, and have the following form: f : s1 × . . . × sn → s, where s1, . . . , sn, s are sorts and n ∈ N. We call s1, . . . , sn the argument sorts, s the target sort and s1 × . . . × sn → s the arity of f, respectively. Whenever n = 0, an operation symbol c : → s is called a constant symbol of sort s. For constants, we shall prefer the more compact notation c : s.

Whenever the signature is clear from context, we omit superscripts and write S, O instead of S^Σ, O^Σ.

Example 4.2.1 (Signature) Consider the signature LB = (S^LB, O^LB), where:

S^LB = {BoolList, Bool};
O^LB = {VoidB : BoolList, True : Bool, False : Bool, insert : Bool × BoolList → BoolList, remove : BoolList → BoolList}

We also use the following notation for signatures:

Signature           LB
Sorts               BoolList, Bool
Constant symbols    VoidB : BoolList, True : Bool, False : Bool
Operation symbols   insert : Bool × BoolList → BoolList
                    remove : BoolList → BoolList

A signature is quite similar to an OOP interface. One should note that sorts are merely symbols, thus statements such as True ∈ Bool are incorrect: the right-hand member of the relation is not a set, but a symbol (which will be interpreted by a set, later on). The same holds for operation symbols, which are not functions per se, but symbols denoting functions of a given arity. As a comparison, consider calling a pure virtual function (in C++) or an abstract function (in Java): in the absence of a function definition, such a call does not make sense.
Definition 4.2.2 (Algebra) An algebra A for a signature Σ = (S, O) (or Σ-algebra) consists of the following:

for each sort s ∈ S, a non-empty set As, called the carrier set of the sort s;
for each constant symbol c : s in O, an element c^A ∈ As;
for each operation symbol f : s1 × . . . × sn → s in O, a function f^A : As1 × . . . × Asn → As.

An algebra is an interpretation of a signature, in the same sense as the interpretations from Propositional Logic. In the latter case, each symbol was interpreted by a truth value. Here, each sort is interpreted by a set, each constant symbol by a constant from the appropriate set, and each operation symbol by a function (or operation).


Example 4.2.2 (Algebra 1) Consider the signature LB from Example 4.2.1. The algebra Apair for LB is defined as follows (again, we use the more compact table definition):

Algebra         Apair
Carrier sets    ABoolList, B
Constants       []B, ⊤, ⊥
Operations      ins : B × ABoolList → ABoolList
                rm : ABoolList → ABoolList

where the standard set B is the carrier of Bool, ⊤, ⊥ ∈ B interpret the symbols True and False, respectively, while []B interprets the empty-list constant symbol VoidB. The functions ins(b, l) = (b, l) and

rm(l) = l'    if l = (b, l')
rm(l) = []B   otherwise

interpret the operation symbols insert and remove. Finally, ABoolList, which is the set containing []B as well as all pairs whose first element is from B and whose second element is from ABoolList, is the carrier set of BoolList.
Example 4.2.3 (Algebra 2) The algebra Adec for LB is defined as follows:

Algebra         Adec
Carrier sets    N, B
Constants       1, ⊤, ⊥
Operations      f : B × N → N
                g : N → N

where 1 ∈ N interprets the empty list VoidB, and

f(b, n) = 2n        if b = ⊥
f(b, n) = 2n + 1    if b = ⊤

while g(n) = ⌊n/2⌋.
Example 4.2.4 (Algebra 3) The algebra Aset for LB is defined as follows:

Algebra         Aset
Carrier sets    2^B, B
Constants       ∅, ⊤, ⊥
Operations      i : B × 2^B → 2^B
                r : 2^B → 2^B

where ∅ interprets the empty list VoidB, i(b, s) = {b} ∪ s, and r(s) = s.



The definition of Apair is quite similar to the one found in programming languages such as Scheme or Haskell (in the former, lists are actually stored as such pairs). Adec interprets a list of booleans as a finite sequence of bits, where the insert operation adds the bit value to the right of the sequence (the least significant bit). In turn, such a sequence, converted to decimal notation, is a natural number. Finally, Aset represents lists as sets. One can easily notice that Aset, although a proper algebra of LB, may not have the desired behaviour. For instance, removal does not work the expected way, since it leaves the list unchanged.
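To make the idea of "different algebras for the same signature" concrete, here is a small Python sketch (all names are ours): two implementations of the LB operation symbols, one over pairs as in Apair and one over naturals as in Adec, together with the map that reappears in Example 4.2.5.

    # Apair: lists as nested pairs; the empty list is the empty tuple.
    VOID_PAIR = ()
    def ins_pair(b, l):  return (b, l)
    def rm_pair(l):      return l[1] if l != VOID_PAIR else VOID_PAIR

    # Adec: lists as naturals; the empty list is 1, insert appends a bit on the right.
    VOID_DEC = 1
    def ins_dec(b, n):   return 2 * n + (1 if b else 0)
    def rm_dec(n):       return n // 2

    def to_nat(l):
        """Map a pair-list to the natural that encodes it under Adec (cf. Example 4.2.5)."""
        return VOID_DEC if l == VOID_PAIR else ins_dec(l[0], to_nat(l[1]))

    if __name__ == "__main__":
        l = ins_pair(False, ins_pair(True, VOID_PAIR))                  # [False, True]
        assert to_nat(l) == ins_dec(False, ins_dec(True, VOID_DEC))     # == 6
        assert to_nat(rm_pair(l)) == rm_dec(to_nat(l))                  # commutes with remove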
Intuitively, an algebra can be compared to an OOP (Object Oriented Programming) class. It interprets a signature much in the way classes implement interfaces. As shown in the previous three examples, there is much freedom in building up an algebra (as there is in implementing an interface), and such freedom is not necessarily good. We would like to establish a means for restricting bad algebras such as Aset, from the universe of all possible algebras associated to a given signature.
Definition 4.2.3 (Homomorphism, isomorphism) Let Σ = (S, O) be a signature and A, B be two Σ-algebras. A homomorphism h : A → B from A to B is a family h = (hs)_{s ∈ S} of functions hs : As → Bs, such that:

hs(c^A) = c^B for each constant symbol c : s in O;
hs(f^A(a1, . . . , an)) = f^B(hs1(a1), . . . , hsn(an)), for each operation symbol f : s1 × . . . × sn → s and all (a1, . . . , an) ∈ As1 × . . . × Asn. That is: hs ∘ f^A = f^B ∘ (hs1, . . . , hsn).

h is called an isomorphism if, for all s ∈ S, hs is bijective. We write A ≃ B if there exists an isomorphism from A to B. Whenever this is the case, we say that A and B are isomorphic.
Example 4.2.5 (Isomorphism) Recall the LB-algebras Apair and Adec from Examples 4.2.2 and 4.2.3. They are isomorphic. Consider the isomorphism h = (hBoolList, hBool), defined as follows (reading ⊤ as 1 and ⊥ as 0):

hBoolList : ABoolList → N,  hBoolList((bn, (bn−1, (. . . , (b1, []B) . . .)))) = 2^n + Σ_{i=1..n} bi · 2^(n−i)
hBool : B → B,  hBool(b) = b.

Exercise 4.2.1 Check that h from Example 4.2.5 is indeed an isomorphism.


4.3 Abstract Data Types

Definition 4.3.1 (Abstract Data Type) An Abstract Data Type (ADT) for a signature Σ is a non-empty class C of Σ-algebras which is closed under isomorphism, that is:

if A ∈ C and A ≃ B then B ∈ C

An ADT is called monomorphic if all its elements are isomorphic, that is:

if A ∈ C and B ∈ C then A ≃ B

Otherwise C is called polymorphic.
Remark 4.3.1 (Utility of ADTs) Consider the Abstract Data Type CLB of all LB-algebras. Clearly, Apair, Adec, Aset ∈ CLB. Also, CLB is polymorphic since, for example, Aset is not isomorphic to Adec. Thus, the ADT C1 = {Apair, Adec, Aset} is also polymorphic, for the same reason. However, the ADT C2 = {Apair, Adec} is monomorphic. Intuitively, this means that pairs and natural numbers, together with the operations defined, stand for the very same ontological concept, and we can use either one or the other, with the same effect. The only differences between Adec and Apair are in the effective shape of their elements, but not in behaviour. This is one of the important features of ADTs: they allow us to abstract over how objects are represented, and to focus on behaviour only.
Definition 4.3.1 tells us what ADTs are, but it doesn't really help us define the ADTs we want. The class of all ADTs for a signature Σ is obviously infinite, and specifying an ADT by enumerating Σ-algebrae is quite unfeasible. For instance, we know, intuitively, that we prefer C2 over C1, but we need a way to formally specify that.
Definition 4.3.2 (Specification) A specification is a pair S = (Σ, E) where Σ is a signature and E is a set of axioms. A Σ-algebra A is a model of S if all axioms from E are true in A. The generated class of S is the class C of all models of S. Then, C is an Abstract Data Type. We also call C the Abstract Data Type generated by S. In what follows, when the distinction is not necessary, we shall use the term Abstract Data Type to refer both to a specification S as well as to the class of models C generated by it.


Example 4.3.1 (Specification) Consider the following specification SLB, which adds axioms to LB:

Specification       SLB
Sorts               BoolList, Bool
Constant symbols    VoidB : BoolList, True : Bool, False : Bool
Operation symbols   insert : Bool × BoolList → BoolList
                    remove : BoolList → BoolList
Axioms              remove(insert(b, l)) = l

Remark 4.3.2 (Axioms) We have omitted the formal definition of axioms, as well as of axioms being true in an algebra. In Example 4.3.1, the concept of axiom should be straightforward. In the same example, note that we have not mentioned what b and l are. Intuitively, they are universally quantified variables, ranging over the sets that interpret Bool and BoolList, respectively. To make the exposition more compact, we have omitted a lengthier discussion regarding variables. The reader should note that our usage of axioms is intuitive, and not entirely rigorous.
The axiom we introduced in Example 4.3.1 clearly rules out Aset from the ADT. The question is whether the ADT SLB, which contains Apair and Adec, is indeed monomorphic. The name assigned to the second sort in SLB, that is, Bool, suggests that its carrier set must be the set of boolean values. However, there is no axiom to enforce this. Consider the following algebra:

Algebra         A'pair
Carrier sets    ABoolList, N
Constants       []B, 1, 0
Operations      ins : N × ABoolList → ABoolList
                rm : ABoolList → ABoolList

where ins and rm have the same body definitions as in Example 4.2.2, N is the carrier set of Bool, and the natural numbers 1 and 0 interpret the constant symbols True and False. A'pair clearly embodies lists of natural numbers, and is a model of SLB. Also, A'pair and Adec are not isomorphic (since there is no bijection between N and B). Thus, we require an axiom which specifies that the carrier set of Bool can only contain two elements, namely those that interpret True and False. The axiom is:

∀x : Bool . (x = True ∨ x = False)

The previous axiom does not rule out the possibility that Bool may be interpreted by a one-element set. To fix this, we add:

∃x : Bool . ∃y : Bool . (x ≠ y)
The specification, as previously defined, is the natural way to define ADTs, and thus to rule out those algebras which may be undesirable. Seen as such, an ADT may be compared to an OOP abstract class. For instance, SLB may be interpreted as a class where the insert method is not yet defined, but remove is, based on the former. This very insight should suggest that we can very well program with ADTs in the same way we program in OOP (as is the case in Haskell). However, formally, an ADT can be more expressive. Under our definition, it can, for example, specify that f(x) < g(x), where f and g are operation symbols, and < is an already-defined order (2-ary operation symbol) over the appropriate sorts.

Remark 4.3.3 (Constructing specifications) Developing software modules (collections of interfaces, (abstract) classes, etc.) is very similar to developing appropriate ADTs. However, there are subtle but essential differences. If we return to our example regarding boolean lists, and we wish to develop an abstract class for such an object, how do we assess the outcome we produce, i.e. the abstract class? We can neither claim it is good, nor that it is bad.^3
Abstract Data Types rid us of such ambiguity. In our example, we know that SLB must be monomorphic. Such a claim can be verified using mathematical tools.

4.4 Comments on developing Abstract Data Types

Note that not all polymorphic ADTs are bad. Consider the ADT generated by the following specification:
^3 In commenting on a research article, Wolfgang Pauli said: "This paper is so bad it is not even wrong." The allusion is to the ambiguity of the claims made in the paper: they are formulated in such a way that they can neither be proved nor disproved.


Specification       SL
Import              Bool, Nat
Sorts               List, Element, Bool, Nat
Constant symbols    Void : List
Operation symbols   insert : Element × List → List
                    remove : List → List
                    fst : List → Element
                    isEmpty : List → Bool
                    size : List → Nat
                    append : List × List → List
Axioms              (R)  remove(insert(e, l)) = l
                    (F)  fst(insert(e, l)) = e
                    (I1) isEmpty(Void) = True
                    (I2) isEmpty(insert(e, l)) = False
                    (S1) size(Void) = 0
                    (S2) size(insert(e, l)) = 1 + size(l)
                    (A1) append(Void, l') = l'
                    (A2) append(insert(e, l), l') = insert(e, append(l, l'))

Remark 4.4.1 (Good polymorphism) First of all, let us discuss the signature of SL. We easily notice that the symbol Bool was replaced by Element. The intention, which will be materialised in the axioms, is to define an ADT for lists with elements from arbitrary sets. Thus, algebrae for lists of naturals, as well as ones for lists of booleans (or even lists of lists of booleans), will be models of SL. SL is naturally polymorphic. The polymorphism of SL is less random than that of SLB. It is actually called parametric polymorphism, since the different kinds of algebrae which are models of SL depend only on the carrier set of Element, that is, on the type of elements contained in our list. Here, by different kinds of algebrae, one should formally understand different classes of isomorphic algebrae. This type of polymorphism is found in most modern programming languages.
Remark 4.4.2 (Reusing ADTs) Having a first look at the axioms, one notices that there are no axioms regarding the sorts Bool and Nat, which may lead to problems, as previously emphasized. In what follows, we assume both Bool and Nat are Abstract Data Types which are completely specified. For the same reason, we are allowed to use the (otherwise undefined) operation symbol +.

Remark 4.4.3 (Selecting axioms) At first sight, the axioms of SL seem common sense. To the program developer, they look like recursive function definitions. Most axioms which we shall define further on will have the same flavour. However, the process of selecting the right axioms is more than a programming exercise. It has its roots in the concept of axiomatization. Assume one attempts to build a theory T, where certain properties P1 . . . Pn hold. Instead of enumerating all these properties, and coercing T to satisfy them by definition, one may define T as a set A1, . . . , Ak of axioms. They are a priori truths that entail P1 . . . Pn. Also, they are as few as possible, in any case fewer than n. Thus, T is completely defined by the axioms. In our situation, instead of explicitly saying that (i) remove applied to a list of size x always yields a list of size x − 1, (ii) the size of a list which isEmpty is zero, (iii) appending two lists of sizes x and y produces a list of size x + y, (iv) appending is associative, and so on, we simply provide the axioms. We can later use the axioms to prove these properties. In the end, it is the properties which we want satisfied that drive us into writing the axioms in one way or another. A minimal programming rendering of the axioms is sketched below.
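The following Python sketch (names and encoding are our own, not part of the specification) reads the SL axioms left-to-right as recursive definitions, over lists encoded as nested pairs:

    VOID = ()                                    # Void : List

    def insert(e, l):  return (e, l)             # base constructor

    def remove(l):     return l[1] if l else VOID     # (R); remove(Void) is left unspecified here
    def fst(l):        return l[0] if l else None      # (F); fst(Void) is unspecified by SL
    def is_empty(l):   return l == VOID                # (I1), (I2)

    def size(l):                                       # (S1), (S2)
        return 0 if l == VOID else 1 + size(l[1])

    def append(l, l2):                                 # (A1), (A2)
        return l2 if l == VOID else insert(l[0], append(l[1], l2))

    # e.g. size(append(insert(1, VOID), insert(2, VOID))) == 2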
When writing an ADT, the natural process is to first consider the properties which we want to have, and then figure out the axioms. In this chapter, to make the exposition more natural, we proceeded in the inverse way: we have first defined an ADT, and we will subsequently prove certain properties of it.

4.5 Representing ADT values. Constructors

The ADT terminology, as presented so far, introduces sorts, constant symbols and operation symbols, which are used to make abstract references to (carrier) sets, designated elements of such sets, and functions (or operations), respectively. We have introduced variables (of certain sorts) in an ad-hoc way, to pave the way for defining axioms. The latter have also been introduced informally. The reader may easily notice that the formal apparatus for defining (and interpreting) axioms is First-Order Logic (FOL).
In what follows, we are interested in talking about arbitrary abstract elements of the ADT, ones which would be interpreted by the algebra as values (elements of some carrier set), but not just those which interpret constants. For example, we use the constant symbol VoidB : BoolList to refer (in an abstract way) to the empty list, that is, the symbol which may be interpreted as 1 ∈ N in the algebra Adec, as already seen. How do we refer, in the same abstract way, to (i) the list which contains true, or (ii) the list which contains false and true, in this particular order, etc.?

The natural way to do this is by designating the construction process of the (abstract) value itself. In the case of (i), this is insert(True, VoidB), and in that of (ii), insert(False, insert(True, VoidB)). These are called terms, according to the FOL terminology. Again, we shall omit a formal definition. Therefore, we use term to refer to an abstract representation of a certain value. This representation has an important feature: it captures the history of the value itself, that is, the entire process of building it from the constants.
One can easily notice that multiple terms can represent the same value. For instance insert(True, VoidB) and remove(insert(False, insert(True, VoidB))) should be interpreted by the same value (if the ADT SLB is monomorphic). How then do we designate, in a unique way, the list which contains true? The classification below addresses this issue.
Definition 4.5.1 ((Base, internal, external, nullary) constructors) Let s be a fixed sort from some signature Σ, f : s1 × . . . × sn → s' be an operation symbol, and c : s'' be a constant symbol.

f is called a constructor of sort s iff s = s'. That is, any function that interprets f will produce values from As.
f is called an internal constructor of sort s iff (i) it is a constructor of sort s and (ii) there exists 1 ≤ i ≤ n such that si = s. That is, any function that interprets f uses a value from As to build another value from As.
f is called an external constructor of sort s iff (i) it is a constructor of sort s and (ii) for all 1 ≤ i ≤ n, si ≠ s. That is, any function that interprets f uses only values different from those of As to build a value from As.
c is called a nullary constructor of sort s iff s = s''.

The constructors f1, . . . , fk of sort s are called base constructors of sort s iff, for all Σ-algebrae A and all values v ∈ As, there exists a unique term, consisting of operation symbols from f1, . . . , fk only (and of any constant symbols), whose interpretation in A is v.
Example 4.5.1 (Constructors) Recall the ADT SLB. VoidB is a nullary constructor of sort BoolList. insert and remove are internal constructors of sort BoolList. insert is the (unique) base constructor of sort BoolList: by iterative applications of insert, any (abstractly-represented) list can be obtained.


Remark 4.5.1 (Exceptions) There are situations when building certain terms (and subsequently applying certain operations in a given algebra) does not make sense. For example, in the algebra Adec, by what number should the term remove(VoidB) be interpreted? The same question can be reformulated as: what is the list represented by g(1) = ⌊1/2⌋ = 0?
This issue can be settled either by (i) introducing special exception values, and treating them much in the way they are treated in OOP, or (ii) developing a theory of working with only partially-defined functions (and hence partial algebras). Throughout this chapter, we will opt for the first alternative. Thus, SL becomes:
Specification       SL
Import              Bool, Nat
Sorts               List, Element, Bool, Nat
Constant symbols    Void : List, ErrorL : List, ErrorB : Bool, ErrorE : Element, ErrorN : Nat
Operation symbols   insert : Element × List → List
                    remove : List → List
                    fst : List → Element
                    isEmpty : List → Bool
                    size : List → Nat
                    append : List × List → List
Axioms              (R1) remove(insert(e, l)) = l
                    (R2) remove(Void) = ErrorL
                    (R3) remove(ErrorL) = ErrorL
                    (F1) fst(insert(e, l)) = e
                    (F2) fst(Void) = ErrorE
                    (F3) fst(ErrorL) = ErrorE
                    (I1) isEmpty(Void) = True
                    (I2) isEmpty(insert(e, l)) = False
                    (I3) isEmpty(ErrorL) = ErrorB
                    (S1) size(Void) = 0
                    (S2) size(insert(e, l)) = 1 + size(l)
                    (S3) size(ErrorL) = ErrorN
                    (A1) append(Void, l') = l'
                    (A2) append(insert(e, l), l') = insert(e, append(l, l'))
                    (A3) append(ErrorL, l) = append(l, ErrorL) = ErrorL

The ADT SL generalises SLB in that lists can contain arbitrary elements.

4.6 The ADT of Propositional Logic. A case study

In this section, we shall define an ADT that encompasses Propositional Logic. We will use this as an example of how the definition of an ADT should be approached. Here are a few guidelines we should consider:
1. Should our ADT be polymorphic or monomorphic?
2. How should the constructors be chosen, such that they are in agreement with our definition?
3. What other operators may we be interested in?
4. What are the properties which we want to hold, and which should guide us in establishing the axioms?
In what follows, we build our ADT by addressing each of the above questions, as exercises:

Exercise 4.6.1 Should our ADT be polymorphic or monomorphic?

Exercise 4.6.2 Define the base constructors for our ADT.

Exercise 4.6.3 Define the operations for our ADT.

In what follows, we shall write I ⊨ φ (respectively I ⊭ φ) instead of I ⊨ φ = True (respectively = False), to make the text more legible. Consider the following propositions:

Proposition 4.6.1 ∀φ . size(φ) ≥ varno(φ).

Proposition 4.6.2 (de Morgan's laws) ∀φ1 ∀φ2 . I ⊨ ¬φ1 ∨ ¬φ2 iff I ⊨ ¬(φ1 ∧ φ2).

Proposition 4.6.3 (Tautology) ∀φ1 . ∀φ2 . if I ⊨ φ1 ∧ φ2 iff I ⊨ φ1, for every interpretation I, then φ1 → φ2 is valid.


Proposition 4.6.4 (Replacement theorem) Two formulae φ1 and φ2 are called semantically equivalent whenever I ⊨ φ1 iff I ⊨ φ2, for all interpretations I. We write φ[φ2 \ φ1] to refer to the formula φ where all occurrences of φ1 have been replaced by φ2.
If two formulae φ1 and φ2 are semantically equivalent, then, for all φ, φ[φ2 \ φ1] and φ are also semantically equivalent.

4.7 Proving properties of ADTs. Structural induction

Structural induction is the fundamental instrument for proving ADT properties. Actually, structural induction is commonly used for proving statements regarding any abstract mathematical object which is recursively defined (as is the case with ADTs). Each term of an ADT is built starting from another term, using a constructor. For instance, the term insert(True, VoidB), that is, the list containing the element True, is built from the term VoidB, using the constructor insert. Let T be the set of all terms built using the base constructors of an ADT S and P : T → {0, 1} be a property of the elements of T. Structural induction is used to prove statements of the form ∀t ∈ T . P(t) = 1. To understand structural induction, it is useful to consider it as a generalization of mathematical induction. The latter is used to prove statements of the form ∀n ∈ N . P(n) = 1, where P is a property of natural numbers. The mathematical induction schema is as follows:

P(0) = 1        P(n) = 1 ⟹ P(n + 1) = 1   (for all n ∈ N)
-----------------------------------------------------------
∀n ∈ N . P(n) = 1

The schema relies on the notation convention by which a premise A written above a conclusion B (separated by a horizontal bar) is interpreted as A ⟹ B; thus, if A is true, then B is true. It can be read as: if (*) P is true for the value 0 (P(0) = 1), and (**) assuming P is true for a value n we can show that P is true for the value n + 1, for all values n ∈ N, then P is true for all values n ∈ N. (*) is called the basis case. (**) is called the induction step, and the assumption "P is true for value n" from (**) is called the induction hypothesis.
We can imagine the natural numbers as values associated to terms which are built by the constructors of an ADT SN, given by (the incomplete specification):
Specification       SN
Sorts               Naturals
Constant symbols    Zero : Naturals
Operation symbols   Succ : Naturals → Naturals
Axioms              . . .

Thus, the (unique) basis case of the mathematical induction, which addressed the value n = 0, will generalize in structural induction to a set of basis cases, one for each (i) constant symbol and (ii) external constructor. Therefore, the basis cases will address those terms which are not built using other terms. The (unique) induction step of the mathematical induction, which addressed the values n + 1 constructed from n by applying Succ, will generalize in structural induction to a set of induction steps, one for each internal constructor.
Let Σ0, Σe, Σi designate the sets of nullary, external and internal constructors of an ADT. The structural induction schema is given by:

P(σ0) = 1, ∀σ0 ∈ Σ0      P(σe(. . .)) = 1, ∀σe ∈ Σe      P(t) = 1 ⟹ P(σi(. . . , t, . . .)) = 1   (for all t ∈ T, σi ∈ Σi)
--------------------------------------------------------------------------------------------------
∀t ∈ T . P(t) = 1
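As a small worked illustration (our own, using the SL axioms and ignoring the error constants), one can prove by structural induction on l that ∀l, l' . size(append(l, l')) = size(l) + size(l'):

Basis case (l = Void): size(append(Void, l')) = size(l') by (A1), and size(Void) + size(l') = 0 + size(l') = size(l') by (S1).
Induction step (l = insert(e, l0), with the induction hypothesis size(append(l0, l')) = size(l0) + size(l')):

size(append(insert(e, l0), l')) = size(insert(e, append(l0, l')))        by (A2)
                                = 1 + size(append(l0, l'))               by (S2)
                                = 1 + size(l0) + size(l')                by the induction hypothesis
                                = size(insert(e, l0)) + size(l')         by (S2).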

5 Complexity Theory

5.1 The Turing Machine

The Turing Machine was developed in the 1930s by Allan Turing. The purpose of the Turing Machine was to define mechanical computation, in an
age where computers and the related technology were still in their infancy.
Seen from our time, we can imagine the Turing Machine as a programming
language for pen and paper, stripped to its very basics. But how can such
a language still be useful after more than 80 years, when a great variety
of powerful languages are available? The two key features of the Turing
Machine are (i) simplicity: The Turing Machine has a unique type of instruction: one that reads a character in a given state, and depending on the
read character transitions in another state, possibly modifying the current
character and moving to the next one. This is somewhat similar to gotos.
There are no fors, no explicit ifs, not even functions; we had a similar
approach related to the Propositional Logic, where we have defined a syntax and semantics for the operators and only, for the economy of the
31

definition. All other operators can be defined in terms of the former; (ii)
expressiveness: as simple as it is, the Turing Machine can express any form
of computation, no matter how complex. Thus, any C/C++, Java, etc. program can be expressed as a Turing Machine. Naturally, the corresponding
Turing Machine would be quite complicated and difficult to develop from
scratch. However, the purpose of the Turing Machine is not to be used as a
programming language. Rather, it can be seen as an abstract specification
of an algorithm (analogy which will be used throughout the course notes),
which is further on used to examine how many resources are consumed when
solving a problem.
In what follows, we switch to a more formal definition of these intuitive
concepts.
Definition 5.1.1 (Problem instance) A problem instance is a mathematical object of which we ask a question and expect an answer.
Example 5.1.1 (Problem instance) Consider the (directed) graph G = (V, E) where V = {a, b, c, d} and E = {(a, b), (b, c), (c, a), (d, a)}. The question we would like to ask is: "Is there a loop (cycle) in G?". The graph G of which we ask this question is a problem instance.
Definition 5.1.2 ((Decision) problem) A problem is a mapping P : I → O, where I is a set of problem instances of which we ask the same question, and O is a set of answers. P assigns to each problem instance i ∈ I the answer P(i) ∈ O. If O = {0, 1} we say that P is a decision problem, i.e. one whose answer is yes/no (or 0/1).
In the rest of the chapter, we shall discuss decision problems only.
Remark 5.1.1 (Problem vs. algorithm) An algorithm is a specification
of a finite computation process, which takes an encoding of a problem instance and produces an output. If the problem is a decision problem, then
the output is 0/1.
The above statement is an intuitive description of an algorithm. In what
follows, we give a more formal definition. An algorithm is a Turing Machine.
Definition 5.1.3 ((Deterministic) Turing Machine) A (Deterministic) Turing Machine (abbreviated (D)TM) is a tuple M = (K, F, Σ, δ, s0) where:
- Σ = {a, b, c, . . .} is a finite set of symbols which we call the alphabet;
- K is a set of states, and F ⊆ K is a set of accepting/final states;
- δ : K × Σ → K × Σ × {L, H, R} is a transition function which assigns to each state s ∈ K and symbol c ∈ Σ the triple δ(s, c) = (s', c', pos);
- s0 ∈ K is the initial state.
The Turing Machine has a tape which contains infinitely many cells in both directions, and on each tape cell we have a symbol from Σ. The Turing Machine has a tape head, which is able to read the symbol from the current cell. Also, the Turing Machine is always in a given state. Initially (before the machine has started) the state is s0. From a given state s, the Turing Machine reads the symbol c from the current cell, and performs a transition. The transition is given by δ(s, c) = (s', c', pos). Performing the transition means that the TM moves from state s to s', overrides the symbol c with c' on the tape cell and: (i) if pos = L, moves the tape head to the next cell to the left, (ii) if pos = R, moves the tape head to the next cell to the right, and (iii) if pos = H, leaves the tape head on the current cell.
The Turing Machine will perform transitions according to δ.
Whenever the TM reaches an accepting/final state, we say it halts. If a TM reaches a non-accepting state from which no transition is possible, we say it clings/hangs.
Example 5.1.2 (Turing Machine) Consider the alphabet Σ = {#, >, 0, 1}, the set of states K = {s0, s1, s2}, the set of final states F = {s2} and the transition function:

    δ(s0, 0) = (s0, 0, R)    δ(s0, 1) = (s0, 1, R)
    δ(s0, #) = (s1, #, L)    δ(s1, 1) = (s1, 0, L)
    δ(s1, 0) = (s2, 1, H)    δ(s1, >) = (s2, 1, H)

The Turing Machine M = (K, F, Σ, δ, s0) reads a number encoded in binary on the tape, and increments it by 1. The symbol # encodes the empty tape cell (we shall use # to refer to the empty cell throughout the text). Initially, the tape head is positioned at the most significant bit of the number. The Machine first goes over all bits, from left to right. When the first empty cell is detected, the machine goes into state s1, and starts flipping 1s to 0s, until the first 0 (or the initial position, marked by >) is detected. Finally, the machine places 1 on this current cell, and enters its final state.
The behaviour of the transition function can be more intuitively represented as in Figure 5.1.2. Each node represents a state, and each edge a transition. The label on each edge is of the form c/c', pos, where c is the symbol read from the current tape cell, c' is the symbol written on the current tape cell and pos is a tape head position. The label should be read as: the machine replaces c with c' on the current tape cell and moves in the direction indicated by pos.

[Figure 5.1.2: The binary increment Turing Machine. A state diagram with start state s0, intermediate state s1 and final state s2, and transitions: s0 → s0 labelled c/c, R for c ∈ {0, 1}; s0 → s1 labelled #/#, L; s1 → s1 labelled 1/0, L; s1 → s2 labelled 0/1, H; s1 → s2 labelled >/1, H.]
Let us consider that, initially, the tape contains >0111, the representation of the number 7. The evolution of the tape is shown below. Each line shows the TM configuration at step i, that is, the tape and the current state, after transition i. For convenience, we have chosen to show only two empty cells in each direction. Also, the brackets indicate the position of the tape head.

    Transition no    Tape            Current state
    0                ##>[0]111##     s0
    1                ##>0[1]11##     s0
    2                ##>01[1]1##     s0
    3                ##>011[1]##     s0
    4                ##>0111[#]#     s0
    5                ##>011[1]##     s1
    6                ##>01[1]0##     s1
    7                ##>0[1]00##     s1
    8                ##>[0]000##     s1
    9                ##>[1]000##     s2
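As a complement to the trace above, the following Python sketch (an illustration added here, not part of the original notes) simulates a deterministic TM given as a transition table, and runs the binary increment machine of Example 5.1.2 on >0111. The dictionary-based representation of δ is an implementation choice, and max_steps is only a safety bound.

def run_dtm(delta, finals, s0, tape, head, max_steps=10_000):
    """Simulate a DTM. delta maps (state, symbol) -> (state', symbol', move)."""
    tape = dict(enumerate(tape))            # sparse tape; missing cells read '#'
    state = s0
    for _ in range(max_steps):
        if state in finals:                 # accepting/final state: the machine halts
            break
        key = (state, tape.get(head, '#'))
        if key not in delta:                # no transition possible: the machine hangs
            return None
        state, tape[head], move = delta[key]
        head += {'L': -1, 'H': 0, 'R': 1}[move]
    cells = ''.join(tape[i] for i in sorted(tape))
    return cells.strip('#'), state

# The binary increment machine of Example 5.1.2.
delta = {
    ('s0', '0'): ('s0', '0', 'R'), ('s0', '1'): ('s0', '1', 'R'),
    ('s0', '#'): ('s1', '#', 'L'), ('s1', '1'): ('s1', '0', 'L'),
    ('s1', '0'): ('s2', '1', 'H'), ('s1', '>'): ('s2', '1', 'H'),
}
print(run_dtm(delta, finals={'s2'}, s0='s0', tape='>0111', head=1))
# -> ('>1000', 's2'), i.e. 7 + 1 = 8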
Example 5.1.2 shows how a Turing Machine performs a simple increment.
In what follows, we examine how Turing Machines can be used to solve
decision problems.
Definition 5.1.4 (Word) Let Σ be an alphabet which contains at least one symbol different from the empty cell. A word over Σ (or simply word) is a finite sequence w = c1 c2 . . . cn of symbols from Σ. We denote by Σ* the set of all finite sequences formed with symbols from Σ. Also, we write |w| to refer to the number of symbols, or size, of w.

Any problem instance can be encoded by a word over some alphabet. As seen in Example 5.1.2, an integer can be encoded by its binary representation. The nodes of a graph G = (V, E) can be represented as a sequence of integers from 1 to |V|, separated by a special character, or a single empty cell symbol. The representation of the edges follows that of the nodes, and consists of pairs of nodes (integers), again separated by empty cell symbols. A PL formula can be seen as a particular type of graph (a tree) and represented similarly.
More generally, if i is a problem instance, we write enc(i) to refer to
the word which encodes i according to a given scheme, under an alphabet
specified in advance. Usually, we omit mentioning the scheme at hand, since,
in most cases, it is irrelevant for our exposition.
Definition 5.1.5 (TM input, output) Let P : I → {0, 1} be a decision problem and M be a Turing Machine.
- The input of a Turing Machine is a finite word which is contained in its otherwise empty tape;
- the output of a TM is the contents of the tape (not including empty cells) after the Machine has halted. We also write M(w) to refer to the output of M, given input w.
We say a TM M solves a decision problem P : I → {0, 1} if M(enc(i)) = 1 whenever P(i) = 1 and M(enc(i)) = 0 whenever P(i) = 0, for all i ∈ I.
However, we shall later see that solvability can also be defined in another,
more general way.
Definition 5.1.6 (Running time) Let T : N → N and Σ be an alphabet. We say a Turing Machine M runs in time T (or the running time of M is T) if M executes at most T(|w|) transitions for each input w ∈ Σ*.
5.1.1 What encoding to choose?

Naturally, problem instances can be encoded using different alphabets. For instance, the Turing Machine from Example 5.1.2 could have used the alphabet Σ' = {1, 2, 3, 4, 5, 6, 7, 8, 9, 0, #, >} instead of the binary one. Of course, the behaviour of the machine would have been different. Thus, instead of using n tape cells to encode an integer of value at most 2^n, one would use roughly n/log2 10 cells. So is the alphabet Σ' better than Σ from Example 5.1.2?
To answer this question, we must first recall that, when determining
the execution time of an algorithm, we did not make a distinction between

different functions which had the same asymptotic growth. Thus, n^2 and n^2 + 5n + 6 were essentially seen as indistinguishable, and denoted as Θ(n^2) (or, equally, Θ(2n^2), etc.).
In complexity theory, we will take a step further and abandon the distinction between functions which grow at most as fast as some polynomial. Under this assumption, execution times such as n^2, n^5 or n^2 log2 n will be
seen as indistinguishable. We shall take this intuition to the formal level
later on, when we introduce complexity classes.
One may argue that this assumption is overly simplistic. In practice,
for large enough values of n, n2 is an acceptable running time, while n5 is
not. This is indeed true. However, even under this apparently rudimentary
assumption, we can achieve an insightful classification of problems. This
classification deems a considerable number of interesting problems as practically unsolvable due to excessively high running times. We shall defer
further comments for the following sections, and return to the issue of the
encoding. This is addressed by the following proposition:
Proposition 5.1.1 (The encoding does not matter) Every decision problem Q : I → {0, 1} which is solved by a Turing Machine M in time T(n), with alphabet Σ (which is used for encoding problem instances from I), is also solved in time O(log |Σ|) · T(n) using the alphabet {0, 1, #, >}.
Proof: (sketch) We build M' = (K', F', Σ', δ', s0') from M as follows:
- Σ' = {0, 1, #, >}. We encode each symbol different from # (the empty cell) and > (the marker symbol of the beginning of the input) as a word w ∈ Σ'* with |w| = ⌈log |Σ|⌉. We use k to refer to the length |w| of the word w. We write enc'(x), with x ∈ Σ, to refer to the encoding of symbol x.
- For each state s ∈ K, we build 2^(k+1) states q1, . . . , q_(2^(k+1)) ∈ K', organized as a full binary tree of height k. The purpose of the tree is to recognize a word enc'(x) of length k from the tape. Thus, the unique state at the root of the tree, namely q1, is responsible for recognising the first bit. If it is 0, M' will transition to q2, and if it is 1, to q3. q2 and q3 must each recognize the second bit. After their transitions, we shall be in one of the states q4 to q7, which give us information about the first two bits of the word. The states from level i recognize the first i bits of the encoded symbol enc'(x). The states from the last level are 2^k in number, and recognize the last bit of the encoded symbol enc'(x). Thus, each of the 2^k leaf-states in the tree corresponds to one possible symbol x ∈ Σ which is encoded as enc'(x). We connect all these states by transitions, as described above.
- For each transition δ(s, x) = (s', x', pos) of M, the machine M' must: (i) recognize x, (ii) override it with x', (iii) move according to pos and go to state s'. Thus:
  - (i) is done by the procedure described at the above point;
  - for (ii), we use k states to go back (k cells) to the beginning of enc'(x) and write enc'(x'), cell by cell. Finally, we connect the state corresponding to enc'(x) from the tree to the first of the above-described k states;
  - for (iii), if pos = L/R, we use another k states to go either left or right. If pos = H, we need not use these states. Finally, we make a transition to the root of the state tree corresponding to s'.
For each transition δ(s, x) = (s', x', pos) of M, M' performs k transitions for reading the encoded symbol x, k transitions for writing x' and possibly k transitions for moving the tape head. Thus, for all w ∈ Σ'*, the number of transitions performed by M' is at most 3k · T(|w|). Hence, the running time of M' is O(k) · T(n).
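The fixed-width re-encoding enc' used in the proof can be illustrated by a short Python sketch (ours; make_encoder is an assumed helper name): every symbol other than # and > receives a binary word of k = ⌈log |Σ|⌉ bits.

from math import ceil, log2

def make_encoder(sigma):
    """Map each symbol of sigma (except '#' and '>') to a word of k = ceil(log2(|sigma|)) bits."""
    k = max(1, ceil(log2(len(sigma))))
    symbols = [c for c in sigma if c not in ('#', '>')]
    enc = {c: format(i, f'0{k}b') for i, c in enumerate(symbols)}
    return enc, k

enc_prime, k = make_encoder(['a', 'b', 'c', 'd', 'e', '#', '>'])
print(k)                                      # 3 bits per encoded symbol
print(''.join(enc_prime[c] for c in 'bad'))   # the word "bad" re-encoded over {0, 1}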

The proof of Proposition 5.1.1 shows that any Turing Machine using an arbitrary alphabet can be transformed into one using the binary alphabet. The overhead of this transformation is logarithmic: O(⌈log |Σ|⌉). Thus, if the original Turing Machine runs in some polynomial time T(n), then the transformed TM will run in O(⌈log |Σ|⌉) · T(n) time, which is bounded by a polynomial. Similarly, if the original TM runs in supra-polynomial time, the transformed TM will also run in supra-polynomial time.
In what follows, we shall assume all Turing Machines use the binary alphabet Σb = {0, 1, #, >}.
5.1.2 Turing Machine or Turing Machines? The Universal TM

The term Turing Machine hides a certain ambiguity. Formally, it refers to a fixed behaviour, which is encoded in states and transitions. A TM is thus similar to an appliance designed to do one particular thing. However, the principle of the TM is the same for all particular machines: it essentially relies on transitions between states, depending on the current state and tape cell.

Thus, the principle of the Turing Machine is analogous to a programming language: it defines the syntax and semantics of the language at hand. On the other hand, a Turing Machine is analogous to a particular program, which solves a certain problem, and only that. However, Turing Machines are more powerful: they can also play the role of the programming language. Thus, a Turing Machine can take any Turing Machine and simulate it on a particular input. This is the Universal Turing Machine. Switching to the formal level:
Proposition 5.1.2 (TMs as words) Any Turing Machine M = (K, F, Σb, δ, s0) can be encoded as a word over Σb. We write enc(M) to refer to this word.
Proof: (sketch) We encode each state in K \ F \ {s0} as an integer in {1, 2, . . . , |K \ F \ {s0}|}, and each final state as an integer in {|K| + 1, . . . , |K| + |F|}. We encode the initial state s0 as |K| + |F| + 1, and L, H, R as |K| + |F| + i with i ∈ {2, 3, 4}. Each of the above integers is represented as a word using ⌈log(|K| + |F| + 4)⌉ bits.
We encode δ as follows: each transition δ(s, c) = (s', c', pos) is encoded in binary as:

    enc(s)#c#enc(s')#c'#enc(pos)

where enc(·) is the encoding described above. The entire δ is encoded as a sequence of encoded transitions, separated by #. The encoding of a TM is:

    enc(M) = enc(|K \ F \ {s0}|)#enc(|F|)#enc(δ)
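A Python sketch of this serialization (ours; the helper names, the sorting used to number states and the exact layout are assumptions on top of the scheme above) may help to see that enc(M) is just an ordinary word over Σb:

def encode_tm(states, finals, s0, delta):
    """Serialize a TM as a word over {0,1,#,>}, following the numbering scheme above."""
    ordinary = sorted(s for s in states if s not in finals and s != s0)
    number = {s: i + 1 for i, s in enumerate(ordinary)}
    number.update({s: len(states) + i + 1 for i, s in enumerate(sorted(finals))})
    number[s0] = len(states) + len(finals) + 1
    moves = {'L': len(states) + len(finals) + 2,
             'H': len(states) + len(finals) + 3,
             'R': len(states) + len(finals) + 4}
    bits = (len(states) + len(finals) + 4).bit_length()

    def enc(n):
        return format(n, f'0{bits}b')

    trans = '#'.join(f"{enc(number[s])}#{c}#{enc(number[t])}#{d}#{enc(moves[m])}"
                     for (s, c), (t, d, m) in delta.items())
    return f"{enc(len(ordinary))}#{enc(len(finals))}#{trans}"

delta = {('s0', '0'): ('s0', '0', 'R'),
         ('s0', '#'): ('s1', '#', 'L'),
         ('s1', '0'): ('s1', '1', 'H')}
print(encode_tm({'s0', 's1'}, {'s1'}, 's0', delta))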

Proposition 5.1.3 (The Universal Turing Machine) There exists a Turing Machine, denoted U, which, for any TM M having running time T(n) and every word w ∈ Σb*, takes enc(M) and w as input, and outputs 1 whenever M(w) = 1 and 0 whenever M(w) = 0. We call U the Universal Turing Machine, and say that U simulates M. Moreover, the running time of U is c · T(n), where c depends only on M and not on the size of w.
Proof: Let M be the TM to be simulated and w = c1 c2 . . . cn be the input for M. We build the Universal Turing Machine U as follows:
- The input of U is enc(M)#enc(s0)#c1#c2 . . . cn. Note that enc(s0) encodes the initial state of M while c1 is the first symbol from w. The portion of the tape enc(s0)#c1#c2 . . . cn will be used to mark the current configuration of M, namely the current state of M (initially s0), the contents of M's tape, and M's current head position. More generally, this portion of the tape is of the form enc(si)#u#v, with u, v ∈ Σb* and si being the current state of M. The last symbol of u marks the current symbol, while v is the word which is to the right of the head. Initially, the current symbol is the first one, namely c1.
- U will scan the current state of M, then it will move to the current symbol from w, and finally it will move to the portion of enc(M) where transitions are encoded. Once a valid transition is found, U will execute it:
  1. U will change the encoded current state to the new one, according to the transition;
  2. U will change the current symbol of M's tape contents, according to the transition;
  3. U will move the marker of the current symbol, according to pos from the transition.
- U will repeat this process until an accepting state of M is detected, or until no transition can be performed.
Let l denote the size of enc(M). For each transition of M, U will move over at most l tape cells, in at most l steps, 6 times (back and forth, thus 2 times, for each of the steps 1., 2. and 3. enumerated above). Thus, for each transition of M, U performs at most c = 6 · l transitions. Hence, if T(n) is the running time of M, the running time of U is c · T(n).

The existence of the Universal Turing Machine attests to the power of this model of computation. Such a machine can execute any program, encoded as a Turing Machine. Again, the overhead of the simulation is polynomial (in fact, a constant factor c) w.r.t. the running time of M.
The existence of the Universal TM was proved by Alan Turing in the 1930s. The entire idea, quite straightforward for us now, was revolutionary at the time. It is the origin of the concept of the universal computer, one which takes an arbitrary program and executes it. It took several decades for such computers to be actually built. Their principle was quite similar to that of the TM. The program was a sequence of instructions (transitions in the TM), which manipulated stored data (the tape), by updating the configuration of the machine (current state, current symbol) after each instruction.


5.2 Decidability Theory

This chapter investigates the limits of the Turing Machine. More precisely,
we are interested in (i) examining whether there are problems which are
not solvable by any Turing Machine and (ii) whether there exists a computational model, more powerful than the Turing Machine, which is able to
solve such problems. As it turns out, the answer to (i) is yes: some problems
can only partially be solved by a TM, and some cannot be solved at all.
The answer to (ii) is formulated as the Church-Turing conjecture (or thesis),
which states that no computational model is more expressive than the Turing Machine. Computability theory is actually rich in computational models.
Some of the most wide-spread are: the lambda calculus, while programs,
RAM machines (very similar to assembly programs), Markov (or normal)
algorithms. All of them are shown to be equivalent with the TM. More
precisely, each lambda-expression, while or RAM program and Markov algorithm, is equivalent with a Turing Machine in the sense that they solve the
same problem. The Church-Turing thesis formulates an even stronger
claim: any computational model (not just the existing ones) is equivalent to
the Turing Machine.
In what follows, we provide rigorous formulations for solvable and partially solvable problems, and construct a taxonomy of problems, based on
Turing Machine solvability. We provide a technique for establishing TM
solvability, and end with formulating the Church-Turing thesis.
To begin with, we refine Definition 5.1.2 of a decision problem. Recall, from the discussion following Definition 5.1.4, that any problem instance can be encoded as a word over Σb.
Proposition 5.2.1 Σb* and N are isomorphic.
Proof: We prove this result for Σ01 = {0, 1}. Σ01* and N are isomorphic iff there exists a bijective function h : Σ01* → N. Intuitively, h assigns to each word in Σ01* a unique number from N. Note that each word wn ∈ Σ01* is a sequence of bits which can be seen as a binary representation of a natural number n ∈ N. Thus, h is the transformation function from the binary base to the decimal one. It is straightforward that our choice of h is an isomorphism (each binary word corresponds to exactly one natural, and for each such word there exists exactly one corresponding natural). The same result can be extended from Σ01 to Σb, by assuming we have 4 symbols instead of two.

Proposition 5.2.1 takes us a step forward. We can represent (or interpret) each problem instance, that is, a word from Σb*, as a natural number. With this in mind, we have:
Definition 5.2.1 (Decision problem) A decision problem is a function f : N → {0, 1}. f assigns to each problem instance n ∈ N the yes/no answer f(n).
Definition 5.2.1 suggests that saying "a Turing Machine solves a decision problem f" is the same thing as saying "a Turing Machine computes a function f : N → {0, 1}". While seeing problems as functions may seem less intuitive or too abstract, it is actually quite useful for analysing solvability. The following two propositions already give us a clue about the limits of the Turing Machine:
Proposition 5.2.2 The set Hom(N, {0, 1}) of functions f : N → {0, 1} is uncountably infinite.
Proof:

Proposition 5.2.3 The set TM of Turing Machines is countably infinite.
Proof: The set TM can be uniquely identified with {enc(M) | M ∈ TM}, which is a subset of Σb* and which, in turn, is isomorphic with N.

Propositions 5.2.2 and 5.2.3 tell us that there are infinitely more functions (decision problems) than means of computing them (Turing Machines).
Next, we also refine the notion of solvability:
Definition 5.2.2 (Decision, acceptance) Given a Turing Machine M and a function f : N → {0, 1}, we say that:
- M decides f iff M(wn) = 1 whenever f(n) = 1 and M(wn) = 0 whenever f(n) = 0, for all n ∈ N, where wn is the binary encoding of n;
- M accepts f iff, for all n ∈ N: M(wn) = 1 whenever f(n) = 1, and M(wn) does not halt whenever f(n) = 0.
Note that, in contrast with acceptance, decision is, intuitively, a stronger
means of computing a function (i.e. solving a problem). In the latter case,
the TM at hand can provide both a yes and a no answer to any problem
instance, while in the former, the TM can only provide an answer of yes. If
the answer to the problem instance at hand is no, the TM will not halt. We
shall formalize this intuition, later on.

Proposition 5.2.4 A Turing Machine M accepts a function f : N → {0, 1} iff there exists a Turing Machine which can enumerate/generate all elements in Af = {n ∈ N | f(n) = 1}. Intuitively, Af is the set of problem instances for which the answer at hand is yes.
Proof: (⇒) Assume M accepts f. We specify the TM generating Af as pseudocode, for simplicity. Defining the same TM with states/transitions is cumbersome, takes a lot of space, and is difficult to follow.

Algorithm 5: GEN()
    static Af = ∅;
    k = 0;
    while True do
        for 0 ≤ i ≤ k do
            run M(wi) for at most k steps;
            if M(wi) halts within k steps and i ∉ Af then
                Af = Af ∪ {i};
                return i
            end
        end
        k = k + 1;
    end
The value of k from the outer loop has a two-fold usage. First, it is used to explore all problem instances 0 ≤ i ≤ k. Second, it is used as a time-limit for M. Each problem instance i is run on M for at most k steps. If the answer to i is yes, and it is provided in at most k steps, then i is added to Af, and then returned (written on the tape). Note that Af is stored in a static variable which keeps its contents between different executions of GEN. It may be the case that the answer to i is yes, but it takes more than k steps to obtain it. If this is the case, then i will be detected during a future call of GEN, when M will be simulated for more steps.
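The dovetailing idea behind GEN can also be sketched in Python (our illustration, not the original pseudocode). A step-bounded simulation plays the role of running M(wi) for at most k steps; here accepts_within is an assumed helper standing in for that bounded simulation.

def gen(accepts_within):
    """Dovetailing enumerator for the yes-instances of a semi-decided problem.

    accepts_within(i, k) is assumed to return True iff M(w_i) outputs 1 within k steps.
    """
    produced = set()
    k = 0
    while True:                        # runs forever if A_f is infinite
        for i in range(k + 1):
            if i not in produced and accepts_within(i, k):
                produced.add(i)
                yield i                # the analogue of "return i" in GEN
        k += 1

# Toy stand-in for M: pretend M accepts exactly the even numbers,
# and that instance i needs i simulation steps before it halts.
def toy(i, k):
    return i % 2 == 0 and k >= i

g = gen(toy)
print([next(g) for _ in range(5)])     # [0, 2, 4, 6, 8]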
(⇐) Assume we have the Turing Machine GEN which generates Af. We construct a Turing Machine M which accepts f. M works as follows:

Algorithm 6: M(wn)
    Af = ∅;
    while n ∉ Af do
        m = GEN();
        Af = Af ∪ {m}
    end
    return 1

M simply uses GEN to generate elements of Af. If the answer to wn is yes, then, at some point, n will be generated by GEN, placed into Af, and M will output 1. If the answer to wn is no, M will not halt.

Proposition 5.2.4 provides an insight into the meaning of accepting a function (problem). It is the same thing as generating the set Af of yes instances. Note that Af is not necessarily finite. And even when Af is finite, we cannot know a priori what its size is. Thus, the generator may continue to look for yes instances forever. A dual view on decision and acceptance is related to how we can get to know the set Af. If, for each n ∈ N, we can establish n ∈ Af as well as n ∉ Af (whichever holds), then f can be decided by a TM. If we can only establish n ∈ Af, when this is the case, then f is accepted by a TM.
Proposition 5.2.5 If f : N → {0, 1} is decided by some TM M, then f is also accepted by some TM M'.
Proof: Let M be the TM which decides f, thus M(wn) = f(n) for all n ∈ N. Then M can be used to build a generator for Af: we simply run M(wn) for each problem instance n ∈ N in turn, and output those n for which the answer is yes. By Proposition 5.2.4, this means there exists M' which accepts f.

Proposition 5.2.5 formally establishes that decision is a stronger means of solving problems than acceptance: whatever can be decided can also be accepted.
We conclude with the following taxonomy:
Definition 5.2.3 (Decidability classes) Let f : N → {0, 1} be a decision problem. If f is decided by some TM M, we say f is recursive/decidable. If f is accepted by some TM M, we say f is recursively-enumerable/semi-decidable. We denote by:
- R: the set of all recursive functions, that is, R = {f ∈ Hom(N, {0, 1}) | f is recursive};
- RE: the set of all recursively-enumerable functions, that is, RE = {f ∈ Hom(N, {0, 1}) | f is recursively-enumerable}.
Proposition 5.2.6 R ⊆ RE.
Proof: The proof follows from Proposition 5.2.5, since every recursive function is also recursively-enumerable.
Proposition 5.2.7 There exists a function f ∉ R.
Proof: Let us build such a function:

    f(n) = 0, if Mn(wn) = 1
    f(n) = 1, if Mn(wn) = 0 or Mn(wn) loops
We show f ∉ R by contradiction. Assume f ∈ R. Then, there exists a TM M such that M(wn) = f(n) for all n ∈ N. We build the infinite table with all enumerated Turing Machines on lines, input words on columns, and the output for each combination of TM and input word:

            w1     w2     w3     . . .   wk     . . .
    M1      1      0      1      . . .   ⊥      . . .
    M2      0      ⊥      0      . . .   1      . . .
    M3      0      1      1      . . .   0      . . .
    . . .   . . .  . . .  . . .  . . .   . . .  . . .
    Mi      0      ⊥      0      . . .   1      . . .
    . . .   . . .  . . .  . . .  . . .   . . .  . . .

The values 0, 1, ⊥ (does not halt) which populate the table are selected arbitrarily, for description purposes.
The Turing Machine M must be some Mi, that is, the i-th Turing Machine. Let us look at the output of Mi, given input wi (the encoding of i in binary). Mi(wi) = f(i), since Mi decides f. Assume Mi(wi) = 1. Then f(i) = 0, according to the definition of f. Thus Mi(wi) ≠ f(i). Contradiction. Assume Mi(wi) = 0. Then f(i) = 1. Contradiction. Finally, we also obtain a contradiction for Mi(wi) = ⊥ (Mi does not halt for wi), since in that case Mi cannot decide f. Thus, f ∉ R.

Definition 5.2.4 (Halting problem) Let:

    fhalt(n) = 0, if Mn(wn) does not halt
    fhalt(n) = 1, if Mn(wn) halts

fhalt is called the halting problem: its input n encodes both a word wn and a Turing Machine Mn, and the problem asks whether Mn halts for input wn. This problem is also related to program correctness. A program is considered totally correct if it halts for all its inputs and the answer is always the expected one. If we cannot establish halting, but the result, whenever the program does halt, is the correct one, we say the program is partially correct.
Proposition 5.2.8 The halting problem is recursively-enumerable (semi-decidable), but not recursive.
Proof: The proof requires two steps. The first consists of showing fhalt ∈ RE, which is done by building a generator G for Afhalt. G simulates the TM Mn on input wn, for a fixed number of k steps. The behaviour is similar to GEN. If Mn halts, then n is added to Afhalt.
The second step is showing fhalt ∉ R. This is achieved by contradiction. We assume fhalt ∈ R and denote by Mhalt the TM which decides fhalt. We now show we can use Mhalt to build a TM which decides the problem f from the proof of Proposition 5.2.7. The behaviour of the TM is as follows. First, we run Mhalt(wn). If the output is 0, then Mn(wn) does not halt; hence f(n) = 1 and we output 1. If the output of Mhalt(wn) is 1, then we know Mn(wn) halts. Thus, we can safely simulate Mn(wn), and we output 0 if Mn(wn) = 1 and 1 if Mn(wn) = 0, in accordance with the definition of f. We have shown that f ∈ R, which contradicts Proposition 5.2.7.
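A closely related, classic thought experiment can be written down in Python (purely illustrative; halts below plays the role of the hypothetical decider Mhalt, which, by the proposition above, cannot exist):

def halts(prog, arg):
    """Hypothetical total oracle: True iff prog(arg) halts. By 5.2.8, it cannot exist."""
    raise NotImplementedError("no such always-halting, always-correct function exists")

def troublemaker(prog):
    # Do the opposite of what the oracle predicts about prog run on itself.
    if halts(prog, prog):
        while True:        # predicted to halt -> loop forever
            pass
    return 0               # predicted to loop -> halt immediately

# Consider troublemaker(troublemaker):
#  - if halts(troublemaker, troublemaker) were True, the call would loop forever;
#  - if it were False, the call would halt.
# Either way the oracle is wrong about it, so no such halts can exist.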

Proposition 5.2.9 R ≠ RE.
Proof: We have fhalt ∈ RE, but fhalt ∉ R.

Proposition 5.2.9 gives us a strong result: there are problems which cannot be decided by the Turing Machine. However, are these problems of interest, or do they correspond to bizarre functions such as f from the proof of Proposition 5.2.7?
In what follows, we state one of the strongest results of this section. This
is formulated as the Rice theorem, given below:
Theorem 5.2.1 (Rice) Let C ⊆ RE with C ≠ ∅ and C ≠ RE, let M be a Turing Machine and f : N → {0, 1} be the decision problem accepted by M. Establishing whether f ∈ C is not decidable (it is, at best, semi-decidable).
Proof:

This theorem may be hard to grasp at a first reading. We interpret it as follows: C is a subset of RE, thus a set of recursively-enumerable functions. C can be viewed as a property of a problem, much in the way the set {2k | k ∈ N} can be seen as the property "even" of natural numbers. Thus, problems f ∈ C have the property defined by C, while problems f ∉ C do not. The Rice Theorem tells us that any non-trivial property C of a program cannot be decided; at best, it can only be semi-decided (accepted).
We can apply the Rice Theorem to show that a more general version of the halting problem is not decidable. Let C = R, f be an arbitrary function and M be the TM which accepts f. Then, checking f ∈ C is equivalent to checking whether f is recursive or, alternatively, whether M halts for all inputs. Since checking f ∈ C is not decidable (due to Rice), establishing whether M halts for all inputs is also not decidable.
Imagine C is the set of problems which are decided by Turing Machines that behave as computer viruses. The same theorem states that we cannot decide whether a given M behaves as a virus. Our world is surrounded by such non-trivial properties, which we cannot decide, but only accept.
The Rice Theorem establishes a strong limit on the ability of the Turing
Machine. We connect this result with the Church-Turing thesis:
Claim 5.2.1 (Church-Turing) Any decision problem f : N → {0, 1} which is universally computable (i.e. whose values f(n) can be determined, for any n ∈ N, by some finite sequence of computing steps) is Turing-computable (that is, it can be decided by a Turing Machine).
This as-yet-unrefuted claim, coupled with the Rice Theorem, offers a comprehensive picture of real-world solvability: (i) any problem that is solvable is solvable using a Turing Machine, and (ii) a lot of interesting problems cannot be solved by the Turing Machine, hence, if we accept the Church-Turing thesis, cannot be solved at all. Therefore, despite our technological advances, our means of performing (and understanding) computation are inherently limited.

5.3 Complexity classes

In the previous section, we have classified problems into two classes, namely
R and RE. The first class contains decidable problems, the second, semidecidable problems. This classification relies on solvability. In what follows,
we look at problems from R only and further refine our classification, by
looking at how much time/space is consumed in order to solve such problems.
First, we introduce the non-deterministic Turing machine.


5.3.1 The non-deterministic Turing Machine

The formal tool of the non-deterministic Turing Machine (NTM) comes from the practical realisation that a lot of problems can only be solved by exploring all possible solution candidates. Consider the problem SAT, defined in the first chapter. In order to establish whether a formula φ is satisfiable, one must construct all possible interpretations I, and examine I ⊨ φ for each I. Similarly, in order to identify whether a graph contains a Hamiltonian path, one must construct all such paths, and verify the Hamiltonian property for each one. There are two particularities of problems such as these:
1. while building up a solution candidate, one cannot rule out other candidates. For SAT, when building an interpretation I, we cannot do
some checks on φ such that some interpretations can be ruled out in
advance. Similarly, when exploring a path for the Hamiltonian problem, we cannot rule out other paths. Thus, in the general case, we
may be required to backtrack and check other candidates.
2. building up and checking each candidate can be done independently
on the other candidates.
The non-deterministic TM is built around these two insights. Its purpose is
to examine the inherent nature of the problem hardness. One easily notices
that the source of hardness in problems such as SAT or Hamiltonian path
comes from (*) exploring an exponential number of candidates. However,
not all problems which are solved by exponential algorithms share this feature. For instance, solving the Towers of Hanoi problem is also exponential, but the hardness here comes from describing the sequence of moves which must be performed. (The keen reader may notice that the Towers of Hanoi is not a decision problem; thus, the comparison with SAT and the Hamiltonian path may not be technically the best one. However, other decision problems which take exponential time and do not require building all candidates independently (see (ii)) are less intuitive and not simple to present.) Thus, the answer itself is exponential in this case.
The concept of the NTM is meant to separate those problems which have (*)
as a source of exponential complexity, from other problems which also take
exponential time to solve. Intuitively, the NTM has the ability to generate
all solution candidates in polynomial time, by a process which is quite similar
to processor parallelism. In what follows, we will use parallelism as a suitable
analogy to describe the behaviour of the NTM. However, the reader must
note that this analogy is purely intuitive, and no actual parallelism occurs
in the case of NTM.
Definition 5.3.1 (Non-deterministic TM) A non-deterministic TM is a TM M = (K, F, Σ, δ, s0) over alphabet Σ, in which δ ⊆ (K × Σ) × (K × Σ × {L, H, R}) is a transition relation (rather than a function).
Example 5.3.1 (Non-deterministic TM)
Remark 5.3.1 (DTM vs NTM) The single difference between a DTM
and a NTM is in the way δ is defined. In the case of the DTM, the machine
performs a unique transition, to another state. In the case of NTM, such
a transition is not necessarily unique. Thus, a NTM can be in a number
of configurations at the same time. Each such configuration, including the
state and the contents of the tape, is independent of all others.
As a direct consequence, the execution of a DTM can be represented as
a linear (not necessarily finite) sequence of configurations, starting from the
initial one. In contrast, the execution of a NTM can be represented as a
tree.
Convention for describing NTM in pseudocode. As previously
stated it is more convenient to describe the behaviour of Deterministic
Turing Machines using pseudocode. We extend this convention for the NTM
as well. For this, we introduce the instruction:
v = choice(A)
where v is a variable and A is a set of values. The instruction has the
following behaviour:
the configuration of the NTM is replicated |A| times. Thus, we end
up with |A| configurations, one for each value in A.
in each configuration replica, the variable v has one distinct value from
A.
each configuration replica will continue to execute independently from
all others, in a fashion which mimics parallelism.
all configurations are independent of each other (in other words, program copies cannot communicate between themselves).
Example 5.3.2 (Pseudocode)
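The body of Example 5.3.2 is not reproduced in this text. As a purely illustrative sketch (ours, in Python rather than in the pseudocode convention above), the code below shows what a deterministic simulation of the guesses made with choice looks like for SAT: where the NTM would execute v = choice({0, 1}) once per variable and then verify, the simulation simply enumerates every combination of guesses.

from itertools import product

def sat_bruteforce(clauses, variables):
    """Deterministic simulation of the non-deterministic guess-and-verify SAT algorithm.

    Each clause is a list of (variable, polarity) literals;
    e.g. [('x', True), ('y', False)] stands for (x or not y).
    """
    for guess in product([False, True], repeat=len(variables)):
        interp = dict(zip(variables, guess))        # one branch of the choice(...) tree
        if all(any(interp[v] == pol for v, pol in clause) for clause in clauses):
            return interp                           # an accepting branch was found
    return None                                     # every branch rejects

# (x or y) and (not x or not y): satisfiable; the search finds x=False, y=True.
print(sat_bruteforce([[('x', True), ('y', True)],
                      [('x', False), ('y', False)]], ['x', 'y']))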

Proposition 5.3.1 (Equivalent DTM of a NTM) For every NTM M, there exists an equivalent DTM M', such that M(w) = M'(w) for all words w ∈ Σb*, and M' does not halt whenever M does not halt. Moreover, if M runs in polynomial time, then M' runs in (at most) exponential time.
Proof:

The above proposition is key to establishing that NTMs do not add to
the expressive power of the DTM. Any problem which is solvable by a NTM
is also solvable by a DTM, only in a longer time.

5.4 Deterministic and non-deterministic complexity classes

Definition 5.4.1 (DTIME, NTIME, DSPACE, NSPACE) We denote by:
- DTIME(f(n)), the set of problems which are solvable by a DTM with running time O(f(n));
- NTIME(f(n)), the set of problems which are solvable by a NTM with running time O(f(n));
- DSPACE(f(n)), the set of problems which are solvable by a DTM using at most O(f(n)) tape cells;
- NSPACE(f(n)), the set of problems which are solvable by a NTM using at most O(f(n)) tape cells.
Definition 5.4.2 (PTIME, NPTIME) We denote by PTIME (NPTIME) the set of problems which are solvable by DTMs (NTMs) with (at most) polynomial running time, that is:

    PTIME = ∪_{d ∈ N} DTIME(n^d)
    NPTIME = ∪_{d ∈ N} NTIME(n^d)

Occasionally, we abbreviate PTIME and NPTIME by P and NP, respectively.
Definition 5.4.3 (PSPACE, NPSPACE) We denote by PSPACE (NPSPACE) the set of problems which are solvable by DTMs (NTMs) with (at most) polynomial running space, that is:

    PSPACE = ∪_{d ∈ N} DSPACE(n^d)
    NPSPACE = ∪_{d ∈ N} NSPACE(n^d)

Definition 5.4.4 (EXPTIME, NEXPTIME) We denote by EXPTIME (NEXPTIME) the set of problems which are solvable by DTMs (NTMs) with (at most) exponential running time, that is:

    EXPTIME = ∪_{d ∈ N} DTIME(2^(n^d))
    NEXPTIME = ∪_{d ∈ N} NTIME(2^(n^d))
We present, without proof, the following relationship between these classes (they are only a small subset of the known and studied complexity classes):

    P ⊆ NP ⊆ PSPACE = NPSPACE ⊆ EXPTIME ⊆ NEXPTIME

where the equality PSPACE = NPSPACE is due to Savitch's Theorem.
We call this relationship the hierarchy of complexity classes. The hierarchy is, at least partially, intuitive since, for instance, any problem which
can be solved in at most polynomial time by a DTM (hence in P), can also
be solved in at most exponential time by a DTM (hence in EXPTIME ).
Similarly, any problem which can be solved in at most polynomial time by
a DTM (hence in P) can also be solved in at most polynomial time by a
NTM (hence in NP ), since the DTM is a particular case of a NTM (functions of the DTM definition are particular cases of relations of the NTM
definition).
The task of the complexity theorist is to study the relationship between complexity classes, and yield insightful results such as PSPACE =
NPSPACE (which we shall not discuss in more detail).
The task of the (applied) computer scientist is to establish to which complexity classes their problems naturally belong, and, by this, settle the hardness of the studied problems. This task is actually two-fold:
first, it involves deciding membership to a class. This is achieved by
devising an algorithm (formally a DTM), and examining its running
time. For instance, SAT ∈ EXPTIME, since it can be solved in exponential time. However, membership to a class offers only an upper
bound of the problem hardness. For instance, deciding whether a vector is sorted can also be performed in exponential time on some DTM,
however there is also a faster way to solve this problem.

second, it involves showing that a problem does not belong to a complexity class (as we shall see further on, how to settle such an issue is, in general, not yet known). Ideally, if for some problem Q we know Q ∉ P, then we need not search for a polynomial algorithm to solve Q, since none exists.
5.4.1 Hardness. Polynomial reductions

Unfortunately, there is no known mechanism for establishing Q ∉ X, where X is a complexity class and Q is a problem. In fact, this issue hides a yet unsolved problem which continues to be one of the greatest challenges of (theoretical) Computer Science. We shall discuss its implications later on. For now, we investigate a means for relaxing the condition Q ∉ X, one for which we can develop a suitable methodology. The relaxed version of the above condition is: Q is at least as hard as any problem from X. Up to this point, we have not established what it means for a problem Q1 to be at least as hard as another, Q2. Intuitively, it means that we can use a DTM M2 which decides Q2 in order to decide Q1. This is captured by the polynomial reduction.
Definition 5.4.5 (Polynomial reduction) A problem Q1 : I1 → {0, 1} is polynomially reducible to a problem Q2 : I2 → {0, 1} (written Q1 ≤p Q2) iff there exists a total transformation function F : I1 → I2 which can be computed in polynomial time by a DTM (F can, actually, be interpreted as a Turing Machine in itself, which takes encodings of members of I1 and transforms them into encodings of members of I2), such that Q1(i) = 1 iff Q2(F(i)) = 1, for each i ∈ I1.
Intuitively, Q1 ≤p Q2 should be read: "Q1 can be solved using Q2 and a polynomial transformation" or, alternatively: "Q2 is at least as hard as Q1".
Proposition 5.4.1 ≤p is reflexive and transitive.
Proof: Left as an exercise.

Remark 5.4.1 Proposition 5.4.1 indirectly highlights a distinctive feature of the polynomial reduction, namely that it is not symmetric, since F need not be bijective. Even though F takes all instances i ∈ I1 of the problem Q1 and transforms them into instances F(i) ∈ I2 of the problem Q2, it does not necessarily happen that each instance of Q2 can be obtained by applying F to an instance of Q1.

In other words, we say F transforms all instances of Q1 into some instances of Q2.
Example 5.4.1 (Polynomial reduction)
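The body of Example 5.4.1 is likewise not reproduced here. As a hedged illustration (ours), the sketch below shows the shape of a transformation F for a classic reduction: a graph G = (V, E) has an independent set of size at least k iff it has a vertex cover of size at most |V| - k (the complement of an independent set is a vertex cover), so Independent Set ≤p Vertex Cover via the transformation below, which is clearly computable in polynomial time.

def reduce_is_to_vc(graph, k):
    """F: an Independent-Set instance (G, k) -> a Vertex-Cover instance (G, |V| - k).

    graph is a pair (V, E), with V a set of nodes and E a set of 2-element frozensets.
    A set S is independent in G iff V \\ S is a vertex cover of G, hence
    IS(G, k) = 1  iff  VC(G, |V| - k) = 1.
    """
    V, E = graph
    return (V, E), len(V) - k

# A triangle a-b-c plus a pendant node d attached to a.
V = {'a', 'b', 'c', 'd'}
E = {frozenset(p) for p in [('a', 'b'), ('b', 'c'), ('a', 'c'), ('a', 'd')]}
print(reduce_is_to_vc((V, E), 2))   # the produced instance asks for a cover of size <= 2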
Proposition 5.4.2 If Q2 ∈ P and Q1 ≤p Q2, then Q1 ∈ P.
Proof:

Proposition 5.4.3 If Q1 ∉ P and Q1 ≤p Q2, then Q2 ∉ P.
Proof:

The above two propositions formally express the intuition behind the definition of ≤p. If Q1 can be solved using Q2 plus a polynomial transformation, and Q2 is polynomially solvable, then Q1 is polynomially solvable. Also, if Q1 is not solvable in polynomial time and Q1 is reducible to Q2, then it cannot be that Q2 is solvable in polynomial time (if this were so, then we would have Q1 ∈ P).
Definition 5.4.6 (Hardness) Let X be a complexity class and Q2 be a
problem. Q2 is called X-hard iff for all Q1 ∈ X, Q1 ≤p Q2.
A problem Q is X-hard iff it is at least as hard as any problem in X or,
alternatively, iff any problem in X can be solved using Q together with a
polynomial transformation.
Definition 5.4.7 (Completeness) A problem Q2 is complete with respect
to a complexity class X (or X-complete) iff it is X-hard and Q2 ∈ X.
Intuitively, X-complete problems can be seen as the hardest problems
in class X. In what follows, we will focus solely on the case X = NP, and
study the NP -hardness and NP -completeness of problems.
Proposition 5.4.4 (Methodology for establishing completeness) Let
Q, Q' be two problems such that Q is NP-hard and Q ≤p Q'. Then Q' is also NP-hard.
Proof: Left as an exercise.

While Proposition 5.4.4 offers a practical means for establishing NP-hardness, it leaves a small gap. To show the NP-hardness of a problem Q', one requires a known NP-hard problem Q. Thus, the following question arises: if no problem Q is known to be NP-hard, how do we show the NP-hardness of some Q'? This is established by the following:

Theorem 5.4.1 (Cook) SAT is NP-complete.


Proof: We omit the proof of this theorem. However, we do state that the
proof relies on Definition 5.4.6, rather than Proposition 5.4.4.

Starting from SAT, one can use Proposition 5.4.4 to show other problems to be NP-complete. These, in turn, can be used as starting points for other reductions. Currently, there are more than 3000 problems known to be NP-complete.
5.4.2 The P ≠ NP problem

The class NP can be intuitively characterized as the set of all problems for which the verification of a solution candidate can be done in polynomial time (the generation of solution candidates is done in polynomial time, using choice). If P = NP, then each problem f for which solution candidates can be checked in polynomial time (f ∈ NP) can also be solved in polynomial time (f ∈ P). Thus, P = NP implies that solving a problem is as easy as checking whether a candidate solution for the problem is the correct one.
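This characterization can be made concrete with a small sketch (ours): for SAT, a candidate solution is an interpretation, and checking it against a formula in clausal form takes time linear in the size of the formula, which is exactly the kind of polynomial-time verification that places SAT in NP.

def verify_sat(clauses, interp):
    """Polynomial-time verifier: does the interpretation satisfy every clause?

    Each clause is a list of (variable, polarity) literals, e.g. ('x', False) means "not x".
    """
    return all(any(interp[v] == pol for v, pol in clause) for clause in clauses)

phi = [[('x', True), ('y', True)], [('x', False), ('z', True)]]   # (x or y) and (not x or z)
print(verify_sat(phi, {'x': True, 'y': False, 'z': True}))    # True
print(verify_sat(phi, {'x': True, 'y': False, 'z': False}))   # False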
We also add the following properties regarding the set of NP -hard /
complete problems:
Proposition 5.4.5 The set of NP-hard problems is closed under ≤p.
Proof: Let Q be an NP-hard problem and Q ≤p Q'. Then, by Proposition 5.4.4, Q' is also NP-hard.

Proposition 5.4.6 The set of NP-complete problems, together with ≤p, forms an equivalence class.
Proof: Let Q, Q' be two NP-complete problems. Since Q is NP-hard, Q'' ≤p Q for all Q'' ∈ NP. In particular, since Q' ∈ NP, we may take Q'' = Q', hence Q' ≤p Q. By a symmetric argument, Q ≤p Q'.

Proposition 5.4.6 highlights an intrinsic feature of NP -complete problems, namely that they are equivalent up to a polynomial transformation.
With this in mind, we end with the following property:
Proposition 5.4.7 Assume Q is an NP-complete problem. If Q ∈ P, then P = NP.
Proof: Let M be the TM which solves Q in polynomial time. Since Q is NP-complete, Q' ≤p Q for all Q' ∈ NP. Hence, we can solve any Q' ∈ NP by: (i) applying the polynomial transformation given by the reduction, and (ii) using M on the transformed input. Thus, any problem Q' ∈ NP can be solved in polynomial time. It follows that P = NP.

In particular, if a polynomial algorithm is found for some NP -complete
problem, then all NP -complete problems can be solved in polynomial time.

