
Complexity classes are one way to talk about how difficult or easy a
problem is. Complexity theory gets very technical, but the basics are
actually extraordinarily intuitive, and it's possible to understand the P
versus NP issue with very little math background.

The kinds of problems we deal with in complexity theory come in pairs: a
"search" version and a "verification" version. For example:

Problem: sorting.
Search version: input a list of numbers X and output the same list in
sorted order (call it Y).
Verification version: input a list of numbers X and another list Y, and
output "YES" if Y is the sorted version of X and "NO" otherwise.
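As a rough sketch of the verification version of sorting (the function name `verify_sorted` is made up for this illustration), checking a proposed answer only takes one pass over Y plus a comparison of which numbers appear:

```python
from collections import Counter


def verify_sorted(x, y):
    """Return "YES" if y is the sorted version of x, and "NO" otherwise."""
    # y must be in non-decreasing order...
    in_order = all(a <= b for a, b in zip(y, y[1:]))
    # ...and must contain exactly the same numbers as x, with the same counts.
    same_items = Counter(x) == Counter(y)
    return "YES" if in_order and same_items else "NO"
```

Note that the check never sorts anything itself; it only inspects the answer it was handed.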

Problem: graph coloring.
Search version: input a network of nodes and edges X and a number of
colors k, and output a color Y for each node, using at most k colors,
such that no adjacent nodes have the same color.
Verification version: input a network X, a number k, and a set of colors Y,
and output "YES" if Y uses at most k colors and all adjacent nodes have
different colors, and "NO" otherwise.
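Here is one way the graph coloring check might look, as a sketch. The representation (a list of edge pairs and a dict of colors) and the optional `max_colors` argument are assumptions made for this example; the cap is there because the usual formulation of graph coloring limits how many colors may be used.

```python
def verify_coloring(edges, colors, max_colors=None):
    """Return "YES" if `colors` is a proper coloring of the graph, else "NO".

    edges      -- iterable of (u, v) node pairs (the network X)
    colors     -- dict mapping each node to its color (the proposal Y)
    max_colors -- optional cap on distinct colors (an assumption added here)
    """
    if max_colors is not None and len(set(colors.values())) > max_colors:
        return "NO"
    for u, v in edges:
        # An uncolored endpoint, or two endpoints sharing a color, fails.
        if u not in colors or v not in colors or colors[u] == colors[v]:
            return "NO"
    return "YES"
```

The loop looks at each edge once, so the work grows linearly with the size of the network.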

Problem: partition.
Search version: input some numbers X and divide the numbers into
two groups that add up to exactly the same value (call the assignment
of numbers to their group Y).
Verification version: input some numbers X and the groupings Y and
output "YES" if the two groups add up to the same value, or "NO"
otherwise.
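The partition check is about as simple as a program gets. A minimal sketch, assuming the grouping Y is given as a list of 0/1 labels, one per number (that encoding is a choice made here, not something the problem dictates):

```python
def verify_partition(numbers, groups):
    """Return "YES" if the two groups add up to the same value, else "NO".

    numbers -- the list X
    groups  -- list of 0/1 labels; groups[i] is the group of numbers[i]
    """
    # Every number needs exactly one label, and the label must be 0 or 1.
    if len(groups) != len(numbers) or any(g not in (0, 1) for g in groups):
        return "NO"
    sum0 = sum(n for n, g in zip(numbers, groups) if g == 0)
    sum1 = sum(n for n, g in zip(numbers, groups) if g == 1)
    return "YES" if sum0 == sum1 else "NO"
```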

This is the P versus NP problem:

Are there any problems for which the verification version can
be solved efficiently but for which there is no efficient
solution to the search version?

If there is a fast solution to the search version of a problem then the
problem is said to be Polynomial-time, or P for short. If there is a fast
solution to the verification version of a problem then the problem is said to
be Nondeterministic Polynomial-time, or NP for short. The question of
"P=NP" is then the question of whether these sets are identical.

(The "nondeterministic polynomial-time" terminology is terribly
counterintuitive in my opinion. It was used originally because whenever a Turing
machine can efficiently solve the verification version of a problem, a
nondeterministic Turing machine can efficiently solve the
corresponding search problem. But this is really not at all important to
understanding P vs NP.)

In the case of the sorting problem above, there are fast algorithms for
both the search and verification versions. But for the other two problems,
the verification versions are easy (heck, my grandmother could probably
write a computer program to check that two lists of numbers add up to the
same value) but the search versions are difficult, and indeed there are no
fast solutions known. So all three problems are in NP, but only the first is
(known to be) in P.
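To get a feel for why the search version of partition is harder, here is the obvious brute-force approach as a sketch (the name `search_partition` is invented for this example, and this is deliberately the naive method, not the best algorithm known). It tries every way of assigning the numbers to two groups, and there are 2^n of those for n numbers:

```python
from itertools import product


def search_partition(numbers):
    """Try every 0/1 assignment; return one that balances, or None."""
    total = sum(numbers)
    # product((0, 1), repeat=n) enumerates all 2**n possible groupings.
    for groups in product((0, 1), repeat=len(numbers)):
        sum0 = sum(n for n, g in zip(numbers, groups) if g == 0)
        if 2 * sum0 == total:  # group 0 holds exactly half of the total
            return list(groups)
    return None
```

Each individual check inside the loop is cheap; the cost comes from the number of candidates, which doubles every time one more number is added to the input.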

Some problems can be translated into one another in such a way that a
fast solution to one problem would automatically give us a fast solution to
the other. There are some problems that every single problem in NP can
be translated into, and a fast solution to such a problem would
automatically give us a fast solution to every problem in NP. This group of
problems is known as NP-Hard. Some problems in NP-Hard are actually
not themselves in NP; the group of problems that are in both NP and
NP-Hard is called NP-Complete.

You start to see the far-reaching implications of a fast solution to any one
problem in NP-Hard: we would automatically get a fast solution
to every problem in NP, which would mean that whenever there is a fast
solution to the verification version of a problem then there is always a
fast solution to the corresponding search version.

Remember how the verification versions of those problems seemed easy
but the search versions seemed hard? A fast solution to any NP-Complete
problem would mean that as long as you can verify proposed solutions to
a problem you would never need to search through a substantial fraction
of the search space to find solutions; there would always be a faster way.
This seems implausible to most mathematicians (for this and for deeper
reasons), and that is why most mathematicians think that there
are no fast solutions to NP-Complete problems. But we haven't proved it
yet.

These classes refer to how long it takes a program to run. Problems in class P can
be solved with algorithms that run in polynomial time.

Say you have an algorithm that finds the smallest integer in an array. One
way to do this is by iterating over all the integers of the array and keeping
track of the smallest number you've seen up to that point. Every time you
look at an element, you compare it to the current minimum, and if it's
smaller, you update the minimum.

How long does this take? Let's say there are n elements in the array. For
every element the algorithm has to perform a constant number of
operations. Therefore we can say that the algorithm runs in O(n) time, or
that the runtime is a linear function of how many elements are in the
array.* So this algorithm runs in linear time.
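The minimum-finding procedure described above could be written like this (a minimal sketch; `find_min` is just a name picked for the example):

```python
def find_min(values):
    """Return the smallest element of a non-empty list by a single scan."""
    if not values:
        raise ValueError("find_min needs at least one element")
    smallest = values[0]
    for v in values:
        # One comparison (and at most one assignment) per element:
        # a constant amount of work, repeated n times.
        if v < smallest:
            smallest = v
    return smallest
```

The loop body runs once per element, which is where the O(n) comes from.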

You can also have algorithms that run in quadratic
time (O(n^2)), exponential time (O(2^n)), or even logarithmic
time (O(log n)). Binary search (on a balanced tree) runs in logarithmic
time because the height of the binary search tree is a logarithmic function
of the number of elements in the tree.
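As an illustration of logarithmic time, here is a sketch of binary search. Note the assumption: it works on a sorted Python list rather than the balanced tree mentioned above, but the idea is the same, since each step throws away half of what is left. The probe counter is only there to make the step count visible.

```python
def binary_search(sorted_values, target):
    """Return (index of target or None, number of elements inspected)."""
    lo, hi = 0, len(sorted_values) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_values[mid] == target:
            return mid, probes
        if sorted_values[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return None, probes
```

With 1000 elements the loop can run at most 10 times, because halving 1000 ten times leaves nothing to look at.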

If the running time is some polynomial function of the size of the input**,
for instance if the algorithm runs in linear time or quadratic time or cubic
time, then we say the algorithm runs in polynomial time and the
problem it solves is in class P.

NP

Now there are a lot of programs that don't (necessarily) run in polynomial
time on a regular computer, but do run in polynomial time on a
nondeterministic Turing machine. These programs solve problems in NP,
which stands for nondeterministic polynomial time. A
nondeterministic Turing machine can do everything a regular computer
can and more.*** This means all problems in P are also in NP.

An equivalent way to define NP is by pointing to the problems that can be
verified in polynomial time. This means there is not necessarily a
polynomial-time way to find a solution, but once you have a solution it
only takes polynomial time to verify that it is correct.

Some people think P = NP, which means any problem that can be verified
in polynomial time can also be solved in polynomial time and vice versa.
If they could prove this, it would revolutionize computer science because
people would be able to construct faster algorithms for a lot of important
problems.

NP-hard

What does NP-hard mean? A lot of times you can solve a problem by
reducing it to a different problem. I can reduce Problem B to Problem A if,
given a solution to Problem A, I can easily construct a solution to Problem
B. (In this case, "easily" means "in polynomial time.")
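To make the idea of a reduction concrete, here is a small sketch. The choice of problems is an assumption made for illustration: Problem B is "split a list of integers into two groups with equal sums" and Problem A is "find a subset of the integers that adds up to a given target." The function `solve_partition_via` and its `solve_subset_sum` parameter are hypothetical names; any solver for Problem A with that shape could be plugged in.

```python
from itertools import combinations


def solve_partition_via(numbers, solve_subset_sum):
    """Solve Problem B (partition) using one call to a solver for Problem A.

    solve_subset_sum(numbers, target) is assumed to return a list of indices
    whose values add up to target, or None if no such subset exists.
    """
    total = sum(numbers)
    if total % 2 != 0:
        return None  # an odd total can never be split evenly
    # Translate the B instance into an A instance: hit half of the total.
    chosen = solve_subset_sum(numbers, total // 2)
    if chosen is None:
        return None
    chosen = set(chosen)
    # Translate A's answer back into B's answer: a 0/1 group per number.
    return [0 if i in chosen else 1 for i in range(len(numbers))]


def brute_force_subset_sum(numbers, target):
    """Stand-in solver for Problem A (exponential time, demonstration only)."""
    for size in range(len(numbers) + 1):
        for idx in combinations(range(len(numbers)), size):
            if sum(numbers[i] for i in idx) == target:
                return list(idx)
    return None
```

Everything in `solve_partition_via` other than the call to the solver takes linear time, so if someone handed us a fast solver for Problem A, we would get a fast solver for Problem B for free.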

If a problem is NP-hard, this means I can reduce any problem in NP to
that problem. This means if I can solve that problem, I can easily solve
any problem in NP. If we could solve an NP-hard problem in polynomial
time, this would prove P = NP.

NP-complete

A problem is NP-complete if the problem is both NP-hard and in NP.

* A technical point: O(n) means the running time is bounded above by
some constant times n once n gets large, so it grows at most linearly.
Because O(n) is only an upper bound, if the algorithm ran in sublinear
time you could still say it's O(n), even if that's not the best
description of it.

** Note that if the input has several parameters, like n and k, the
running time might be polynomial in n and exponential in k.

*** Per Xuan Luo's comment, deterministic and nondeterministic Turing
machines can compute exactly the same things, since every
nondeterministic Turing machine can be simulated by a deterministic
Turing machine (a "regular computer"). However, they may compute
things in different amounts of time.

There are two classes of problems, P and NP (there are many, many more,
but we will ignore the rest here). These stand for "Polynomial" and
"Non-deterministic Polynomial".

Assuming knowledge of big-O notation, we'll move on from there. Request
an explanation and you shall have it.

Problems in P are solvable by a polynomial-time algorithm (one that runs
in O(f(n)), where f is a polynomial). This includes things like sorting
and triangulation.

Problems in NP are checkable by a polynomial-time algorithm, but not
necessarily solvable by one. Strictly speaking, every problem in P is also
in NP, but informally, when we say a problem is "in NP" we often mean it is
in NP and not known to be in P (so it is only known to be checkable). This
means that, given the solution to an instance of the problem, we can verify
that it is the correct solution in polynomial time, but we don't know how to
come up with it on our own in polynomial time. An easy example of this is
subset sum: given a set of integers, does there exist a nonempty subset whose
sum is zero? There is no known way to solve this problem in polynomial time,
but given the answer, we can easily check whether it is right (the decision
problem is a little harder to see, but assume you are given the actual subset).
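Checking a claimed answer to subset sum might look like the sketch below. Two assumptions are baked in: the subset is handed over as a list of positions into the input, and it must be nonempty (otherwise the empty subset would trivially add up to zero). The name `verify_zero_subset` is invented for the example.

```python
def verify_zero_subset(numbers, subset_indices):
    """Return True if the chosen positions form a nonempty subset summing to 0."""
    idx = set(subset_indices)
    # Reject an empty choice or one that lists the same position twice.
    if not idx or len(idx) != len(subset_indices):
        return False
    # Reject positions that do not exist in the input.
    if any(i < 0 or i >= len(numbers) for i in idx):
        return False
    return sum(numbers[i] for i in idx) == 0
```

All of this is a handful of passes over the proposed subset, so the check is linear in the input size, even though nobody knows how to find such a subset that quickly.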

Now, we have to discuss NP-Complete and NP-Hard, which requires a
discussion of reducibility. Reducibility is the notion that, given an
instance of some hard problem, we can quickly construct an instance of
some other problem, whose answer would help us solve the first problem.

The typical logic is this: you claim to have a fast algorithm to, say, solve
subset sum. I create a machine that, in polynomial time, takes a traveling
salesman problem instance, and creates from it, an instance of subset
sum. If you can give my machine the answer to that question, it will,
again in polynomial time, change that back into the answer for my original
traveling salesman problem. Therefore, if your subset sum algorithm is
really polynomial, then I have just added some extra bits to it and created
a polynomial time algorithm for solving traveling salesman problems. If I
can create such a machine, then traveling salesman is reducible to subset
sum, which indicates that subset sum is at least as hard as traveling
salesman. Now suppose we somehow knew for certain that traveling salesman
has no polynomial-time solution. In that case, the above would be a proof
by contradiction that subset sum also has no polynomial time solution.

Now, we can pretty easily state what NP-Complete and NP-Hard are:

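If two problems can be reduced to each other, they are, in some sense,
"equivalent". The problems in NP to which every problem in NP can be
reduced are called NP-Complete; any two of them can be reduced to each
other, so they are all equivalent in this sense.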

If there is an NP-Complete problem that can be reduced to some problem
H, then H is NP-Hard. This means that, in some sense, H is at least as
hard as all the NP-Complete problems, but note that there does not need
to be a reduction from H to any NP-Complete problem, so H might be
harder than all the NP problems.

One proves that a problem is NP-Hard by taking an existing NP-Complete
problem (3-SAT and Traveling Salesman are my favorites, since they work
well for geometry), and using it to generate input to your problem, such
that an answer to your problem is equivalent to solving the NP-Complete
problem you chose.
