Complexity theory gets very technical, but the basics are actually
extraordinarily intuitive, and it's possible to understand the P
versus NP issue with very little math background.
Problem: sorting.
Search version: input a list of numbers X and output the same list in
sorted order (call it Y).
Verification version: input a list of numbers X and another list Y, and
output "YES" if Y is the sorted version of X and "NO" otherwise.
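As a minimal sketch (the function names are my own), the two versions of the sorting problem might look like this in Python:

```python
from collections import Counter

def sort_search(X):
    """Search version: output the numbers of X in sorted order."""
    return sorted(X)

def sort_verify(X, Y):
    """Verification version: YES iff Y is the sorted version of X."""
    in_order = all(a <= b for a, b in zip(Y, Y[1:]))
    same_numbers = Counter(X) == Counter(Y)  # same multiset of values
    return "YES" if in_order and same_numbers else "NO"
```

Note that the verifier never has to sort anything itself: one pass checks the order, and comparing the multisets checks that no number was added or dropped.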
Problem: partition.
Search version: input some numbers X and divide the numbers into
two groups that add up to exactly the same value (call the assignment
of numbers to their group Y).
Verification version: input some numbers X and the groupings Y and
output "YES" if the two groups add up to the same value, or "NO"
otherwise.
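A hedged sketch of both versions (encoding Y as a 0/1 group label per number is my own choice): verification is just two sums, while the only obvious search strategy tries all 2^n possible groupings.

```python
def partition_verify(X, Y):
    """Verification version: Y[i] is the group (0 or 1) of X[i].
    YES iff the two groups add up to the same value."""
    group0 = sum(x for x, g in zip(X, Y) if g == 0)
    group1 = sum(x for x, g in zip(X, Y) if g == 1)
    return "YES" if group0 == group1 else "NO"

def partition_search(X):
    """Search version: brute force over all 2**n groupings.
    No polynomial-time algorithm is known for this problem."""
    n = len(X)
    for bits in range(2 ** n):
        Y = [(bits >> i) & 1 for i in range(n)]
        if partition_verify(X, Y) == "YES":
            return Y
    return None  # no equal-sum split exists
```

The gap between the two is the whole point: the verifier runs in linear time, but the searcher's loop doubles in length every time one more number is added to X.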
Are there any problems for which the verification version can
be solved efficiently but for which there is no efficient
solution to the search version?
In the case of the sorting problem above, there are fast algorithms for
both the search and verification versions. But for the partition problem,
the verification version is easy (heck, my grandmother could probably
write a computer program to check that two lists of numbers add up to the
same value) while the search version is difficult, and indeed no fast
solution is known. So both problems are in NP, but only the first is
(known to be) in P.
Some problems can be translated into one another in such a way that a
fast solution to one problem would automatically give us a fast solution to
the other. There are some problems that every single problem in NP can
be translated into, and a fast solution to such a problem would
automatically give us a fast solution to every problem in NP. This group of
problems is known as NP-Hard. Some problems in NP-Hard are actually
not themselves in NP; the group of problems that are in both NP and NP-
Hard is called NP-Complete.
You start to see the far-reaching implications of a fast solution to any one
problem in NP-Hard: we would automatically get a fast solution
to every problem in NP, which would mean that whenever there is a fast
solution to the verification version of a problem then there is always a
fast solution to the corresponding search version.
P
The classes P and NP are defined by how long it takes a program to run.
Problems in class P can be solved with algorithms that run in polynomial time.
Say you have an algorithm that finds the smallest integer in an array. One
way to do this is by iterating over all the integers of the array and keeping
track of the smallest number you've seen up to that point. Every time you
look at an element, you compare it to the current minimum, and if it's
smaller, you update the minimum.
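The scan described above is a few lines of Python:

```python
def find_min(arr):
    """Return the smallest integer in a non-empty array."""
    smallest = arr[0]          # smallest value seen so far
    for x in arr[1:]:
        if x < smallest:       # compare to the current minimum...
            smallest = x       # ...and update if it's smaller
    return smallest
```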
How long does this take? Let's say there are n elements in the array. For
every element the algorithm has to perform a constant number of
operations. Therefore we can say that the algorithm runs in O(n) time, or
that the runtime is a linear function of how many elements are in the
array.* So this algorithm runs in linear time.
If the running time is some polynomial function of the size of the input**,
for instance if the algorithm runs in linear time or quadratic time or cubic
time, then we say the algorithm runs in polynomial time and the
problem it solves is in class P.
NP
Now there are a lot of programs that don't (necessarily) run in polynomial
time on a regular computer, but do run in polynomial time on a
nondeterministic Turing machine. These programs solve problems in NP,
which stands for nondeterministic polynomial time. A
nondeterministic Turing machine can do everything a regular computer
can and more.*** This means all problems in P are also in NP.
Some people think P = NP, which means any problem that can be verified
in polynomial time can also be solved in polynomial time and vice versa.
If they could prove this, it would revolutionize computer science because
people would be able to construct faster algorithms for a lot of important
problems.
NP-hard
What does NP-hard mean? A lot of times you can solve a problem by
reducing it to a different problem. I can reduce Problem B to Problem A if,
given a solution to Problem A, I can easily construct a solution to Problem
B. (In this case, "easily" means "in polynomial time.") A problem is
NP-hard if every problem in NP can be reduced to it in this way.
NP-complete
A problem is NP-complete if it is both NP-hard and in NP.
* A technical point: O(n) actually means the algorithm runs
in asymptotically linear time, which means the time complexity
approaches a line as n gets very large. Also, O(n) is technically an upper
bound, so if the algorithm ran in sublinear time you could still say it's O(n),
even if that's not the best description of it.
** Note that if the input has many different parameters, like n and k, it
might be polynomial in n and exponential in k.
There are two classes of problems, P and NP (there are many, many more,
but we will ignore the rest here). These stand for "Polynomial" and "Non-
deterministic Polynomial".
The typical logic is this: you claim to have a fast algorithm to, say, solve
subset sum. I create a machine that, in polynomial time, takes a traveling
salesman problem instance and creates from it an instance of subset
sum. If you can give my machine the answer to that instance, it will,
again in polynomial time, change that back into the answer for my original
traveling salesman problem. Therefore, if your subset sum algorithm is
really polynomial, then I have just added some extra bits to it and created
a polynomial time algorithm for solving traveling salesman problems. If I
can create such a machine, then it creates a notion of reducibility, that is,
traveling salesman is reducible to subset sum, which indicates that subset
sum is at least as hard as traveling salesman. Now suppose it were
somehow established that traveling salesman has no polynomial-time
solution. In that case, the above would be a proof by contradiction
that subset sum also has no polynomial-time solution.
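The actual traveling-salesman-to-subset-sum translation is too involved to show here, but the same pattern appears in a much simpler pair: the partition problem from earlier reduces to subset sum ("does some subset of X add up to a target t?"). In this sketch a brute-force subset_sum stands in for the claimed fast algorithm; the reduction wrapper around it is the point.

```python
from itertools import combinations

def subset_sum(X, t):
    """Stand-in for the claimed fast subset-sum solver (here: brute force).
    Returns a subset of X summing to t, or None."""
    for r in range(len(X) + 1):
        for combo in combinations(X, r):
            if sum(combo) == t:
                return list(combo)
    return None

def partition_via_subset_sum(X):
    """Reduction: build a subset-sum instance in polynomial time,
    hand it to the solver, and translate the answer back."""
    total = sum(X)
    if total % 2:                      # odd total: no equal split possible
        return None
    return subset_sum(X, total // 2)   # one group; the rest form the other
```

If subset_sum were genuinely polynomial, the wrapper adds only polynomial work, so partition would be solved in polynomial time too; in exactly this sense, subset sum is at least as hard as partition.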
Now, we can pretty easily state what NP-Complete and NP-Hard are: