As I recall from my computer science classes, the P=NP question means P problems can be SOLVED in polynomial time, but NP problems can be VERIFIED in polynomial time. I also recall that the travelling salesperson problem is in NP.
OK, I agree the travelling salesperson problem is apparently not in P because I don’t know any polynomial solution.
But how could it be in NP either? How could I possibly verify that a solution to the travelling salesperson problem really is the shortest solution unless I compared it to all the other possible solutions? How can I verify the answer without solving for every single path length?
Which version of the problem are you considering? The formal traveling salesman decision problem asks whether there is a tour shorter than a specific length. If a proposed tour is shorter than that length, the answer to the decision is ‘yes’. If not, the answer is ‘no’. That version is NP-complete in that it is essentially the same complexity as all other NP-complete decision problems, which, as you point out, can be verified in polynomial time.
The NP problems are usually formalized as “decision problems”, so the travelling salesman problem is expressed as “is there a route with length shorter than <x>?” for some x.
Of course an algorithm that figures out the shortest path can easily answer the decision problem just by comparing the length of the shortest path to x. Conversely, an algorithm that only determines whether there is a path shorter than x, and provides a solution to be verified, can also be converted to one that finds the shortest path by doing a binary search on the proposed path length.
A problem is in NP if a proposed solution can be verified as valid in polynomial time, not if it can be verified as optimal in polynomial time. In the case of the traveling salesman problem, an algorithm would have to verify that a proposed solution visits each customer exactly once using only defined routes, and would have to add up the costs of travel between the customers. This can be done in polynomial time.
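Concretely, that verification step can be sketched in Python (the 4-city instance and its distances below are made up purely for illustration):

```python
def verify_tour(tour, dist, n, bound):
    """Polynomial-time check: `tour` visits each of the n cities exactly
    once, uses only defined routes, and has total cost at most `bound`."""
    if sorted(tour) != list(range(n)):             # each city exactly once
        return False
    legs = list(zip(tour, tour[1:] + tour[:1]))    # close the cycle
    if any(leg not in dist for leg in legs):       # only defined routes
        return False
    return sum(dist[leg] for leg in legs) <= bound # add up the costs

# Hypothetical symmetric 4-city instance
dist = {(0, 1): 2, (1, 2): 3, (2, 3): 2, (3, 0): 4, (0, 2): 9, (1, 3): 9}
dist.update({(b, a): d for (a, b), d in dist.items()})

print(verify_tour([0, 1, 2, 3], dist, 4, 12))  # True: cost 2+3+2+4 = 11
```

Each check (the sort, the route lookups, the sum) is polynomial in the number of cities, which is the point: a single proposed tour is cheap to verify, whatever the cost of finding it.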
By that standard, Jeff, one would argue that it’s in P, since you can also find a valid path from scratch in polynomial time. But the Traveling Salesman problem doesn’t just ask for a path that visits all cities; it asks for the shortest such path. So a path that visits all cities but isn’t shortest is not a valid solution to the TSP.
I’ll try this again. A problem is in NP if it’s possible to test whether something is a valid potential solution in polynomial time. It isn’t necessary to test whether it’s an optimal solution - just whether it violates the constraints of the problem. If you still don’t believe me, look at the definition on the Wikipedia page on NP:
Strictly, the Travelling Salesman Problem is in NPO, which is the class of NP optimization problems. However, every optimization problem has an equivalent decision problem formulation, so we usually just lazily say an NPO problem is in NP.
The decision problem formulation, as mentioned, is “there exists a path of at most cost k that goes through all cities except the starting one exactly once.”
You can think of the equivalent NP optimization algorithm as:
for all k in 0 to infinity:
    non-deterministically check if there is a path of at most length k
    if so, return k; else, continue
OK - I’m trying to be as clear as possible, and somehow I’m failing. I’ll try again.
There are many possible routes for a traveling salesman problem. One of them is the shortest. You don’t know which is shortest until you’ve solved the problem. You can look at a whole bunch of different routes and choose the shortest. When you do this, you need to be sure that each route you consider meets the requirement of visiting all the stops, and you need to add up the route’s total distance. This can be done in polynomial time for each route.
To find the shortest route takes a lot more work than that. One thing you can do is look at every possible route. The number of possible routes is an exponential function of the number of stops the salesman must visit, so if you do it this way the solution will take longer than polynomial time. There are things you can do to reduce the amount of work, but so far no one has found a solution that works in polynomial time.
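The look-at-every-possible-route approach can be sketched in Python; with the starting city fixed, there are (n-1)! routes to examine, which is far worse than polynomial (the instance here is the same made-up kind as before):

```python
from itertools import permutations

def brute_force_tsp(dist, n):
    """Examine every route (fixing city 0 as the start) and return the
    shortest cycle cost -- (n-1)! routes, so exponential-or-worse time."""
    best = None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        cost = sum(dist[tour[i], tour[(i + 1) % n]] for i in range(n))
        if best is None or cost < best:
            best = cost
    return best

# Hypothetical symmetric 4-city instance (all pairs defined)
dist = {(0, 1): 2, (0, 2): 9, (0, 3): 4, (1, 2): 3, (1, 3): 9, (2, 3): 2}
dist.update({(b, a): d for (a, b), d in dist.items()})

print(brute_force_tsp(dist, 4))  # 11
```

Each individual route is evaluated in polynomial time; it's the number of routes that blows up.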
In an NP problem, a test of a potential solution can be done in polynomial time. For the traveling salesman problem, a route is a potential solution. That doesn’t mean it’s the actual solution - that can only be found (as far as we know) by examining a large number of routes.
One definition of NP is that class of problems that can be solved in polynomial time by a non-deterministic Turing machine. We’re all familiar with a deterministic machine - roughly speaking, it’s a machine that does one thing at a time. A non-deterministic machine can make arbitrarily many copies of itself and assign a different sub-task to each copy (needless to say, this type of machine is only theoretical). For the traveling salesman problem, a non-deterministic machine can split off copies of itself and assign the task of evaluating each possible route to a different copy. Because evaluating a route can be done in polynomial time, evaluating all the routes simultaneously in parallel can be done in polynomial time.
Jeff Lichter, I don’t think you are accurately describing NP or the matter at the heart of this thread. For one thing, you don’t touch on the particular notion of non-determinism specifically relevant to NP, but only discuss non-determinism fuzzily; it’s unclear your fuzzy account of non-determinism accurately captures (or even attempts to capture), for example, the distinction between NP and co-NP (not to mention the further levels of the polynomial hierarchy).
(FWIW, Jragon gave a good, accurate account of the matter at the heart of this thread, as did leahcim)
(I’ll note as a pedantic modification to Jragon’s post that we will want to use leahcim’s trick of binary search to find the minimal k (that is, first repeatedly double candidates till finding some upper bound on the minimal k, in number of rounds logarithmic in the minimal k, then binary search between that upper bound and zero to find the precise minimal k, again in number of rounds logarithmic in the minimal k) rather than counting up one by one from zero (which would take a number of rounds linear in the minimal k, and thus exponential in its number of digits)).
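The doubling-then-binary-search scheme can be sketched in Python, generic over any monotone decision procedure decide(k) (the lambda in the demo is a stand-in for a real "is there a tour of cost at most k?" oracle):

```python
def minimal_k(decide):
    """Find the least k with decide(k) True, assuming decide is monotone.
    Doubling brackets the answer in O(log k*) calls; binary search then
    pins it down in O(log k*) more -- logarithmic in k*, i.e. polynomial
    in its number of digits, rather than linear in k* itself."""
    if decide(0):
        return 0
    hi = 1
    while not decide(hi):   # repeatedly double to find an upper bound
        hi *= 2
    lo = hi // 2            # invariant: decide(lo) False, decide(hi) True
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if decide(mid):
            hi = mid
        else:
            lo = mid
    return hi

print(minimal_k(lambda k: k >= 11))  # 11
```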
Finding a cycle in a graph that visits each node just once is the Hamiltonian Cycle problem, which is NP-complete.
The hard part of TSP is the circuit. The shortest path part is not the NP-hard part.
Because of the standard reduction methods between such problems, given any deterministic algorithm for Hamiltonian Cycle we can use it to solve TSP without adding more than a small polynomial factor in time.
But finding a cycle in a complete graph (i.e. one where there is an edge between every pair of vertices) is trivial – every sequence of the vertices that starts and ends at the same vertex is a cycle.
The graph associated with the TSP is a complete graph – the salesman can go from any city to any other without restriction. Certain edges may be so high-cost that they are unlikely to be part of any optimal solution, though.
An algorithm is said to be of polynomial time if its running time is upper bounded by a polynomial expression in the size of the input for the algorithm, i.e., T(n) = O(n^k) for some constant k. Problems for which a deterministic polynomial time algorithm exists belong to the complexity class P, which is central in the field of computational complexity theory. Cobham’s thesis states that polynomial time is a synonym for “tractable”, “feasible”, “efficient”, or “fast”.
Some examples of polynomial time algorithms:
- The selection sort sorting algorithm on n integers performs An^2 operations for some constant A. Thus it runs in time O(n^2) and is a polynomial time algorithm.
- All the basic arithmetic operations (addition, subtraction, multiplication, division, and comparison) can be done in polynomial time.
- Maximum matchings in graphs can be found in polynomial time.
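The selection sort example from that quote, sketched in Python with an explicit comparison counter to make the roughly n(n-1)/2 = O(n^2) cost visible:

```python
def selection_sort(items):
    """Sort a list, counting comparisons: n*(n-1)/2 of them, i.e. O(n^2)."""
    a = list(items)
    comparisons = 0
    for i in range(len(a)):
        m = i
        for j in range(i + 1, len(a)):  # scan the unsorted tail
            comparisons += 1
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]         # swap the minimum into place
    return a, comparisons

print(selection_sort([5, 2, 4, 1, 3]))  # ([1, 2, 3, 4, 5], 10) -- 10 = 5*4/2
```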
It seems other posters know way more about this than I do.
Not if you think about it. Start with an incomplete graph you want to find the HC of. Distances of 1 are assigned to each edge. To make it complete, add in new edges with weight 900 kajillion. Yeah, you can find cycles in that, but it’s completely uninteresting. Those edges may as well not be there. Hence, a lot of people don’t require a complete graph for TSP. It’s an extra, completely unneeded condition. I’m surprised that Wikipedia states it as a requirement.
For those who think making some of the distances that large seemingly violates the laws of geometry: uh, geometry ain’t got nothin’ to do with this. It’s all abstract.
If the nodes are located in 2 (or 3 or …) space and the distances are based on a metric like Euclidean distance, you have a vastly easier problem.
So forget any requirement that TSP is for complete graphs. It’s a canard.
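The padding construction described above can be sketched in Python (the weight 9×10^20 standing in for “900 kajillion” is an arbitrary choice; any weight larger than n works):

```python
KAJILLION = 9 * 10**20  # stand-in for "900 kajillion"

def complete_for_tsp(n, edges):
    """Given an undirected graph on n vertices, assign weight 1 to its real
    edges and KAJILLION to every missing edge.  The original graph has a
    Hamiltonian cycle iff the completed graph has a TSP tour of cost n."""
    real = {frozenset(e) for e in edges}
    return {(u, v): (1 if frozenset((u, v)) in real else KAJILLION)
            for u in range(n) for v in range(n) if u != v}

# The 4-cycle 0-1-2-3-0 is Hamiltonian, so the optimal tour cost here is 4
dist = complete_for_tsp(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

This is the standard reduction direction: a TSP solver for complete graphs answers Hamiltonian Cycle on arbitrary graphs, so restricting to complete graphs doesn't make TSP any easier.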
Here’s a handy way to think about polynomial time: If the input size gets twice as big, how much longer does it take to solve it? Polynomial time means that it goes up by at most a constant factor. E.g., a linear time solution doubles, a quadratic one quadruples, etc.
A (simple) exponential time problem has a terrible characteristic. If you increase the problem size by as little as one, the running time goes up by a constant factor, e.g., doubling or worse.
You can endure many doublings, etc. when the problem size doubles. But doubling each time you add one more thing to the input? Nope, you’re going to hit a big wall real soon.
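The contrast above in actual numbers, with one cost unit per operation assumed for illustration: doubling n merely quadruples n^2, but it squares 2^n, and adding just one to n already doubles 2^n.

```python
# Polynomial vs. exponential growth as the input size n doubles
for n in (10, 20, 40):
    print(f"n={n:2}  n^2={n**2:5}  2^n={2**n}")
```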
Except that, even if some edges are too long to be used in the solution, you don’t know which ones those are until you have the solution. You start with an arbitrary graph, which may well not have any edges equal to a kajillion. You have n(n-1)/2 edges, and you know that you’re only going to need n of them. So there are n(n-1)/2 - n superfluous edges, that it wouldn’t change anything if you set them equal to a kajillion instead of whatever they were in the problem as given to you. But which edges are those? Almost certainly not the n(n-1)/2 - n longest ones. And then, even if you did somehow manage to eliminate all of those edges, it still wouldn’t be hard to construct the Hamiltonian cycle, since once you’ve eliminated the superfluous edges, the Hamiltonian cycle is all that’s left of the graph. And while the nodes being in a metric space is a special case of the problem, it isn’t actually a special case that makes anything about the problem any easier.
The problem “Given a list of cities and distances between them which make Euclidean planar sense, is there a path which visits every city at least once (possibly more than once, and possibly ending in a different location than the starting point) of length less than some specified bound?” is NP-complete. I wouldn’t, therefore, say the difficulty of the traveling salesman problem is intrinsically related to the problem being formulated in such a way as that one has to visit each node exactly once (or complete a circuit, or allow for non-Euclidean distances, or any such thing). It is not necessary for NP-completeness to formulate the task in such a way as that finding a Hamiltonian circuit on an arbitrary graph is a direct subtask (though, of course, any NP-complete problem reduces to any other under suitable transformation).
That said, I wouldn’t say ALL the difficulty of the traveling salesman problem comes from this factor. It’s true that, for example, Euclidean planar TSP, while still NP-complete, nonetheless seems to allow for easier approximately (rather than fully) optimal solutions.
This is incorrect. TSP restricted to complete graphs, where proving the existence of circuits is trivial, is just as NP-complete as TSP on general graphs. As you note, a TSP solver that works only on complete graphs can be converted to one that works on arbitrary graphs just by adding extra high-weight edges, and of course one that works on arbitrary graphs also works on complete graphs.
People talk about the travelling salesman problem assuming one can go from any city to any other because it makes the narrative seem more sensible and doesn’t take any essential complexity away from the problem.