I recognize that finding the optimum solution to the Traveling Salesman Problem is hard (NP-hard, to be precise). My question, then, has to do with the distribution of the values in the solution set for any particular instance of the Problem. Are such solutions normally distributed?

I know I can’t be the first, and probably not even the millionth to wonder about this, but still, let me ask - if the solutions are normally distributed, wouldn’t it be possible to compute a “representative” sample of solutions (requiring much less effort and time than computing them all) and, after, considering the distribution curve that those solutions generate, be able to determine a “nearly optimum” solution", i.e. one within x% of the optimum, where ‘x’ is whatever you want it to be?

I would suspect that for the TSP problem in general, you cannot make assumptions about whether the solutions are normally distributed or not… based partly on how the individual graph-edges (costs to travel between any particular pair of cities) are distributed.

In general, I would think that calculating a representative sample and taking the best available would still be a good real-life algorithm to approximate a solution, but it would be hard to tell just how good of an approximation it is without solving for the optimal solution.

It would be an interesting exercise to generate TSP graphs that are as large as possible while still being solvable with brute-force in a decent amount of time… and see… hmm – that’d be the problem actually. If you solve TSP on 12 cities, you have 12! possible solutions, about half a billion, which would be many too many to graph. You might be able to sort them all and then take every millionth solution as a graphing sample. Just to see how the overall distribution tends to go.

The OP lacks key information. In particular, what does “normally distributed” range over? So the following might not address the question.

It is trivial to show that approximate TSP is NP-hard. I.e., to find a solution that is only half (or whatever) as good as the optimal is just as hard as the original problem. It is so easy to prove I have given this as a homework problem in algorithms classes.

This is unlike some other NP-Complete problems like bin packing where easy polytime approximation algorithms are known. So this sort of thing varies from problem to problem.

Note that being NP-hard does not mean that the problem is truly difficult. It may turn out to be polytime. (Which would of course surprise most people.)

okay, I admit there is no chance whatsoever that I will be able to contribute, but would someone please clue me in as to what the traveling salesman problem is so perhaps I might learn something?

Indeed, I actually did know that, but expressed myself very poorly.

What I meant to say, and what I should have said, was that if you you know that the solutions are normally distributed, you should be able to come up with a solution that is “better” than x% of the entire solution set, with ‘x’ being whatever you want it to be.

I have a hard time believing this problem could be normally distrubuted or even could be represented by a normal distribution. There can be only one (possibly with a few ties) optimal solution. If there’s a normal distribution, this would imply that there are some solutions on both sides of the solution, which violates the optimality of the solution.

There are much simpler ways to get good bounds on a TSPs optimal solution. For instance, a good lower bound is the MST, which can be calculated in O(m) where m is the number of edges. You can also get a reasonable upperbound guarenteed no more than 150% the optimal solution in O(m). There are better quick upper bounds, but they require more than linear time, and I’d have to go back and look at my Combinatorial Optimizations text to remember.

Let me present my interpretation of what you’re trying to say here. I have a collection of cities, and I know the distances between every pair of these cities. I look at every possible route that goes through all cities without repetition and calculate the length of every route. You’re asking whether these lengths fall into a normal distribution.

In the general case, they do not. I can provide a counterexample.

According to the official definition of the TSP, we may present an instance of the problem with any distances we choose. We may even choose distances that are entirely against intuition. For instance, suppose we take these cities.

Atlanta
Boston
Chicago
Denver
El Paso
Fargo

I will create an instance of the TSP as follows: the distance from Atlanta to Boston is 500 miles. The distance between any other pair of cities is 100 miles.

In that case, what does the distribution of lengths of routes look likes? Given any route, the only relevant question is whether it includes that length from Atlanta to Boston. For any route including the Atlanta-Boston leg, the total length is 500 + 100 + 100 + 100 + 100 + 100 = 1000. For any route not including the Atlanta-Boston leg, the total length is 100 + 100 + 100 + 100 + 100 + 100 = 600. Thus, the distribution of solutions contains only two points, one at 1000 and one at 600.

This is an extreme cases. One might look at restricted cases. For instance, if I declared that all distances on the map would be chosen randomly from between 100 and 200 miles, then the distribution of solutions might well be normal.

True. But what good is an average solution in this case in terms of locating an optimal solution? We have no way of knowing how much worse than optimal this average solution is. It would still take O(m) to caluclate an average solution, but we have no idea how close to optimal it is. Further, why would we care? If we calculated 100 different Hamiltonian tours to find a good guess at the average, clearly the minimum of those 100 tours would be a better guess at the optimum than the average of those 100 tours.

It’s obvious that the solution space is never normally distributed, because it’s a finite space, and the normal distribution is continuous. The question is whether the normal distribution is a reasonable approximation to the distribution of solutions. If it is, the degree to which it’s a good approximation should depend on the number of solution costs. It might be worthwhile to consider TSPs over a (finite) metric space.

It’s not. As you’ll notice, I mentioned in post #4 that there are relatively easy ways to get good solutions. It’s still an interesting theoretical question.

No, the fastest IS linear. From the Wiki article you just cited:

As this states, there’s a well known linear algorithm for integer edge weights.

I just read this paper 2-3 weeks ago. Tarjan et al found a linear algorithm with [VERY] high probability for Real edge weights; this was in 1994. According to my professor (sorry, no cite), some time since then they’ve improved upon this result by making it non-randomized, and thus O(m) with no caveat for Real edge weights.

Yes, they are different problems; however, an MST is a fairly tight lower bound on the TSP (MST has n-1 edges, TSP has n eges). Also, MST is used in a lot of poly-time estimations.

As is normally definted, yes. However, if you’re okay with a larger error, you may not need to go through all that work. There ARE linear and higher order poly-time TSP approximating algorithms that guarentee you to be no more than x% greater than the optimal. I know of at least one off the top of my head that guarantees a solution no more than 150% the optimal TSP and can calculate that in O(m) and it requires finding the MST. If <=150% is good enough, why break out the big guns and tackle the harder problem?

Thought I’d bring this thread back up just to note that I managed to work out every possible solution to a sample randomly generated TSP problem… obviously, the premises that I was working with to generate my problem could have biased the results, but I did generate a cost versus frequency graph to see what it looked like.

It was triangle-shaped. Pretty clearly, as far as I could tell. A sharp incline from the worst possible solutions up to the average, median solutions, and then a sharp decline from those down to the very best ones. Very little trace of curvature in the inclines, though on the ‘up’ slope there was maybe a very little hint of a bell curve. Might have been imagining it though.