Straight Dope Message Board > Main Traveling Salesman Problem: Are solutions normally distributed?
 Register FAQ Calendar Mark Forums Read

#1
05-21-2006, 09:00 AM
 KarlGauss An old man in a dry month Charter Member Join Date: Mar 2000 Location: Between pole and tropic Posts: 5,904
Traveling Salesman Problem: Are solutions normally distributed?

I recognize that finding the optimum solution to the Traveling Salesman Problem is hard (NP-hard, to be precise). My question, then, has to do with the distribution of the values in the solution set for any particular instance of the Problem. Are such solutions normally distributed?

I know I can't be the first, and probably not even the millionth to wonder about this, but still, let me ask - if the solutions are normally distributed, wouldn't it be possible to compute a "representative" sample of solutions (requiring much less effort and time than computing them all) and, after, considering the distribution curve that those solutions generate, be able to determine a "nearly optimum" solution", i.e. one within x% of the optimum, where 'x' is whatever you want it to be?

As always, thanks!
#2
05-21-2006, 09:27 AM
 LiveOnAPlane Guest Join Date: Sep 2005
Yes, having experienced this in undergrad as well as grad school, there are indeed nearly optimal solutions to this.

But, and I am talking out of my butt here, when one goes to x% of optimal, then one has to compare this aginst the *optimal* solution.

Therein lies the rub, AFAIK.
#3
05-21-2006, 09:28 AM
 chrisk Charter Member Join Date: Nov 2003 Location: Southern ontario Posts: 5,656
Quote:
 Originally Posted by KarlGauss I recognize that finding the optimum solution to the Traveling Salesman Problem is hard (NP-hard, to be precise). My question, then, has to do with the distribution of the values in the solution set for any particular instance of the Problem. Are such solutions normally distributed? I know I can't be the first, and probably not even the millionth to wonder about this, but still, let me ask - if the solutions are normally distributed, wouldn't it be possible to compute a "representative" sample of solutions (requiring much less effort and time than computing them all) and, after, considering the distribution curve that those solutions generate, be able to determine a "nearly optimum" solution", i.e. one within x% of the optimum, where 'x' is whatever you want it to be? As always, thanks!

I would suspect that for the TSP problem in general, you cannot make assumptions about whether the solutions are normally distributed or not... based partly on how the individual graph-edges (costs to travel between any particular pair of cities) are distributed.

In general, I would think that calculating a representative sample and taking the best available would still be a good real-life algorithm to approximate a solution, but it would be hard to tell just how good of an approximation it is without solving for the optimal solution.

It would be an interesting exercise to generate TSP graphs that are as large as possible while still being solvable with brute-force in a decent amount of time... and see... hmm -- that'd be the problem actually. If you solve TSP on 12 cities, you have 12! possible solutions, about half a billion, which would be many too many to graph. You might be able to sort them all and then take every millionth solution as a graphing sample. Just to see how the overall distribution tends to go.
#4
05-21-2006, 09:32 AM
 ultrafilter Guest Join Date: May 2001
It might be possible, but for any practical purpose, it's easy enough to get a really good solution through genetic programming.

Quote:
 Originally Posted by LiveOnAPlane But, and I am talking out of my butt here, when one goes to x% of optimal, then one has to compare this aginst the *optimal* solution.
Not necessarily. You can place error bound on most approximation algorithms out there deductively.
#5
05-21-2006, 10:13 AM
 ftg Guest Join Date: Feb 2001
The OP lacks key information. In particular, what does "normally distributed" range over? So the following might not address the question.

It is trivial to show that approximate TSP is NP-hard. I.e., to find a solution that is only half (or whatever) as good as the optimal is just as hard as the original problem. It is so easy to prove I have given this as a homework problem in algorithms classes.

This is unlike some other NP-Complete problems like bin packing where easy polytime approximation algorithms are known. So this sort of thing varies from problem to problem.

Note that being NP-hard does not mean that the problem is truly difficult. It may turn out to be polytime. (Which would of course surprise most people.)
#6
05-21-2006, 07:56 PM
 Chronos Charter Member Join Date: Jan 2000 Location: The Land of Cleves Posts: 47,896
Note that just knowing that something's approximately normally distributed cannot tell you the bound, because normal distributions don't have bounds.
#7
05-21-2006, 10:21 PM
 seenidog Guest Join Date: Apr 2003
okay, I admit there is no chance whatsoever that I will be able to contribute, but would someone please clue me in as to what the traveling salesman problem is so perhaps I might learn something?
#8
05-21-2006, 10:46 PM
 ultrafilter Guest Join Date: May 2001
Quote:
 Originally Posted by seenidog okay, I admit there is no chance whatsoever that I will be able to contribute, but would someone please clue me in as to what the traveling salesman problem is so perhaps I might learn something?
There's a nice write-up here.
#9
05-21-2006, 11:49 PM
 KarlGauss An old man in a dry month Charter Member Join Date: Mar 2000 Location: Between pole and tropic Posts: 5,904
Quote:
 Originally Posted by Chronos Note that just knowing that something's approximately normally distributed cannot tell you the bound, because normal distributions don't have bounds.
Indeed, I actually did know that, but expressed myself very poorly.

What I meant to say, and what I should have said, was that if you you know that the solutions are normally distributed, you should be able to come up with a solution that is "better" than x% of the entire solution set, with 'x' being whatever you want it to be.

And, thanks to everyone for their input!
#10
05-22-2006, 09:11 AM
 Blaster Master Guest Join Date: Feb 2006
Quote:
 Originally Posted by KarlGauss Indeed, I actually did know that, but expressed myself very poorly. What I meant to say, and what I should have said, was that if you you know that the solutions are normally distributed, you should be able to come up with a solution that is "better" than x% of the entire solution set, with 'x' being whatever you want it to be. And, thanks to everyone for their input!
I have a hard time believing this problem could be normally distrubuted or even could be represented by a normal distribution. There can be only one (possibly with a few ties) optimal solution. If there's a normal distribution, this would imply that there are some solutions on both sides of the solution, which violates the optimality of the solution.

There are much simpler ways to get good bounds on a TSPs optimal solution. For instance, a good lower bound is the MST, which can be calculated in O(m) where m is the number of edges. You can also get a reasonable upperbound guarenteed no more than 150% the optimal solution in O(m). There are better quick upper bounds, but they require more than linear time, and I'd have to go back and look at my Combinatorial Optimizations text to remember.
#11
05-22-2006, 09:22 AM
 ultrafilter Guest Join Date: May 2001
Quote:
 Originally Posted by Blaster Master I have a hard time believing this problem could be normally distrubuted or even could be represented by a normal distribution. There can be only one (possibly with a few ties) optimal solution. If there's a normal distribution, this would imply that there are some solutions on both sides of the solution, which violates the optimality of the solution.
But there are solutions on both sides of the average solution.
#12
05-22-2006, 09:33 AM
 ITR champion Guest Join Date: Jan 2001
Quote:
 Originally Posted by KarlGauss I recognize that finding the optimum solution to the Traveling Salesman Problem is hard (NP-hard, to be precise). My question, then, has to do with the distribution of the values in the solution set for any particular instance of the Problem. Are such solutions normally distributed? I know I can't be the first, and probably not even the millionth to wonder about this, but still, let me ask - if the solutions are normally distributed, wouldn't it be possible to compute a "representative" sample of solutions (requiring much less effort and time than computing them all) and, after, considering the distribution curve that those solutions generate, be able to determine a "nearly optimum" solution", i.e. one within x% of the optimum, where 'x' is whatever you want it to be? As always, thanks!
Let me present my interpretation of what you're trying to say here. I have a collection of cities, and I know the distances between every pair of these cities. I look at every possible route that goes through all cities without repetition and calculate the length of every route. You're asking whether these lengths fall into a normal distribution.

In the general case, they do not. I can provide a counterexample.

According to the official definition of the TSP, we may present an instance of the problem with any distances we choose. We may even choose distances that are entirely against intuition. For instance, suppose we take these cities.

Atlanta
Boston
Chicago
Denver
El Paso
Fargo

I will create an instance of the TSP as follows: the distance from Atlanta to Boston is 500 miles. The distance between any other pair of cities is 100 miles.

In that case, what does the distribution of lengths of routes look likes? Given any route, the only relevant question is whether it includes that length from Atlanta to Boston. For any route including the Atlanta-Boston leg, the total length is 500 + 100 + 100 + 100 + 100 + 100 = 1000. For any route not including the Atlanta-Boston leg, the total length is 100 + 100 + 100 + 100 + 100 + 100 = 600. Thus, the distribution of solutions contains only two points, one at 1000 and one at 600.

This is an extreme cases. One might look at restricted cases. For instance, if I declared that all distances on the map would be chosen randomly from between 100 and 200 miles, then the distribution of solutions might well be normal.
#13
05-22-2006, 09:44 AM
 Blaster Master Guest Join Date: Feb 2006
Quote:
 Originally Posted by ultrafilter But there are solutions on both sides of the average solution.
True. But what good is an average solution in this case in terms of locating an optimal solution? We have no way of knowing how much worse than optimal this average solution is. It would still take O(m) to caluclate an average solution, but we have no idea how close to optimal it is. Further, why would we care? If we calculated 100 different Hamiltonian tours to find a good guess at the average, clearly the minimum of those 100 tours would be a better guess at the optimum than the average of those 100 tours.
#14
05-22-2006, 10:51 AM
 ultrafilter Guest Join Date: May 2001
Quote:
 Originally Posted by ITR champion Let me present my interpretation of what you're trying to say here. I have a collection of cities, and I know the distances between every pair of these cities. I look at every possible route that goes through all cities without repetition and calculate the length of every route. You're asking whether these lengths fall into a normal distribution. In the general case, they do not. I can provide a counterexample. ... This is an extreme cases. One might look at restricted cases. For instance, if I declared that all distances on the map would be chosen randomly from between 100 and 200 miles, then the distribution of solutions might well be normal.
It's obvious that the solution space is never normally distributed, because it's a finite space, and the normal distribution is continuous. The question is whether the normal distribution is a reasonable approximation to the distribution of solutions. If it is, the degree to which it's a good approximation should depend on the number of solution costs. It might be worthwhile to consider TSPs over a (finite) metric space.

Quote:
 Originally Posted by Blaster Master True. But what good is an average solution in this case in terms of locating an optimal solution?
It's not. As you'll notice, I mentioned in post #4 that there are relatively easy ways to get good solutions. It's still an interesting theoretical question.
#15
05-22-2006, 11:04 AM
 ftg Guest Join Date: Feb 2001
Quote:
 Originally Posted by Blaster Master There are much simpler ways to get good bounds on a TSPs optimal solution. For instance, a good lower bound is the MST, which can be calculated in O(m) where m is the number of edges. You can also get a reasonable upperbound guarenteed no more than 150% the optimal solution in O(m). There are better quick upper bounds, but they require more than linear time, and I'd have to go back and look at my Combinatorial Optimizations text to remember.
1. The current best MST algorithm is not O(m) but almost linear.

2. What does MST have anything to do with this???? Completely different problems.

3. See my earlier post. Approximate TSP is NP-hard.
#16
05-22-2006, 01:34 PM
 Blaster Master Guest Join Date: Feb 2006
Quote:
 Originally Posted by ftg 1. The current best MST algorithm is not O(m) but almost linear.
No, the fastest IS linear. From the Wiki article you just cited:
Quote:
 Originally Posted by Wikipedia If the edge weights are integers with a bounded bit length, then deterministic algorithms are known with linear running time, O(m). For general weights, randomized algorithms are known that run in linear expected time.
As this states, there's a well known linear algorithm for integer edge weights.

I just read this paper 2-3 weeks ago. Tarjan et al found a linear algorithm with [VERY] high probability for Real edge weights; this was in 1994. According to my professor (sorry, no cite), some time since then they've improved upon this result by making it non-randomized, and thus O(m) with no caveat for Real edge weights.
Quote:
 2. What does MST have anything to do with this???? Completely different problems.
Yes, they are different problems; however, an MST is a fairly tight lower bound on the TSP (MST has n-1 edges, TSP has n eges). Also, MST is used in a lot of poly-time estimations.
Quote:
 3. See my earlier post. Approximate TSP is NP-hard.
As is normally definted, yes. However, if you're okay with a larger error, you may not need to go through all that work. There ARE linear and higher order poly-time TSP approximating algorithms that guarentee you to be no more than x% greater than the optimal. I know of at least one off the top of my head that guarantees a solution no more than 150% the optimal TSP and can calculate that in O(m) and it requires finding the MST. If <=150% is good enough, why break out the big guns and tackle the harder problem?
#17
06-02-2006, 09:08 PM
 chrisk Charter Member Join Date: Nov 2003 Location: Southern ontario Posts: 5,656
Thought I'd bring this thread back up just to note that I managed to work out every possible solution to a sample randomly generated TSP problem... obviously, the premises that I was working with to generate my problem could have biased the results, but I did generate a cost versus frequency graph to see what it looked like.

It was triangle-shaped. Pretty clearly, as far as I could tell. A sharp incline from the worst possible solutions up to the average, median solutions, and then a sharp decline from those down to the very best ones. Very little trace of curvature in the inclines, though on the 'up' slope there was maybe a very little hint of a bell curve. Might have been imagining it though.
#18
06-02-2006, 09:10 PM
 ultrafilter Guest Join Date: May 2001
How many vertices?
#19
06-03-2006, 06:40 AM
 chrisk Charter Member Join Date: Nov 2003 Location: Southern ontario Posts: 5,656
Quote:
 Originally Posted by ultrafilter How many vertices?
Twelve, if you mean the cities in my sample.

 Bookmarks

 Thread Tools Display Modes Linear Mode

 Posting Rules You may not post new threads You may not post replies You may not post attachments You may not edit your posts BB code is On Smilies are On [IMG] code is Off HTML code is Off
 Forum Jump User Control Panel Private Messages Subscriptions Who's Online Search Forums Forums Home Main     About This Message Board     Comments on Cecil's Columns/Staff Reports     Straight Dope Chicago     General Questions     Great Debates     Elections     Cafe Society     The Game Room     In My Humble Opinion (IMHO)     Mundane Pointless Stuff I Must Share (MPSIMS)     Marketplace     The BBQ Pit Side Conversations     The Barn House

All times are GMT -5. The time now is 08:19 PM.