A decade ago, I read a thesis by my cousin where he used a technique called multidimensional scaling or MDS. The concept fascinated me, and since then I’ve seen many places where I’d like to apply it. However I don’t understand the math involved. I’ve looked it up on Google and dug into a few sites, but the ones I’ve found really described MDS as a side issue and they’ve lost me pretty quick.
The questions:
a) Could anyone explain the math behind MDS in enough detail that I could actually perform it programmatically? (Or provide a decent cite).
b) Could anyone point me to an Excel plugin that does MDS?
c) Could anyone point me at source code (any language) that performs MDS? (From that I can build what I need)
In case I’m using the wrong phrase or something, here’s what I’m looking for (and I believe MDS provides):
You take as input a matrix of n points and for each point a distance from it to every other point
As output, you deliver a graph in 2 (or more dimensions) which attempts as closely as possible to be an actual representation of the initial graph. On this graph, points that are close (or related) will appear close together and ones that are less related will be more distant.
Another way to describe it: Say I have a map of the US showing (say) 10 major cities. From that map it’s easy to deliver a list of the distances between each of the cities.
Now, say I have a list of 10 cities and the distances between each of them. Mathematically, how do I come up with the locations of each of the cities?
I’ve done a bit of research. At this point, I understand there’s a concept called “Stress” which is a measure of how close the input and a given output match. I understand you generate a first pass output, measure the stress, then modify your output and measure the new stress over and over to attempt to minimize the stress. But how do you generate the first output and how do you modify it at each round? Source code would be wonderful.
It’s been a while since I did this, and even then, I used a canned program, rather than trying to do it myself.
Guess.
That’s right - it’s non-deterministic. Start with any arrangement of points, for example, a grid. I’d suggest simply generating a random scatter. Then displace each point by a small amount. Again, how much and in which direction is up to you; I’d suggest a small fixed distance and a random direction. Measure the stress to see if it’s smaller than the previous one. If it is, then this becomes your new matrix. If it isn’t, then go back to your old one and try again. Repeat until stress goes to zero (unlikely) or becomes acceptably small.
You’re right; you could get caught in a local minimum without finding the “true” solution. You’ll want to try several runs to see if you get the same answer every time. I was using NTSYS-pc to do this, which was fairly cheap as these things go.
There’s some rather old stuff in this archive. It includes Fortran source code as well as user manuals that explain the algorithm.