Reverse engineering the coordinate system from distances

Say that I have a list of objects and I can tell you how far apart each item is from one another. E.g.:

Object A is 10 units away from Object B
Object A is 15 units away from Object C
Object B is 5 units away from Object C

I may have a million objects and, again, I can tell you the distance between any two of them.

However, I don’t actually know what space they are in. 2D space? 3D space?

Is there any area of mathematics for looking at the distance data and reverse engineering from it the most likely coordinate space and location of the objects, to most closely match the known distance data?

Notation: (x, y) = 10 means x and y are 10 units apart
If (A, B) = 10 and (A, C) = 10 and (B, C) = 10 you need 2 dimensions. They are points on an equilateral triangle – they can’t be on the same line and 3 non-collinear points define a plane.
If (A, B) = 10 and (B, C) = 10 and (A, C) = 20 they are on the same line
If (A, B) = 3 and (B, C) =4 and (A, C) = 5 they are points on a right triangle
Unless some (x, y) = 0, any 3 points are either on a line or form a triangle (I think – it is early)
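
Those three cases can be checked mechanically; here is a tiny hypothetical helper (exact comparisons are fine for these integer distances, though real data would need a tolerance):

[code]
def classify(ab, bc, ac):
    """Classify three pairwise distances: collinear if one distance
    equals the sum of the other two, otherwise a triangle."""
    a, b, c = sorted((ab, bc, ac))
    if a + b == c:
        return "collinear"
    return "right triangle" if a * a + b * b == c * c else "triangle"

print(classify(10, 10, 10))  # triangle (equilateral)
print(classify(10, 10, 20))  # collinear
print(classify(3, 4, 5))     # right triangle
[/code]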

Brian

Here is a worked-out example.

There might be easier ways to do it if you know that the space is flat, but this is at the core of relativity.

And nothing about this can tell you a coordinate system: You have to start by defining that, because you can put any coordinate system you want on a space.

Based on the previous answers, it looks like an algorithmic, search-and-optimize solution is necessary. But yeah, the goal would be to find the minimum number of dimensions that achieves a less-than-X% error bar.

It looks like there are some libraries for “distance geometry problems”. That also works for me.

Try this

Nice! (note that one response cites the SDMB) – though that has the additional condition that the points are on a (curved) surface. Given 4 non-coplanar points I think you can find a sphere / hyperboloid that fits them, but that doesn’t always mean they are on one (e.g. three points on Earth and one on the Moon).

Brian

Two solutions have been proposed. I’m not sure #6 will work for OP. UIAM, it assumes the data have (approximately) a uniform distribution in N-space (for some N), but I don’t think that assumption holds in many real-world problems.

#3, IIUC, tries to fit the data to N-dimensional Euclidean space by picking (N+1) points and solving an (N+2)x(N+2) determinant. I assume you’d want (N+1)-sized sets with all big distances. This may be better for OP’s needs, but, unlike #6, will be expensive computationally if N is large.

I think.
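
(Assuming that determinant is the Cayley-Menger determinant – my guess, it isn’t named above – here is a rough NumPy sketch of the test:)

[code]
import numpy as np

def cayley_menger_det(D):
    """Cayley-Menger determinant for k points given their k x k
    distance matrix D.  Nonzero means the points span k-1 dimensions
    (it is proportional to the squared volume of their simplex);
    zero means they fit into fewer."""
    k = len(D)
    B = np.ones((k + 1, k + 1))
    B[0, 0] = 0.0
    B[1:, 1:] = np.asarray(D, dtype=float) ** 2
    return np.linalg.det(B)

# Brian's examples from up-thread:
equilateral = [[0, 10, 10], [10, 0, 10], [10, 10, 0]]
collinear   = [[0, 10, 20], [10, 0, 10], [20, 10, 0]]
print(cayley_menger_det(equilateral))  # nonzero: a genuine 2-D triangle
print(cayley_menger_det(collinear))    # ~0: the points fit on a line
[/code]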

I’ve been wondering how I would address OP’s problem, and came up with a possible algorithm. I don’t recall ever working with this sort of problem before, so I want to mention my plan and ask for feedback.

(1) First find 15+1 or so data points widely separated from each other. Call one of them C, “the center” and the others J[sub]1[/sub], J[sub]2[/sub], … J[sub]15[/sub]. These points will define 15 initial axes. Instead of 15, pick any number you think will be at least, say, twice the anticipated number of dimensions. (The larger the better; with today’s fast computers you could use many more than 15.)

(2) Iterate through each of the million data points X, considering the 15 triangles (C, X, J[sub]k[/sub]). Use the law of cosines to compute the length X[sub]k[/sub] of (C,X) projected along (C,J[sub]k[/sub]).

(3) For each pair of those 15 triangles, update the correlation coefficient relating the {i,j} axes for each pair (X[sub]i[/sub], X[sub]j[/sub]).

(4) After all 15 million triangles have been examined, compute the eigendecomposition of the resulting covariance matrix. If the data is well-modeled with a Euclidean space of D dimensions, expect to see D largish eigenvalues and (15-D) small eigenvalues.

(5) The eigenvectors corresponding to the D largest eigenvalues will be orthogonal axes in the D-dimensional Euclidean space.

Will this work? :confused: Does it even make sense? :eek:
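
If I’ve read steps (1) through (5) right, a rough NumPy transcription might look like this (untested; D is a full distance matrix only for convenience – per the post you would really only need the rows for C and the landmarks):

[code]
import numpy as np

def estimate_dimension(D, c, J, tol=0.01):
    """Rough transcription of steps (1)-(5).  D is the distance matrix
    (only the rows for c and J are actually used), c is the index of
    the centre C, J the indices of the landmarks J_1..J_k."""
    cj = D[c, J].astype(float)                       # |C J_i|
    # Step (2): law of cosines -> projection of (C,X) onto (C,J_i)
    X = (D[c, :, None] ** 2 + cj ** 2 - D[:, J] ** 2) / (2 * cj)
    # Steps (3)-(4): covariance across axes, then its eigendecomposition
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    d = int((evals > tol * evals[-1]).sum())         # count 'largish' eigenvalues
    # Step (5): coordinates along the d dominant eigenvectors
    return d, X @ evecs[:, -d:]
[/code]

If I’m not mistaken, this is close in spirit to what the literature calls landmark MDS.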

While we’re at it, you might also want to verify that the distances you have are a valid metric: That is to say, none of them are negative, the distance between each point and itself is always 0, and D(a,b) + D(b,c) >= D(a,c) for all triplets (a,b,c).
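
A brute-force NumPy version of that check might be (a sketch, assuming the distances fit in a full matrix D):

[code]
import numpy as np

def is_metric(D, atol=1e-9):
    """Check the metric axioms for a full distance matrix D.
    The triangle-inequality check is O(n^3), so with a million points
    you would spot-check random triplets instead."""
    D = np.asarray(D, dtype=float)
    if (D < -atol).any():                           # no negative distances
        return False
    if not np.allclose(np.diag(D), 0, atol=atol):   # D(a,a) == 0
        return False
    if not np.allclose(D, D.T, atol=atol):          # symmetry
        return False
    # D(a,b) + D(b,c) >= D(a,c) for every triplet (a,b,c)
    return all((D[a, :, None] + D >= D[a, None, :] - atol).all()
               for a in range(len(D)))
[/code]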

OP says he has “a million objects” and “can tell you the distance between any two of them.” In my proposed algorithm I only need him to tell me 16 million of the distances (I think :o ), rather than all the pairwise distances of which there are half a trillion. :eek:

How long does it take to calculate the largest eigenvalues of your 15000000 x 15000000 symmetric matrix? Also [perhaps the OP should answer this], are the million points actually supposed to be in a Euclidean hyperplane, or something more complicated?

For the proposed methods to work it would have to be a Euclidean space, or at least affine.

I think the problem would need to be better defined, as it becomes far more complicated and can quickly reach the point of being unsolvable under non-Euclidean models.

Edit to add: If the affine transformation is treated as equivalent to a linear transformation, it is possible that the distances between points are lost, or that the relation is only preserved along straight lines. So I am going back to: the problem needs to be further defined as to its assumptions.

I look forward to pointers on where to direct further study if I am off base with the above assumptions.

It would just be a 15x15 matrix, showing the correlations among the 15 (or whatever) initial axes.

Any space, however curved, can be embedded in a Euclidean space. John Nash said that.
(Let you be in my dream if I can be in yours. Bob Dylan said that.)

And by the same token, any set of distances like this could be embedded in a two-dimensional space, if that space is sufficiently curved.

The buzzwords you want are multidimensional scaling.

In practice, this is a way of trying to simplify/visualise large datasets by compressing them down into smaller dimensions. You start by assigning a notion of distance between the data points, so that “similar” points have small distances between them. Now, provided that all the distances obey a set of consistency relations (like the triangle inequalities), for N points you can construct an arrangement of them in an N-1 dimensional (Euclidean) space. That’s just classical geometry, but there is a nice concise matrix solution to that problem.
In multidimensional scaling, the question is whether there’s an optimal approximation to this when you project down into a smaller dimensional subspace and what is that. Usually you really want to see whether the 2D approximation, which is the one you can visualise and print as a diagram, usefully captures some interesting feature of the overall structure.
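
That concise matrix solution is short enough to quote; here is a minimal sketch of it (classical, a.k.a. Torgerson, MDS; it assumes the full distance matrix fits in memory, and sklearn.manifold.MDS is one library alternative):

[code]
import numpy as np

def classical_mds(D, dim):
    """The 'concise matrix solution': classical (Torgerson) MDS.
    D is the n x n distance matrix; returns n points in `dim` dimensions."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    B = -0.5 * J @ (np.asarray(D, float) ** 2) @ J   # double-centred Gram matrix
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:dim]              # top `dim` eigenvalues
    return evecs[:, idx] * np.sqrt(np.clip(evals[idx], 0, None))
[/code]

Helpfully for the OP’s dimension question, the spectrum of B is the diagnostic: if the distances embed exactly in D Euclidean dimensions, exactly D eigenvalues of B are positive and the rest vanish, while negative eigenvalues mean no flat Euclidean embedding exists at all.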

In your case, you’re simply asking whether there’s a subspace where the approximation is actually exact. That’d be unusual in practice, but has been thought about. Indeed the classic pedagogical example of multidimensional scaling is what might be dubbed the “road atlas problem”: given a matrix of distances between cities, can you reconstruct the map? In that instance, the number of cities/points/N can be much greater than the two (or possibly three) dimensions they can be consistently arranged in.

Although, the road distances between cities won’t usually give you a Euclidean two-dimensional map, because roads aren’t all straight.

Which is related to the constraint I was mentioning above: the OP has only distances, and even in 4D, with its 20 degrees of freedom, it quickly becomes non-trivial, especially if you are dealing with something a bit more complex than a Lorentzian manifold, which may or may not be geodesically complete. A torus, where the geodesic to the origin cannot be represented on the number line, is an example.

Of course this all becomes a non-issue if you add simple constraints to the problem.

Let S(P,Q) be the square of the distance between point number P and point number Q, all of which are known. In N dimensions (where N is the unknown to be found):
S(P,Q) = sum(i=1 to N) ( x(P,i) - x(Q,i) )^2
where x(P,i) is the coordinate value of point P along axis i in some orthogonal coordinate system.

Choose point 1 as the origin of the coordinate system. Since x(1,i) is zero for all i, S(1,P) = sum(i=1 to N) x(P,i)^2

Select a point number M, having a non-zero S(1,M). The line through point 1 and point M will be taken as the first axis in the orthogonal coordinate system. The x(P,1) of all the points can be found from x(M,1)=sqrt(S(1,M)) being non-zero while the x(M,i) for i>1 are all zero:
S(M,P) = ( x(M,1) - x(P,1) )^2 + sum(i = 2 to N) x(P,i)^2
= x(M,1)^2 - 2*x(M,1)*x(P,1) + x(P,1)^2 + sum(i = 2 to N) x(P,i)^2
= S(1,M) - 2*sqrt(S(1,M))*x(P,1) + S(1,P)
which gives:
x(P,1) = ( S(1,M) + S(1,P) - S(M,P) ) / ( 2*sqrt(S(1,M)) )

With all the x(P,1) now found, subtract (x(P,1)-x(Q,1))^2 from each S(P,Q) to give updated values for distance squared in N-1 dimensions:
S’(P,Q) = S(P,Q) - (x(P,1)-x(Q,1))^2

Now this is like the original problem except N-1 dimensions. Keep repeating until all the S’(P,Q) are zero, i.e. unable to choose a point M in the steps above. The number of times this needs to be done in order to zero out the S’(P,Q) is N.
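
A NumPy transcription of this procedure, for whatever it is worth (an untested sketch assuming exact distances and a full matrix):

[code]
import numpy as np

def peel_coordinates(D, tol=1e-8):
    """Direct transcription of the procedure above.  D is the n x n
    distance matrix; returns an n x N coordinate array, where N is the
    number of axes needed to zero out the residuals S'(P,Q)."""
    S = np.asarray(D, dtype=float) ** 2      # S(P,Q), squared distances
    axes = []
    while True:
        # Point 1 (index 0) is the origin; pick an M with nonzero S(1,M).
        candidates = np.nonzero(S[0] > tol)[0]
        if candidates.size == 0:             # all residuals zero: done, N = len(axes)
            break
        M = candidates[np.argmax(S[0, candidates])]  # farthest point is most stable
        x = (S[0, M] + S[0] - S[M]) / (2.0 * np.sqrt(S[0, M]))  # x(P,1)
        axes.append(x)
        S = S - (x[:, None] - x[None, :]) ** 2       # S'(P,Q): one dimension peeled off
    return np.column_stack(axes) if axes else np.zeros((len(S), 0))
[/code]

With noisy real-world distances the residuals never reach exactly zero, so in practice you would stop once they fall below the X% error bar mentioned up-thread; the number of axes peeled off by then is the estimated N.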