I’m doing linear modeling of a point cloud in five-dimensional space. There are five continuous variables of similar scale. Two of the points are special in the sense that they represent good examples of something, let’s say Eastern and Western. But there is no East Pole or West Pole, and the points are not on the edge of the point cloud; they’re just (roughly) opposite one another on either side of the cloud center. Thus there is an East-West axis through the cloud that includes and goes beyond these two good example points.
The first thing I care about is, for each of the other points in the cloud, what point on the East-West axis is it closest to? In other words, what is its perpendicular projection onto the axis? I’m already doing this, getting a score defined as -1 for the Eastern example and +1 for the Western example, about 0 at the center of the cloud, and reaching about ±3 for points on the cloud edge.
I recognize that if I were doing principal component analysis, my score would be similar to the first principal component, if the first eigenvector happened to go through Eastern and Western. But it doesn’t, and I want to be able to describe my axis arbitrarily on the basis of my example points.
The second thing I care about – and this is the one I don’t know how to calculate – is, what is the distance between each cloud point and its projected location on my axis? Said another way, I want to know, for each point, how badly do I describe the point by pretending it is just a point on the East West spectrum?
If I’m following you, then you’re just talking about vector manipulations and not anything overly special. If all your points can be represented by five-vectors with the origin represented as (0,0,0,0,0), then the thing you’re already doing – finding the projections along the East-West axis – can be accomplished with a dot product of the vector of interest and the unit vector constructed from the East point’s vector (e.g., eastVector / |eastVector| ). In the happy case that your East-West axis is one of your working dimensions to begin with, then the projection amounts to just reading off the appropriate coordinate in any given five-vector’s representation.
For the second part that you aren’t yet doing: it sounds like you are looking for either the distance between the original point and the projected point, or the difference in vector lengths between the original vector and the projected vector. Both of these are vector manipulations as well. The first is just |vectorOrig - vectorProj|, and the second is just |vectorOrig| - |vectorProj|. In either case you may also be interested in dividing the difference by |vectorOrig| to get a sort of fractional error (0=projection is same; 1=projection threw away entirety of original vector).
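To make the two candidate measures concrete, here is a minimal NumPy sketch. It assumes, as above, that the origin lies on the East-West axis. The function name projection_errors and the sample numbers are made up for illustration, and the fractional error is computed from the first measure, which is one of the two possible readings.

```python
import numpy as np

def projection_errors(x, east):
    """Return (distance, length_difference, fractional_error) for one point.

    Assumes the origin lies on the East-West axis.
    """
    x = np.asarray(x, dtype=float)
    unit = np.asarray(east, dtype=float)
    unit = unit / np.linalg.norm(unit)        # eastVector / |eastVector|
    proj = np.dot(x, unit) * unit             # vectorProj
    distance = np.linalg.norm(x - proj)       # |vectorOrig - vectorProj|
    length_diff = np.linalg.norm(x) - np.linalg.norm(proj)  # |vectorOrig| - |vectorProj|
    fractional = distance / np.linalg.norm(x) # distance relative to |vectorOrig|
    return distance, length_diff, fractional

# Made-up five-vectors, purely for illustration
east = [2.0, 0.0, 0.0, 0.0, 0.0]
x = [3.0, 4.0, 0.0, 0.0, 0.0]
distance, length_diff, fractional = projection_errors(x, east)
```

The first measure is the perpendicular distance from the point to the axis. The second only compares lengths, so it comes out smaller whenever the point is off the axis.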
Throughout, I’ve assumed the five dimensions have commensurate units. That is, distance can be calculated in the usual Pythagorean way. If not, then you would need some metric when calculating vector magnitudes to define how distance in one dimension relates to distance in another dimension. But I’ll save that for follow-up depending on what in the above is or isn’t actually applicable to your situation.
First I can tell you how to calculate it…
I assume that your points are represented by column vectors with 5 rows and 1 column.
Let u be a unit vector pointing along the East-West axis
(e.g. u = (East-West)/|East-West|).
For each point x, the signed length of the projection of x on the East-West axis is (u’.x); its norm is |u’.x|
The projection of x on the axis is (u’.x)u
The remainder is x-(u’.x)u
So the distance between x and the East-West axis is |x-(u’.x)u|
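The steps above can be sketched for a whole cloud at once with NumPy. This is only an illustration: it assumes the points are stored as the rows of an N-by-5 array and that the axis passes through the origin. The name axis_projection and the sample values are hypothetical.

```python
import numpy as np

def axis_projection(X, east, west):
    """Signed position along the axis and distance to the axis for each row of X.

    Assumes the East-West axis passes through the origin.
    """
    X = np.asarray(X, dtype=float)
    d = np.asarray(east, dtype=float) - np.asarray(west, dtype=float)
    u = d / np.linalg.norm(d)                 # u = (East-West)/|East-West|
    t = X @ u                                 # u'x for every point
    residual = X - np.outer(t, u)             # x - (u'x)u
    dist = np.linalg.norm(residual, axis=1)   # |x - (u'x)u|
    return t, dist

# Made-up example points
east = [1.0, 0.0, 0.0, 0.0, 0.0]
west = [-1.0, 0.0, 0.0, 0.0, 0.0]
X = [[3.0, 4.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 2.0],
     [-2.0, 0.0, 0.0, 0.0, 0.0]]
t, dist = axis_projection(X, east, west)
```

With this choice of u the positive direction points towards East; swap the two arguments if you want West to be positive.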
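Note that these computations assume that your East-West axis passes through zero. If it does not, you may want to move the zero somewhere on this axis ;), i.e. subtract a point on the axis (the midpoint between East and West, say) from every x before projecting.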
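If the axis does not pass through the origin, one way to handle it is sketched below. The choice of the East/West midpoint as the new zero is an assumption of mine, picked because it reproduces the -1 (East) / +1 (West) scoring used in the question; the function name score_and_distance and the data are made up.

```python
import numpy as np

def score_and_distance(X, east, west):
    """Score along the axis (East = -1, West = +1) and distance to the axis.

    The origin is moved to the midpoint of East and West (an assumption).
    """
    X = np.asarray(X, dtype=float)
    east = np.asarray(east, dtype=float)
    west = np.asarray(west, dtype=float)
    mid = (east + west) / 2.0                 # new zero, located on the axis
    half = (west - east) / 2.0                # vector from the midpoint to West
    Xc = X - mid                              # points relative to the new zero
    score = Xc @ half / (half @ half)         # -1 at East, +1 at West
    residual = Xc - np.outer(score, half)     # part of each point off the axis
    dist = np.linalg.norm(residual, axis=1)   # perpendicular distance to the axis
    return score, dist

# Made-up example: the axis is the line y = 1 in the first two coordinates
east = [1.0, 1.0, 0.0, 0.0, 0.0]
west = [3.0, 1.0, 0.0, 0.0, 0.0]
X = [east, west, [2.0, 4.0, 0.0, 0.0, 0.0]]
score, dist = score_and_distance(X, east, west)
```

The distance does not depend on which point of the axis is chosen as zero; only the score does.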
Now, for the name of what you are trying to compute… in statistics, you would say it is the Euclidean distance between your points and the East-West axis. You could also compute the angle between the vectors x and u.
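For the angle, a small sketch using cos(angle) = u’.x / |x|, again assuming the axis passes through zero and u has unit length. The helper name angle_to_axis and the values are hypothetical.

```python
import numpy as np

def angle_to_axis(x, u):
    """Angle in radians between x and the axis direction u."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)                 # make sure u is a unit vector
    cos_angle = np.dot(u, x) / np.linalg.norm(x)
    # Clip guards against rounding slightly outside [-1, 1]
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

# Made-up example values
u = [1.0, 0.0, 0.0, 0.0, 0.0]
x = [3.0, 4.0, 0.0, 0.0, 0.0]
angle = angle_to_axis(x, u)
```

The sine of this angle equals the distance to the axis divided by |x|, so it carries the same information as a relative distance.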
At the population level, you could compute the sum of squared norms of all vectors x (call it SStot) and the sum of squared norms of the residuals (call it SSres)… then you could compute (1-SSres/SStot) and call it the fraction of variance along the East-West axis (strictly speaking it is a variance only if the points have been centered on the cloud mean).
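A minimal sketch of that population-level summary, assuming the points are the rows of an array and the axis passes through zero. The name fraction_on_axis and the two sample points are made up.

```python
import numpy as np

def fraction_on_axis(X, u):
    """Return 1 - SSres/SStot for points X (rows) and axis direction u."""
    X = np.asarray(X, dtype=float)
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)                 # unit vector along the axis
    residual = X - np.outer(X @ u, u)         # x - (u'x)u for every point
    ss_tot = np.sum(X ** 2)                   # sum of squared norms of x
    ss_res = np.sum(residual ** 2)            # sum of squared residual norms
    return 1.0 - ss_res / ss_tot

# Made-up example values
u = [1.0, 0.0, 0.0, 0.0, 0.0]
X = [[3.0, 4.0, 0.0, 0.0, 0.0],
     [-3.0, 0.0, 0.0, 0.0, 0.0]]
frac = fraction_on_axis(X, u)
```

Here SStot = 34 and SSres = 16, so the ratio is 18/34, a little over half.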